Open Hassan-miqdad opened 7 months ago
Here, y indicates whether each face in the batch is masked. It should be a tensor of shape (batchsize,).
An example of y is tensor([0,0,1,1,0,1,...,1]).
So the likely issue is that the model received an int instead of a tensor.
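A minimal sketch of why an int fails where a tensor works. The shape of `prompts` here, (1, 2, dim), is an assumption for illustration; the real shape of self.prompts is not shown in this thread.

```python
import torch

# Assumed stand-in for self.prompts: one prompt per face type.
prompts = torch.zeros(1, 2, 4)

# Passing a plain Python int: ints have no .int() method.
y = 0
try:
    torch.index_select(prompts, 1, y.int())
except AttributeError as err:
    message = str(err)

# Passing a tensor of shape (batchsize,): one indicator per sample.
y = torch.tensor([0, 0, 1])
tokens = torch.index_select(prompts, 1, y.int())  # shape: (1, 3, 4)
```

The first call fails before index_select even runs, because `y.int()` is evaluated on a Python int.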
I am trying to do inference, not training. How do I do that correctly?
With ResNet100 it works fine. The issue only occurs when I try to run inference with ViT.
It works with ResNet because ResNet does not use y.
For ViT, if your data does not include any masked faces, you can initialize y as torch.zeros(x.shape[0], device=x.device), which means all inputs are holistic faces.
As a quick fix, you can change lines 229-230 of mim_pytorch.py as follows:

    def forward(self, x, y=None):
        if y is None:
            # No indicator given: treat every input as a holistic face.
            y = torch.zeros(x.shape[0], device=x.device)
        x = self.forward_features(x, y)
However, you need to modify y if the input x has masked faces (otherwise the feature could be inaccurate).
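To make the quick fix concrete, here is a self-contained sketch. `ToyPromptModel` is a hypothetical stand-in, not the repository's model; the prompt layout (1, 2, dim) is an assumption, and only the index_select call is taken from the traceback.

```python
import torch
from torch import nn


class ToyPromptModel(nn.Module):
    """Hypothetical stand-in for the ViT model; not the repository's code."""

    def __init__(self, dim: int = 4):
        super().__init__()
        # Assumed layout: index 0 = holistic prompt, index 1 = masked prompt.
        self.prompts = nn.Parameter(torch.randn(1, 2, dim))

    def forward_features(self, x, y):
        # Same call as in the traceback: select one prompt per sample.
        prompt_tokens = torch.index_select(self.prompts, 1, y.int())
        return prompt_tokens.squeeze(0)  # shape: (batch, dim)

    def forward(self, x, y=None):
        if y is None:
            # Default: treat every input as a holistic face.
            y = torch.zeros(x.shape[0], device=x.device)
        return self.forward_features(x, y)


model = ToyPromptModel()
x = torch.randn(3, 3, 8, 8)   # batch of 3 dummy images
out = model(x)                # y defaults to all zeros
```

With y omitted, every sample gets the prompt at index 0, which is only appropriate when the batch contains no masked faces.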
Thank you so much. It works 🙌🏻
So with this quick fix, it works whether the input is a holistic face image or a masked face image, right?
For masked faces, you need torch.ones instead of torch.zeros, as 0 means holistic faces and 1 means masked faces.
When each batch contains only holistic faces or only masked faces, the quick fix works as long as you use torch.zeros or torch.ones accordingly.
Is that correct?
    def forward(self, x, y=None):
        if y is None:
            y = torch.zeros(x.shape[0], device=x.device)
        else:
            y = torch.ones(x.shape[0], device=x.device)
        x = self.forward_features(x, y)
Not exactly. The quick fix works only when each batch contains either all holistic faces or all masked faces, and y tells the model which case it is running inference on (0 => all data are holistic faces, 1 => all data are masked faces).
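So, if you plan to keep holistic and masked faces in separate batches, you can change the code to:

    def forward(self, x, y=None):
        if y == 0:
            # All faces in this batch are holistic.
            y = torch.zeros(x.shape[0], device=x.device)
        elif y == 1:
            # All faces in this batch are masked.
            y = torch.ones(x.shape[0], device=x.device)
        else:
            raise ValueError("y must be 0 (holistic) or 1 (masked)")
        x = self.forward_features(x, y)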
Then you can call the model as model(x,0) or model(x,1), where 0 means all data in x are holistic faces and 1 means all data in x are masked faces. Please note that you cannot call model(x), because the model has to know whether the faces in this batch are masked.
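One way to support both the batch-level flag and a per-sample indicator is a small conversion helper. This is only a sketch: `make_indicator` is a hypothetical name and is not part of mim_pytorch.py.

```python
import torch


def make_indicator(x, y):
    """Hypothetical helper: return an indicator tensor of shape (batch_size,).

    y may be a batch-level flag (0 = all holistic, 1 = all masked)
    or a per-sample sequence/tensor of 0s and 1s.
    """
    batch_size = x.shape[0]
    if isinstance(y, int):
        if y not in (0, 1):
            raise ValueError("y must be 0 (holistic) or 1 (masked)")
        # Broadcast the single flag to every sample in the batch.
        return torch.full((batch_size,), y, dtype=torch.long, device=x.device)
    y = torch.as_tensor(y, dtype=torch.long, device=x.device)
    if y.shape != (batch_size,):
        raise ValueError("y must have shape (batch_size,)")
    return y


x = torch.randn(4, 3, 8, 8)               # batch of 4 dummy images
y_all_masked = make_indicator(x, 1)       # batch-level flag
y_mixed = make_indicator(x, [0, 1, 1, 0]) # per-sample indicators
```

The resulting tensor can then be passed on as y, so the `.int()` call from the traceback is always made on a tensor.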
If you want to mix holistic and masked faces in a single batch, the ideal fix is as follows: when you build a data batch, x holds the images and y holds the indicators, e.g.

    x = [img1, img2, ..., imgn]
    y = [ind1, ind2, ..., indn]

Here, if ind1=0 then img1 is a holistic face image, and if ind1=1 then img1 is a masked face image.
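A rough sketch of building such a mixed batch, following the convention used earlier in the thread (0 = holistic, 1 = masked). The image sizes, the `is_masked` list, and the prompt layout (1, 2, dim) are all assumptions for illustration.

```python
import torch

# Hypothetical data: three dummy 3x8x8 images and whether each one is masked.
images = [torch.randn(3, 8, 8) for _ in range(3)]
is_masked = [False, True, False]

x = torch.stack(images)                         # shape: (3, 3, 8, 8)
y = torch.tensor([int(m) for m in is_masked])   # one indicator per image

# Assumed prompt layout: row 0 = holistic prompt, row 1 = masked prompt.
prompts = torch.arange(8, dtype=torch.float32).reshape(1, 2, 4)

# Each sample picks the prompt matching its own indicator.
prompt_tokens = torch.index_select(prompts, 1, y.int())  # shape: (1, 3, 4)
```

Because y has one entry per image, the second sample gets the masked prompt while the other two get the holistic prompt.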
VIT\mim_pytorch.py", line 215, in forward_features
    prompt_tokens = torch.index_select(self.prompts, 1, y.int())
AttributeError: 'int' object has no attribute 'int'