VIT\mim_pytorch.py", line 215, in forward_features prompt_tokens = torch.index_select(self.prompts, 1, y.int()) AttributeError: 'int' object has no attribute 'int'

Hassan-miqdad commented 7 months ago

zyainfal commented 7 months ago

Here y indicates whether each face in the batch is masked. It should be a tensor of shape (batchsize,).

An example of y is tensor([0,0,1,1,0,1,...,1]).

So the likely issue is that the model receives an int value instead of a tensor.
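To make the failure concrete, here is a minimal sketch. The `prompts` tensor and its shape (1, 2, 4) are hypothetical stand-ins for `self.prompts`; only the `torch.index_select(..., 1, y.int())` call is taken from the traceback.

```python
import torch

# Hypothetical stand-in for self.prompts: one prompt row per indicator
# value (0 and 1), each of width 4. The real shape may differ.
prompts = torch.arange(8, dtype=torch.float32).reshape(1, 2, 4)

# A plain Python int has no .int() method, which reproduces the error.
y_bad = 1
try:
    y_bad.int()
except AttributeError as err:
    error_message = str(err)

# A tensor of shape (batchsize,) works: one indicator per image.
y_good = torch.tensor([0, 0, 1, 1])
prompt_tokens = torch.index_select(prompts, 1, y_good.int())
```

With four indicators, `prompt_tokens` has one selected prompt row per image along dimension 1.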

Hassan-miqdad commented 7 months ago

I am trying to do inference, not training. How do I do that correctly?

With ResNet100 it works fine. The issue only happens when I try to run inference with ViT.

zyainfal commented 7 months ago

It works on ResNet because ResNet does not use y. For ViT, if your data does not include any masked faces, you can initialize y as torch.zeros(x.shape[0], device=x.device), which means all inputs are holistic faces.

As a quick fix, you can change lines 229-230 of mim_pytorch.py as follows:

    def forward(self, x, y=None):
        if y is None:
            y = torch.zeros(x.shape[0], device=x.device)
        x = self.forward_features(x, y)

However, you need to modify y if the input x has masked faces (otherwise the feature could be inaccurate).

Hassan-miqdad commented 7 months ago

Thank you so much. It works 🙌🏻

So with this quick fix, it works whether the input is a holistic face image or a masked face image, right?

zyainfal commented 7 months ago

For masked faces, you need torch.ones instead of torch.zeros, as 0 means holistic faces and 1 means masked faces. When you gather either holistic or masked faces into a batch, then the quick fix works well if you use torch.ones and torch.zeros accordingly.

Hassan-miqdad commented 7 months ago

Is that correct?

    def forward(self, x, y=None):
        if y is None:
            y = torch.zeros(x.shape[0], device=x.device)
        else:
            y = torch.ones(x.shape[0], device=x.device)
        x = self.forward_features(x, y)

zyainfal commented 7 months ago

Not exactly. The quick fix works only when each batch contains either all holistic or all masked faces, and y tells the model which case it is running inference on (say, 0 => all data are holistic faces, and 1 => all data are masked faces).

So, if you plan to keep holistic and masked faces in separate batches, you can change the code as follows:

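    def forward(self, x, y=None):
        # y is a per-batch flag: 0 = all holistic faces, 1 = all masked faces
        if y == 0:
            y = torch.zeros(x.shape[0], device=x.device)
        elif y == 1:
            y = torch.ones(x.shape[0], device=x.device)
        else:
            # a bare `raise` has no active exception here, so raise explicitly
            raise ValueError("y must be 0 (holistic) or 1 (masked)")
        x = self.forward_features(x, y)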

Then you can feed the model with model(x,0) or model(x,1), where 0 means all data in x are holistic faces, and 1 means all data in x are masked faces. Please note you cannot feed the model as model(x) because the model has to know if the faces in this batch are masked or not.
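If it is useful to accept both a per-batch flag and a per-image tensor, one possible sketch is a small normalizing helper. The name `to_indicator` is hypothetical and not part of the repository; it only assumes that y must end up as an integer tensor of shape (batchsize,) on the same device as x.

```python
import torch


def to_indicator(y, x):
    """Return y as a (batch,) int64 tensor on x's device.

    Hypothetical helper: accepts 0 or 1 for a uniform batch
    (0 = holistic, 1 = masked), or a tensor with one entry per image.
    """
    batch_size = x.shape[0]
    if isinstance(y, torch.Tensor):
        if tuple(y.shape) != (batch_size,):
            raise ValueError("y must have shape (batch,)")
        return y.to(device=x.device, dtype=torch.int64)
    if y in (0, 1):
        return torch.full((batch_size,), int(y), dtype=torch.int64, device=x.device)
    # None and any other value are rejected: the model must be told
    # whether the faces are masked.
    raise ValueError("y must be 0, 1, or a tensor of shape (batch,)")
```

Such a helper could be called at the top of forward, for example `y = to_indicator(y, x)`, before passing y to `forward_features`.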

If you want to mix holistic and masked faces in a single batch, the ideal fix is as follows: when you make a data batch, x holds the images and y holds the indicators, e.g.

    x = [img1, img2, ..., imgn]
    y = [ind1, ind2, ..., indn]

Here, if ind1=0 then img1 is a holistic face image, and if ind1=1 then img1 is a masked face image, consistent with the convention above (0 means holistic, 1 means masked).
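A mixed batch could be assembled roughly like this. It is only a sketch: the `samples` list, the 112x112 image size, and the random image data are placeholders, and it follows the convention stated earlier in the thread (0 = holistic, 1 = masked).

```python
import torch

# Placeholder samples: (image tensor, is_masked flag). Real code would
# load images and flags from a dataset instead.
samples = [
    (torch.rand(3, 112, 112), False),  # holistic face
    (torch.rand(3, 112, 112), True),   # masked face
    (torch.rand(3, 112, 112), True),   # masked face
]

# Stack images into one batch and build one indicator per image.
x = torch.stack([img for img, _ in samples])
y = torch.tensor([int(is_masked) for _, is_masked in samples])
```

The resulting x and y can then be passed together, e.g. `model(x, y)`, provided forward hands y to `forward_features` unchanged.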

zyainfal / Joint-Holistic-and-Masked-Face-Recognition, issue #3