mlpc-ucsd / CoaT

(ICCV 2021 Oral) CoaT: Co-Scale Conv-Attentional Image Transformers

CoaT for multi-label classification #8

abhigoku10 closed this issue 3 years ago

abhigoku10 commented 3 years ago

@yix081 @xwjabc Thanks for sharing the code base. I have a few queries about the problem I am working on: gender/age classification of a person, i.e., a multi-label recognition problem.

  1. My input image size varies from 80x56 to 256x128. For this input, should I change the patch size from 4 to 16? If so, which other parameters should I change?
  2. Since this is a multi-label classification problem, should I change the `self.head = nn.Linear(self.num_features, num_classes) if num_classes > 0 else nn.Identity()` line?
  3. Should I freeze the layers in the transformer and train only the last layer? Thanks in advance.
xwjabc commented 3 years ago
  1. For image sizes from 80x56 to 256x128, I think you do not need to change the patch size (the original patch size is 4 for a 224x224 input).
  2. I think you can keep the current `self.head`. However, make sure you use a proper loss for multi-label classification; a sketch is shown after this list.
  3. I think that is also an option. You can load a pretrained CoaT model and try fine-tuning (1) only the last layer or (2) all layers; see the freezing sketch below. Hope my answers help!
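For point 2, here is a minimal sketch of a multi-label setup: the `Linear` head is kept as-is, and `BCEWithLogitsLoss` (a sigmoid-based, per-label loss) replaces softmax cross-entropy. The feature dimension, label count, and dummy tensors below are placeholders, not values from this repo.

```python
import torch
import torch.nn as nn

num_features, num_labels, batch = 256, 5, 4  # placeholder sizes (assumption)

# Stand-in for the CoaT head quoted above:
# self.head = nn.Linear(self.num_features, num_classes)
head = nn.Linear(num_features, num_labels)
criterion = nn.BCEWithLogitsLoss()  # applies a sigmoid per label internally

features = torch.randn(batch, num_features)                 # dummy backbone output
targets = torch.randint(0, 2, (batch, num_labels)).float()  # multi-hot labels

logits = head(features)          # one independent logit per label
loss = criterion(logits, targets)
loss.backward()
```

Unlike `CrossEntropyLoss`, this treats each label as an independent binary decision, so an image can be positive for several labels at once (e.g. a gender label and an age-group label).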
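And for point 3, a sketch of option (1), fine-tuning only the last layer: load the pretrained weights, drop the head weights (their shape changes for the new task), freeze the backbone, and unfreeze the head. The `coat_lite_tiny` variant, import path, checkpoint file name, and `'model'` checkpoint key are assumptions; adjust them to the variant and weights you actually use.

```python
import torch
from src.models.coat import coat_lite_tiny  # import path assumed from this repo's layout

model = coat_lite_tiny(num_classes=5)  # 5 = hypothetical number of labels

# Load pretrained weights, skipping the head since its output size differs.
checkpoint = torch.load('coat_lite_tiny.pth', map_location='cpu')
state = {k: v for k, v in checkpoint['model'].items() if not k.startswith('head')}
model.load_state_dict(state, strict=False)

for p in model.parameters():       # freeze the whole backbone...
    p.requires_grad = False
for p in model.head.parameters():  # ...then unfreeze only the classification head
    p.requires_grad = True

# Optimize only the trainable (unfrozen) parameters.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3)
```

For option (2), simply skip the freezing loop and fine-tune all parameters, usually with a smaller learning rate.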