gblanco10 closed this issue 1 year ago
Thanks for your interest in our work and the very detailed description. The first warning is raised when loading the pre-trained weights of IR-50 and ViT-Small. It is harmless for inference and will not affect the result. You can disable the pre-training loading with:
```python
cfg.model.extractor.pretrained = None
cfg.model.vit.pretrained = None
```
The wrong prediction is caused by incorrect data processing (e.g., the std in `Normalize` is not divided by 255). However, I find that there are still some differences between `torchvision.transforms` and mmcls, so it is highly recommended to use the transforms in mmcls directly.
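If it helps, the equivalence between the two normalization conventions can be checked numerically. Below is a minimal sketch in plain NumPy, assuming the common ImageNet statistics (the exact values in the mmcls config may differ):

```python
import numpy as np

# mmcls-style statistics, expressed in [0, 255] pixel space
# (assumed ImageNet values; check your actual dataset config)
mean_255 = np.array([123.675, 116.28, 103.53])
std_255 = np.array([58.395, 57.12, 57.375])

# A random uint8 image, shape (H, W, C)
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)

# mmcls convention: normalize while pixels are still in [0, 255]
out_mmcls = (img.astype(np.float64) - mean_255) / std_255

# torchvision convention: ToTensor first rescales to [0, 1],
# so mean AND std must both be divided by 255
out_tv = (img / 255.0 - mean_255 / 255.0) / (std_255 / 255.0)

assert np.allclose(out_mmcls, out_tv)
```

Dividing only the mean (but not the std) by 255 breaks this equivalence, which is exactly the bug discussed below.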
I have updated a demo notebook at demo.ipynb. I hope this helps.
Hi @youqingxiaozhua thanks for your prompt reply!
Nice to know the weights are loaded correctly, thanks for the explanation.
Regarding data preprocessing, I took the mean and std values reported in the mmcls dataset config and divided them by 255, because mmcls performs normalization while image pixel values are still in [0, 255], whereas with torchvision this operation can only be applied after ToTensor, which rescales pixel values to [0, 1]. Does this sound correct to you? Which other differences are you noticing between the two preprocessing pipelines? I would like to use torch for preprocessing; is this feasible in your opinion?
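For example, assuming the config uses the common ImageNet statistics (an assumption on my side), the division works out to the familiar torchvision default values:

```python
# Assumed ImageNet statistics in [0, 255] pixel space,
# as typically found in mmcls dataset configs
mean_255 = [123.675, 116.28, 103.53]
std_255 = [58.395, 57.12, 57.375]

# Divide both by 255 for use after torchvision's ToTensor
mean_01 = [round(m / 255, 3) for m in mean_255]
std_01 = [round(s / 255, 3) for s in std_255]

print(mean_01)  # [0.485, 0.456, 0.406] -- the familiar torchvision defaults
print(std_01)   # [0.229, 0.224, 0.225]
```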
By the way, thanks for updating a demo notebook.
Hi gblanco10. Yes, you need to divide both the mean and std values of mmcls by 255. However, in your original code the std is not divided (only the mean is). After correcting this error, your torchvision preprocessing will behave the same as the mmcls version, and the model will predict as expected.
Hi @youqingxiaozhua, sorry for the silly error; it is now corrected and performs way better! Feel free to either close or delete this issue.
Thanks again
Hi, I have downloaded the pretrained model weights (file APViT_RAF-3eeecf7d.pth) following the link in the README and tried to run the model on some sample images.
I am loading the model with this code snippet
but I get some warnings
Then I am loading some images, first using MTCNN to crop around the person's face (to make them more similar to RAF-DB), and processing them with these torch transformations that should replicate the ones in the config files
and running inference with
but what I get is always class 6, no matter the expression of the person in the input image.
Am I doing something wrong in either model loading or preprocessing?
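For context, a minimal sketch of the kind of inference loop I mean, with a hypothetical 7-class head standing in for the real APViT model (the label names and their ordering are my assumption about the RAF-DB convention, not taken from the repo):

```python
import torch

# Hypothetical stand-in for the real model: any module mapping a
# preprocessed face crop to 7 emotion logits
model = torch.nn.Linear(3 * 112 * 112, 7)
model.eval()

x = torch.randn(1, 3, 112, 112)  # a preprocessed face crop (assumed size)
with torch.no_grad():
    logits = model(x.flatten(1))
    pred = logits.argmax(dim=1).item()

# Assumed RAF-DB basic-emotion labels, 0-indexed
labels = ['Surprise', 'Fear', 'Disgust', 'Happiness', 'Sadness', 'Anger', 'Neutral']
print(pred, labels[pred])
```

If preprocessing is wrong, the logits tend to collapse to one class regardless of input, which matches the "always class 6" symptom.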
Thanks for your support
Current Environment: