youqingxiaozhua / APViT

PaddlePaddle and PyTorch implementation of APViT and TransFER
Apache License 2.0

Missing parameters and wrong predictions #5

Closed gblanco10 closed 1 year ago

gblanco10 commented 1 year ago

Hi, I have downloaded the pretrained model weights (file APViT_RAF-3eeecf7d.pth) following the link in the README and tried to run the model on some sample images.

I am loading the model with this code snippet:

import mmcv
from mmcv.runner import load_checkpoint
from mmcls.models import build_classifier

cfg = mmcv.Config.fromfile("configs/apvit/RAF.py")
cfg.model.pretrained = None

# build the model and load checkpoint
classifier = build_classifier(cfg.model)
load_checkpoint(classifier, "pretrained/APViT_RAF-3eeecf7d.pth", map_location='cpu')
classifier = classifier.to("cuda")
classifier.eval()

but I get some warnings:

unexpected key in source state_dict: 
output_layer.0.weight, output_layer.0.bias, output_layer.0.running_mean, output_layer.0.running_var, output_layer.0.num_batches_tracked, output_layer.3.weight, output_layer.3.bias, output_layer.4.weight, output_layer.4.bias, output_layer.4.running_mean, output_layer.4.running_var, output_layer.4.num_batches_tracked, body.21.shortcut_layer.0.weight, body.21.shortcut_layer.1.weight, body.21.shortcut_layer.1.bias, body.21.shortcut_layer.1.running_mean, body.21.shortcut_layer.1.running_var, body.21.shortcut_layer.1.num_batches_tracked, body.21.res_layer.0.weight, body.21.res_layer.0.bias, body.21.res_layer.0.running_mean, body.21.res_layer.0.running_var, body.21.res_layer.0.num_batches_tracked, body.21.res_layer.1.weight, body.21.res_layer.2.weight, body.21.res_layer.3.weight, body.21.res_layer.4.weight, body.21.res_layer.4.bias, body.21.res_layer.4.running_mean, body.21.res_layer.4.running_var, body.21.res_layer.4.num_batches_tracked, body.22.res_layer.0.weight, body.22.res_layer.0.bias, body.22.res_layer.0.running_mean, body.22.res_layer.0.running_var, body.22.res_layer.0.num_batches_tracked, body.22.res_layer.1.weight, body.22.res_layer.2.weight, body.22.res_layer.3.weight, body.22.res_layer.4.weight, body.22.res_layer.4.bias, body.22.res_layer.4.running_mean, body.22.res_layer.4.running_var, body.22.res_layer.4.num_batches_tracked, body.23.res_layer.0.weight, body.23.res_layer.0.bias, body.23.res_layer.0.running_mean, body.23.res_layer.0.running_var, body.23.res_layer.0.num_batches_tracked, body.23.res_layer.1.weight, body.23.res_layer.2.weight, body.23.res_layer.3.weight, body.23.res_layer.4.weight, body.23.res_layer.4.bias, body.23.res_layer.4.running_mean, body.23.res_layer.4.running_var, body.23.res_layer.4.num_batches_tracked

missing keys in source state_dict: projs.0.weight, projs.0.bias

Then I load some images, first using MTCNN to crop around the person's face (to make them more similar to RAF-DB), and process them with these torch transformations, which should replicate the ones in the config files:

test_preprocess = transforms.Compose([
    transforms.Resize((112, 112)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[x / 255 for x in [123.675, 116.28, 103.53]],
        std=[x for x in [58.395, 57.12, 57.375]]
    )
])

and run inference with:

import numpy as np

out = classifier(tensor_in.to("cuda"), return_loss=False)
out = [np.argmax(o) for o in out]

but what I get is always class 6, no matter which expression the person in the input image has.

Am I doing something wrong in either model loading or preprocessing?

Thanks for your support


youqingxiaozhua commented 1 year ago

Thanks for your interest in our work and for the really detailed description. The first warning is raised when loading the pre-trained weights of IR-50 and ViT-Small. It is harmless for inference and will not affect the result. You can disable the pretraining with:

cfg.model.extractor.pretrained = None
cfg.model.vit.pretrained = None

The wrong prediction is caused by incorrect data processing (e.g., the std in Normalize is not divided by 255). But I find that there are still some differences between torchvision.transforms and mmcls, so it is highly recommended to use the transforms in mmcls directly.

I have updated a demo notebook at demo.ipynb. Hope this could help you.

gblanco10 commented 1 year ago

Hi @youqingxiaozhua thanks for your prompt reply!

Nice to know the weight loading is done correctly, thanks for the explanation.

Regarding data preprocessing, I took the mean and std values reported in the mmcls dataset config and divided them by 255, because mmcls performs normalization while image pixel values are still in [0, 255], whereas with torchvision this operation can only be applied after ToTensor, which rescales pixel values to [0, 1]. Does this sound correct to you? Which other differences are you noticing between the two preprocessing pipelines? I would like to use torch for preprocessing; is this feasible in your opinion?
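The equivalence this reasoning relies on can be checked numerically (a minimal sketch with NumPy only, using the mean/std values from this thread):

```python
import numpy as np

# mmcls-style: normalize while pixel values are still in [0, 255]
mean = np.array([123.675, 116.28, 103.53])
std = np.array([58.395, 57.12, 57.375])
pixel = np.array([200.0, 100.0, 50.0])  # an arbitrary example pixel

mmcls_out = (pixel - mean) / std

# torchvision-style: ToTensor rescales pixels to [0, 1] first,
# so both mean and std must also be divided by 255 to compensate
torchvision_out = (pixel / 255 - mean / 255) / (std / 255)

assert np.allclose(mmcls_out, torchvision_out)
```

The factor of 255 cancels in numerator and denominator, which is why dividing both mean and std (but not just one of them) preserves the result.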

By the way, thanks for updating a demo notebook.

youqingxiaozhua commented 1 year ago

Hi gblanco10. Yes, you need to divide both the mean and std values of mmcls by 255. However, in your original code, the std value is not divided (only the mean value is). After correcting this error, your torchvision preprocessing will behave the same as the mmcls version, and the model will predict as expected.

gblanco10 commented 1 year ago

Hi @youqingxiaozhua, I am sorry for the silly error. It is corrected now and the model performs much better! Feel free to either close or delete this issue.

Thanks again