zengqunzhao / EfficientFace

[AAAI'21] Robust Lightweight Facial Expression Recognition Network with Label Distribution Training
MIT License

missing the last fc layer in the pre-trained model #5

Closed: sukun1045 closed this issue 3 years ago

sukun1045 commented 3 years ago

Hi, thanks for the nice and clean repo. I am trying to use your code on some in-the-wild images. I realized the provided pre-trained model_cla doesn't have the weights of the final FC layer (1024 x 7). Could you help me with that? Thanks again.

zengqunzhao commented 3 years ago

Hi, thank you for your attention to our work. Could you please provide more details?

sukun1045 commented 3 years ago

So I downloaded the models, used Pretrained_EfficientFace.tar, and printed out the weights as follows:

```python
model_cla = EfficientFace.efficient_face()
model_cla = torch.nn.DataParallel(model_cla).cuda()
checkpoint = torch.load('checkpoint/Pretrained_EfficientFace.tar')
pre_trained_dict = checkpoint['state_dict']
for k, v in pre_trained_dict.items():
    print(k, v.shape)
```

The last several lines of the output are the following:

```
module.conv5.0.weight torch.Size([1024, 464, 1, 1])
module.conv5.1.weight torch.Size([1024])
module.conv5.1.bias torch.Size([1024])
module.conv5.1.running_mean torch.Size([1024])
module.conv5.1.running_var torch.Size([1024])
module.conv5.1.num_batches_tracked torch.Size([])
module.fc.weight torch.Size([12666, 1024])
module.fc.bias torch.Size([12666])
```

It seems the checkpoint doesn't include the expression classification layer, and that layer is exactly what I need: I simply want to use the pre-trained model as a black box that outputs the facial expression.
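As an aside, the module. prefix on all of these keys comes from torch.nn.DataParallel. A minimal sketch for loading the same checkpoint into a plain, single-device model, using only standard PyTorch (the import path for EfficientFace is an assumption based on this repo's layout):

```python
import torch
from models import EfficientFace  # import path assumed from this repo's layout

# Strip the DataParallel 'module.' prefix so the checkpoint loads into a
# plain single-device model.
checkpoint = torch.load('checkpoint/Pretrained_EfficientFace.tar', map_location='cpu')
state_dict = {k.replace('module.', '', 1): v
              for k, v in checkpoint['state_dict'].items()}

model = EfficientFace.efficient_face()
model.fc = torch.nn.Linear(1024, 12666)  # match the FC shape printed above
model.load_state_dict(state_dict)
```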

zengqunzhao commented 3 years ago

Got it. This error was fixed two months ago; I think you should update your code to the current version. :)

sukun1045 commented 3 years ago

Hi, I am using the latest code. I think the code is correct, since in efficient_face() the final layer is defined as self.fc = nn.Linear(output_channels, num_classes). However, the pre-trained model (Pretrained_EfficientFace.tar) in the model.zip file is not updated.

sukun1045 commented 3 years ago

Yes. I adapted your code a little bit: in your code, you add model_cla.module.fc = nn.Linear(1024, 7).cuda() after loading the weights, so the last linear layer (1024 x 7) is actually randomly initialized.

zengqunzhao commented 3 years ago

The FC layer in the pre-trained model is module.fc.weight with torch.Size([12666, 1024]), while the FC layer in efficient_face() is nn.Linear(output_channels, num_classes). So model_cla.fc = nn.Linear(1024, 12666) is used to fit the pre-trained model.

And I have tested the code; it is correct.

zengqunzhao commented 3 years ago

Yeah, the linear layer is random. The model is pre-trained on a large-scale face recognition dataset, so the class number is different from the FER dataset.
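Putting this exchange together, a minimal sketch of the intended loading procedure (the import path is an assumption based on this repo's layout; the 7-way head is for the expression classes and still has to be trained):

```python
import torch
import torch.nn as nn
from models import EfficientFace  # import path assumed from this repo's layout

# Build the backbone with a 12666-way head so it matches the checkpoint,
# which was pre-trained for large-scale face recognition.
model_cla = EfficientFace.efficient_face()
model_cla.fc = nn.Linear(1024, 12666)
model_cla = torch.nn.DataParallel(model_cla).cuda()

checkpoint = torch.load('checkpoint/Pretrained_EfficientFace.tar')
model_cla.load_state_dict(checkpoint['state_dict'])

# Swap in a fresh 7-way FER head; as noted above, this layer starts out
# randomly initialized and still has to be trained on a FER dataset.
model_cla.module.fc = nn.Linear(1024, 7).cuda()
```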

sukun1045 commented 3 years ago

Right. I understand that you pre-trained on a face recognition dataset and added the expression classification layer afterward. I am asking whether you have the weights of the final expression recognition model, where the last linear layer is trained.

zengqunzhao commented 3 years ago

Yes, I have the pre-trained model for testing, but currently I am not in the lab; maybe I can provide it when I come back to school.

zengqunzhao commented 3 years ago

I am an intern at Alibaba now.

sukun1045 commented 3 years ago

Sure. Thanks for your reply!

harisushehu commented 3 years ago

Hi @sukun1045 @zengqunzhao,

I tried running this code on the RAF dataset and got a similar accuracy. However, when I try to run it on a different in-the-wild dataset, the accuracy does not go beyond 30%.

Could you please advise on how I may go about this? Thanks

zengqunzhao commented 3 years ago

Hi @harisushehu,

Thanks for your attention to our work. There are two steps before training. First, EfficientFace and the LDG are both pre-trained on MS-Celeb-1M. Then, the LDG needs to be pre-trained on the corresponding FER dataset. After finishing these two steps, you can train EfficientFace on the corresponding FER dataset. Currently, only the EfficientFace model pre-trained on MS-Celeb-1M and the LDG pre-trained on RAF-DB are provided. When I am back in the lab, I will release the other pre-trained models.
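To make the final training step concrete, here is a minimal sketch of one step of label distribution training; the KL-divergence loss and the name ldt_step are assumptions based on the paper's idea, not a line-for-line copy of this repo's train.py:

```python
import torch
import torch.nn.functional as F

# Hypothetical single training step: the frozen, pre-trained LDG produces a
# label distribution for each image, and EfficientFace is trained to match it.
def ldt_step(model, ldg, images, optimizer):
    ldg.eval()
    with torch.no_grad():
        target_dist = F.softmax(ldg(images), dim=1)  # soft labels from the LDG
    log_probs = F.log_softmax(model(images), dim=1)
    loss = F.kl_div(log_probs, target_dist, reduction='batchmean')
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```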

Best, Zengqun

harisushehu commented 3 years ago

Hi @zengqunzhao, thanks for your prompt reply.

I read the paper, and it seems you proposed a method for generating the label distributions that differs from previous works. Would it be possible to also provide the code for training on a corresponding in-the-wild FER dataset? I want to try it on the CK+ and KDEF datasets rather than the datasets you used in your paper. Thanks in advance.

Also, when do you expect to be back to the lab?

zengqunzhao commented 3 years ago

Sorry, I did not understand your meaning clearly. I currently have an internship at Alibaba, and my school is closed due to COVID-19, so I have no idea when I will be back.

harisushehu commented 3 years ago

So what I am saying is: can you provide the code to pre-train EfficientFace and the LDG, so as to obtain the pre-trained models?

zengqunzhao commented 3 years ago

Got it. I don't have the pre-training code on my laptop. The model definitions for EfficientFace and the LDG have been released, so I think it's easy to write the code for pre-training them.
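For anyone attempting that, a generic sketch of one pre-training epoch (standard supervised classification, not code from this repo; the 12666-way head mirrors the class count seen in the released checkpoint earlier in this thread):

```python
import torch
import torch.nn as nn

# Generic pre-training epoch on a face recognition dataset such as
# MS-Celeb-1M; `loader` is any DataLoader yielding (images, identity_labels),
# and `model` is an EfficientFace backbone with fc = nn.Linear(1024, 12666).
def pretrain_epoch(model, loader, optimizer):
    criterion = nn.CrossEntropyLoss()
    model.train()
    for images, labels in loader:
        images, labels = images.cuda(), labels.cuda()
        loss = criterion(model(images), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```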

zengqunzhao commented 3 years ago

:)

katerynaCh commented 2 years ago

Hi! I see that you recently followed up on this and shared several LDG models trained for FER. Do you also plan to share EfficientFace models trained on any of the FER datasets?