yikuan8 / Transformers-VQA

An implementation that adapts pre-trained V+L models to downstream VQA tasks. Currently supports VisualBERT, LXMERT, and UNITER.

Why not use decoder? #7

Closed knight-fzq closed 3 years ago

knight-fzq commented 3 years ago

I found that the decoder weights are not loaded. Could you explain why, and how to add them?

yikuan8 commented 3 years ago

What do you mean by decoder? Are you making an automatic CXR report system?

knight-fzq commented 3 years ago

Sorry to disturb you. Recently I used your VQA VisualBERT model to run some experiments on the Hateful Memes classification task published by Facebook, but I did not get similar results. I also found a notice saying "you did not load decoder and .... weights" when loading the VisualBERT pretrained model. What does this notice mean? Why aren't the decoder weights used?

yikuan8 commented 3 years ago

I haven't worked on this project for more than one year. Please remind me where I mentioned "you did not load decoder and .... weights".

In addition, I believe I load all the pre-trained weights of the 12 transformer layers. I didn't load the weights for the classification head, since we were working on different tasks. A message should appear when loading the weights that tells you which layers actually had weights loaded.
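
For reference, here is roughly what a message like that means in PyTorch. This is a minimal, self-contained sketch with toy modules, not the repo's actual classes or loading code: loading a checkpoint with `strict=False` reports which parameters of the downstream model received no pretrained weights (e.g. a new classification head) and which checkpoint entries were skipped (e.g. pretraining/decoder heads).

```python
import torch.nn as nn

# Toy stand-ins for the pretraining and downstream architectures
# (placeholders, not this repo's real model classes).
class PretrainModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(768, 768)        # shared transformer part
        self.decoder = nn.Linear(768, 30522)      # pretraining head, unused downstream

class VQAModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(768, 768)        # gets the pretrained weights
        self.classifier = nn.Linear(768, 3129)    # new task head, trained from scratch

ckpt = PretrainModel().state_dict()
missing, unexpected = VQAModel().load_state_dict(ckpt, strict=False)
print("kept at random init:", missing)         # ['classifier.weight', 'classifier.bias']
print("skipped from checkpoint:", unexpected)  # ['decoder.weight', 'decoder.bias']
```

The layers listed as missing are the ones such a warning refers to; they are expected to be trained from scratch on the downstream task.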

knight-fzq commented 3 years ago

Sorry to disturb you. It is in the file openl_VQA.ipynb, at the call "model.encoder.load(args.load_pretrained)". When I ran your model, it printed the notice I mentioned above.

yikuan8 commented 3 years ago

Got you, those are the weights for the classification head. You can't load them, as the classification goals are different. To be honest, I am not familiar with hateful meme classification. From a quick glance at the FB paper, I found this note in Appendix A: "We fine-tune all models fully end-to-end without freezing anything regardless of whether model was pretrained or not." That means their Faster R-CNN layers are also fine-tuned, whereas they are frozen in my implementation. You could modify the image encoder part accordingly. Let me know if this helps.
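
For illustration, here is a minimal sketch of training the image branch end-to-end instead of keeping it frozen. The module names, dimensions, and learning rates are placeholders, not this repo's actual code, and it assumes the visual feature extractor sits inside the model's forward pass (if the features are pre-extracted offline, the detector would first have to be moved into the training loop).

```python
import torch
import torch.nn as nn

# Toy model: a visual branch (stand-in for the Faster R-CNN features),
# a V+L transformer stand-in, and a binary hateful/not-hateful head.
class MemeModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.visual_encoder = nn.Linear(2048, 768)
        self.transformer = nn.Linear(768, 768)
        self.classifier = nn.Linear(768, 2)

model = MemeModel()

# Un-freeze the image branch (a frozen setup would set requires_grad = False).
for p in model.visual_encoder.parameters():
    p.requires_grad = True

# Give the image branch its own, usually smaller, learning rate.
optimizer = torch.optim.AdamW([
    {"params": model.visual_encoder.parameters(), "lr": 1e-5},
    {"params": list(model.transformer.parameters())
             + list(model.classifier.parameters()), "lr": 5e-5},
])
```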

akhanoN commented 3 years ago

I read your paper, and in it you showed outputs of image-to-report generation using V+L pre-trained models, but when I look at the code, the report generation part is missing. Why? If you did implement it, please share that code; it would be very helpful for fully understanding the research work.

yikuan8 commented 3 years ago

> I read your paper, and in it you showed outputs of image-to-report generation using V+L pre-trained models, but when I look at the code, the report generation part is missing. Why? If you did implement it, please share that code; it would be very helpful for fully understanding the research work.

No, I didn't show outputs of image-to-report generation. That is not my paper.

akhanoN commented 3 years ago

I am talking about the paper titled "A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports", specifically the results in Figure 3, which are not present in your code. What are these results, and how did you get them?

yikuan8 commented 3 years ago

Those notes were written by radiologists, not generated by me. Fig. 3 is a visualization of the attention heads.
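
For anyone trying to reproduce that kind of figure: attention-head visualizations are typically made by capturing the per-head attention matrices from the transformer and plotting one head's text-to-region weights as a heatmap. A generic sketch with dummy data (not the paper's actual plotting code):

```python
import torch
import matplotlib.pyplot as plt

# Dummy attention weights shaped [num_heads, num_report_tokens, num_image_regions];
# in practice these would be captured from the model's attention layers.
attn = torch.softmax(torch.randn(12, 20, 36), dim=-1)

head = 0
plt.imshow(attn[head].numpy(), aspect="auto", cmap="viridis")
plt.xlabel("image regions")
plt.ylabel("report tokens")
plt.title(f"attention head {head}")
plt.savefig("attention_head.png")
```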