rmokady / CLIP_prefix_caption

Simple image captioning model
MIT License

Did anyone reproduce the transformer network with frozen GPT-2? #70

Open baiyuting opened 12 months ago

baiyuting commented 12 months ago

Did anyone reproduce the transformer network with frozen GPT-2?

I entered the command:

```
python train.py --only_prefix --data ./data/coco/oscar_split_ViT-B_32_train.pkl --out_dir ./coco_train/ --mapping_type transformer --num_layers 8 --prefix_length 40 --prefix_length_clip 40
```

The model was trained on the MS COCO dataset (train+val). On the test split I get BLEU-4 20.0 and CIDEr 66.3, with the best result at the third epoch. This is well below what the paper reports: BLEU-4 33.53 and CIDEr 113.08.

I am confused by this result. Did anyone reproduce the paper's numbers? Did I miss something?

rongtongxueya commented 12 months ago

I want to know how these evaluation results are produced; I cannot get them just by running train.py.

baiyuting commented 11 months ago

I use https://github.com/salaniz/pycocoevalcap to evaluate the results: I rewrite captions_val2014_fakecap_results.json in the "example" folder with my model's predictions and run `python coco_eval_example.py`.
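For anyone else trying to reproduce the evaluation, it boils down to writing the predictions in COCO results format and scoring them with COCOEvalCap. A minimal sketch of what that example script does, assuming salaniz/pycocoevalcap is installed; the annotation path and the single placeholder prediction are my own assumptions, not taken from this repo:

```python
import json

from pycocotools.coco import COCO
from pycocoevalcap.eval import COCOEvalCap

# Generated captions for the test images, keyed by COCO image id.
# The single entry below is a placeholder; fill this with your model's output.
predictions = {391895: "a man riding a motorcycle on a dirt road"}

# Write them in the COCO results format that loadRes() expects:
# a JSON list of {"image_id": ..., "caption": ...} objects.
results_file = "captions_val2014_fakecap_results.json"
with open(results_file, "w") as f:
    json.dump([{"image_id": i, "caption": c} for i, c in predictions.items()], f)

# Score the predictions against the ground-truth annotations
# (the annotation path is an assumption; adjust to your setup).
coco = COCO("annotations/captions_val2014.json")
coco_result = coco.loadRes(results_file)
coco_eval = COCOEvalCap(coco, coco_result)
coco_eval.params["image_id"] = coco_result.getImgIds()  # score only predicted images
coco_eval.evaluate()

for metric, score in coco_eval.eval.items():
    print(f"{metric}: {score:.3f}")
```

`coco_eval.eval` then holds BLEU-1 through BLEU-4, METEOR, ROUGE-L, CIDEr, and SPICE, which is where the 20.0 / 66.3 figures above come from.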

cjc20000323 commented 9 months ago

I want to know: have you reproduced the transformer results reported in the paper?

baiyuting commented 9 months ago

No, I have not reproduced this result.

cjc20000323 commented 9 months ago

Hello, I have received your email. Please be advised.


qvqqa commented 2 months ago

I also trained the transformer-only mapping model and evaluated it as you describe, and my result is similar to yours: not as good as the result in the paper. Have you solved this?