got bad result when predicting

microsoft / Oscar

Oscar and VinVL

MIT License


Hi, the predictions I get with checkpoint-29-66420 are wrong. For example:

391895 [{"caption": "nearly marcia drippedtangletangleyo pat hypothetical hyper pat hypothetical hypothetical hyderabadtangleyo parry tumbledyo", "conf": 0.0002988350752275437}]
60623 [{"caption": "marcia marcia\u767a\u767a\u767a\u767a\u767a\u767a\u767a\u767a\u767a\u767a haitian haitian haitian corruptionyo southend", "conf": 0.0002888015587814152}]
483108 [{"caption": "traps factual mosque \u0f56 \u0f56 \u0f56 \u0f56yo pat hypotheticalyo trapsyo traps hyper \u091ayo southend", "conf": 0.00027631374541670084}]
384213 [{"caption": "corruption marian marian \u0f56 \u0f56 \u0f56 \u0f56 marian marian lionsyo parryyo parry hyper wilburyo\u539f", "conf": 0.00028112903237342834}]
386164 [{"caption": "mechanisms mechanisms mechanisms wilkes mechanisms declares declares declares declares declares declares declares declaresfrey horriblyyo pat\u30af", "conf": 0.00029355354490689933}]
223648 [{"caption": "blond marcia marcia challenged marcia marcia marcia marcia marcia marcia marcia marciachus marcia vishnuyo pat\u30af", "conf": 0.0003286841092631221}]
294832 [{"caption": "marcia marcia pietro ri parry pat noiseco faust develops \u0e1e \u0e1e \u0e1e melee melee melee marian daggers", "conf": 0.0002665773790795356}]
462565 [{"caption": "faust faust\u767a\u767a lax lax accountant hardly dripped lax traps \u2013 lax hyper hardly travelyo pat", "conf": 0.00027798896189779043}]
436141 [{"caption": "rendezvous marcia pietro pietro ri parry pat\u30af ara pat hypotheticalfreyyo parry detail daggers epilogue epilogue", "conf": 0.00027373709599487484}]
192440 [{"caption": "molded [unused435]yo pat appellate upgrading upgrading marian upgrading upgradingfreyfrey marianfrey marian marian marian marian", "conf": 0.00028483488131314516}]

@KrystalCWT I also get random captions like yours when I run checkpoint-29-66420 directly on the COCO Captions Karpathy test split. However, after fine-tuning that checkpoint with the command below, I get good captions and a decent CIDEr score. The predictions are already good when I use the fine-tuned model saved after only 5000 steps. Your fine-tuning and prediction commands will probably differ from mine, but here are mine for reference:

Fine-tuning:

python oscar/run_captioning.py --model_name_or_path /home/gsrivastava/ephemeral_drive/work/image_captioning/Oscar/model_dir/checkpoint-29-66420 --do_train --do_lower_case --evaluate_during_training --add_od_labels --learning_rate 0.00003 --per_gpu_train_batch_size 64 --num_train_epochs 30 --save_steps 5000 --output_dir /home/gsrivastava/ephemeral_drive/work/image_captioning/Oscar/trained_checkpoints/output_coco_authorfeats_ck_29_66420try --train_yaml /home/gsrivastava/ephemeral_drive/work/image_captioning/Oscar/datasets/coco_caption/train_abspath.yaml --data_dir /home/gsrivastava/ephemeral_drive/work/image_captioning/Oscar/datasets/coco_caption --val_yaml /home/gsrivastava/ephemeral_drive/work/image_captioning/Oscar/datasets/coco_caption/val.yaml

Prediction on the COCO Captions Karpathy test split:

python oscar/run_captioning.py --do_test --do_eval --test_yaml /home/gsrivastava/ephemeral_drive/work/image_captioning/Oscar/datasets/coco_caption/test.yaml --per_gpu_eval_batch_size 64 --num_beams 5 --max_gen_length 20 --eval_model_dir /home/gsrivastava/ephemeral_drive/work/image_captioning/Oscar/trained_checkpoints/output_coco_authorfeats_ck_29_66420try/checkpoint-0-5000/
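To check quickly whether a predictions file still contains garbage like the output quoted at the top of this thread, something like the following rough sketch might help. It is not part of the Oscar repo. It assumes each line has the form `image_id [{"caption": ..., "conf": ...}]` as shown above, with whitespace between the id and the JSON. The helper names and both thresholds (`min_conf`, `max_repeat_ratio`) are hypothetical and picked arbitrarily, so tune them for your own outputs.

```python
import json
from collections import Counter


def parse_prediction_line(line):
    """Split one 'image_id [json]' line into (image_id, caption, conf).

    Assumes the id and the JSON list are separated by whitespace,
    and that the list holds a single {"caption", "conf"} entry.
    """
    image_id, payload = line.strip().split(None, 1)
    entry = json.loads(payload)[0]
    return image_id, entry["caption"], entry["conf"]


def looks_degenerate(caption, conf, min_conf=0.01, max_repeat_ratio=0.5):
    """Heuristic only: flag very low confidence or one token dominating.

    Both thresholds are arbitrary assumptions, not values from Oscar.
    """
    tokens = caption.split()
    if not tokens:
        return True
    top_count = Counter(tokens).most_common(1)[0][1]
    return conf < min_conf or top_count / len(tokens) > max_repeat_ratio


# One line copied from the output above, plus a made-up "good" line
# (the second caption and its confidence are hypothetical).
bad_line = ('192440 [{"caption": "molded [unused435]yo pat appellate upgrading '
            'upgrading marian upgrading upgradingfreyfrey marianfrey marian '
            'marian marian marian", "conf": 0.00028483488131314516}]')
good_line = ('1 [{"caption": "a man riding a wave on a surfboard", '
             '"conf": 0.5}]')

for line in (bad_line, good_line):
    image_id, caption, conf = parse_prediction_line(line)
    print(image_id, looks_degenerate(caption, conf))
```

If most lines from the raw checkpoint are flagged and most lines from the fine-tuned checkpoint are not, that would be consistent with what I described above, although the CIDEr score remains the real measure.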


got bad result when predicting #140