microsoft / Oscar

Oscar and VinVL
MIT License

Getting extremely low scores for nocaps evaluation #160

Open rmokady opened 2 years ago

rmokady commented 2 years ago

Hi, for some reason the nocaps evaluation is not working for me: I'm getting extremely low scores. My command line is:

CUDA_VISIBLE_DEVICES=1 python oscar/run_captioning.py --do_test --do_eval --data_dir nocaps --test_yaml val.yaml --per_gpu_eval_batch_size 4 --num_beams 5 --max_gen_length 20 --eval_model_dir ./nocaps_base_xe/

And my results are:

defaultdict(<class 'dict'>, {
 'B1': {'in-domain': 0.2, 'near-domain': 0.2, 'out-domain': 0.28, 'entire': 0.22},
 'B2': {'in-domain': 0.0, 'near-domain': 0.0, 'out-domain': 0.0, 'entire': 0.0},
 'B3': {'in-domain': 0.0, 'near-domain': 0.0, 'out-domain': 0.0, 'entire': 0.0},
 'B4': {'in-domain': 0.0, 'near-domain': 0.0, 'out-domain': 0.0, 'entire': 0.0},
 'METEOR': {'in-domain': 0.89, 'near-domain': 0.86, 'out-domain': 0.95, 'entire': 0.88},
 'ROUGE-L': {'in-domain': 0.25, 'near-domain': 0.25, 'out-domain': 0.34, 'entire': 0.27},
 'CIDEr': {'in-domain': 0.01, 'near-domain': 0.01, 'out-domain': 0.01, 'entire': 0.01},
 'SPICE': {'in-domain': 3.47, 'near-domain': 3.5, 'out-domain': 3.42, 'entire': 3.48}})

CC: @amirhertz

I would appreciate any help.

BigHyf commented 2 years ago

@rmokady Hi, can I ask how you generated the test_yaml? I followed the issue, but it did not work.

rmokady commented 2 years ago

@BigHyf You can download val.yaml and the other data files; check the VinVL download page.

BigHyf commented 2 years ago

@rmokady Hi, I know train.yaml should look like this:

img: train.img.tsv
hw: train.hw.tsv
label: train.label.tsv
feature: train.feature.tsv

Did you need to generate these TSV files yourself, or did you just use the files the authors already provided, as with the COCO dataset?
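For anyone checking a setup like this, here is a rough sketch of how one might verify that every file named in such a yaml actually exists in the data directory. It is not Oscar's own loader: `parse_flat_yaml` and `missing_files` are hypothetical helpers, the parser only handles flat `key: value` lines, and it assumes the TSV files sit directly inside the data directory.

```python
import os
import tempfile


def parse_flat_yaml(text):
    """Parse flat 'key: value' lines (no nesting); skip blanks and comments."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        entries[key.strip()] = value.strip()
    return entries


def missing_files(entries, data_dir):
    """Return the keys whose referenced file is absent from data_dir."""
    return [
        key
        for key, name in entries.items()
        if not os.path.isfile(os.path.join(data_dir, name))
    ]


# The four entries quoted in this thread for train.yaml.
TRAIN_YAML = """\
img: train.img.tsv
hw: train.hw.tsv
label: train.label.tsv
feature: train.feature.tsv
"""

if __name__ == "__main__":
    entries = parse_flat_yaml(TRAIN_YAML)
    with tempfile.TemporaryDirectory() as data_dir:
        # Create only two of the four files to demonstrate the check.
        for name in ("train.label.tsv", "train.feature.tsv"):
            open(os.path.join(data_dir, name), "w").close()
        print("missing:", missing_files(entries, data_dir))
```

In a real setup you would read the yaml text from the file passed via --test_yaml and point `data_dir` at the directory passed via --data_dir; an empty list means every referenced file was found.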

rmokady commented 2 years ago

@BigHyf I don't think this is relevant to the issue I opened; please open a new one.