karpathy / neuraltalk2

Efficient Image Captioning code in Torch, runs on GPU

Why is the eval CIDEr (0.673) less than 0.9? #141

Open gujiuxiang opened 8 years ago

gujiuxiang commented 8 years ago

Hi, I evaluated the pre-trained model, but the scores are as follows:

ratio: 0.996517511006
Bleu_1: 0.639, Bleu_2: 0.455, Bleu_3: 0.317, Bleu_4: 0.222
METEOR: 0.202
ROUGE_L: 0.467
CIDEr: 0.673

How can I achieve the ~0.9 CIDEr performance mentioned in the README.md?

AdityaChaganti commented 8 years ago

Did you activate finetuning?

gujiuxiang commented 8 years ago

I did not fine-tune the model; I just downloaded the pre-trained model and ran the evaluation code (coco-caption). Maybe I should fine-tune the model. Another question: if I want to achieve the 0.9 score, how many iterations do I need, assuming the batch size is left at its default?


AdityaChaganti commented 8 years ago

Fine-tuning helped me get to a CIDEr of around 0.9, but you can only do that if you train. I don't remember the number of iterations, but I trained for four days rather than two, because I ran into memory issues when training with a batch size of 16 with fine-tuning enabled. I reduced the batch size to 4, but also reduced the learning rate proportionally, which is why I had to train longer.
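For reference, a fine-tuning run along those lines might look roughly like this (a sketch, not the exact command I used; the flag names are from my memory of train.lua and the paths and learning rates are placeholders, so double-check with th train.lua -help):

$ th train.lua -input_h5 coco/cocotalk.h5 -input_json coco/cocotalk.json -start_from /path/to/pretrained_model.t7 -finetune_cnn_after 0 -batch_size 4 -learning_rate 1e-4 -cnn_learning_rate 1e-5 -gpuid 0

-finetune_cnn_after 0 turns CNN fine-tuning on from the first iteration, and the reduced batch size and learning rate mirror what I described above.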

I'm not entirely sure why the pre-trained model is giving you a low CIDEr if you're testing on the test split. Should be higher, I think.

Afeihan commented 7 years ago

Excuse me, I have a problem. I saw that you trained for 4 days, but I have been training for about half a month and the run still hasn't finished. The iteration count is now about 5 million. What is wrong? I need your help, thanks a lot! P.S.: I only have one GPU. @AdityaChaganti

superwj1990 commented 7 years ago

@AdityaChaganti Excuse me, what learning rate did you set? When I fine-tune the CNN, my loss climbs from about 2.7 up to about 5.3.

YanShuo1992 commented 7 years ago

@gujiuxiang Hi, could you please explain in more detail how to evaluate the pretrained model? I just cannot find a way to do it.

gujiuxiang commented 7 years ago

Well, you can run eval.lua in the source code directory: 1st, download the coco-caption code; 2nd, set the h5 and json files in the corresponding fields in eval.lua.

I think it is very easy to do this, good luck.
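For example, something along these lines (a sketch from memory; the model path and file names are placeholders, and you should check th eval.lua -help for the exact option names):

$ th eval.lua -model /path/to/model.t7 -input_h5 coco/cocotalk.h5 -input_json coco/cocotalk.json -num_images 5000 -language_eval 1 -gpuid 0

-language_eval 1 is what triggers the coco-caption BLEU/METEOR/ROUGE/CIDEr computation.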

YanShuo1992 commented 7 years ago

@gujiuxiang Thank you for replying. So I just follow the instructions to run prepro.py to get the h5 and json files, right?

gujiuxiang commented 7 years ago

sure


YanShuo1992 commented 7 years ago

@gujiuxiang I still have some questions about the details. For example, I used the val images and their annotations to generate coco_raw.json. Then what command should I use to invoke prepro.py?

$ python prepro.py --input_json coco/coco_raw.json --num_val 5000 --num_test 5000 --images_root coco/images --word_count_threshold 5 --output_json coco/cocotalk.json --output_h5 coco/cocotalk.h5

That is the default. How should I change num_val and num_test? Should num_test be 0?
Another question: how should I set these in eval.lua? Do I set input_h5 and input_json to the file paths and language_eval to 1?

Sorry to ask so many questions; there just isn't enough time to research the project in detail at the end of the semester.

gujiuxiang commented 7 years ago

Hi, the validation images and labels are already included in the h5 and json files. The only thing you need to do is set the split in the DataLoader to 'val'.
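If I remember correctly, eval.lua also has a -split option that selects the split without touching the DataLoader code (treat the flag name as an assumption and verify with th eval.lua -help), e.g.:

$ th eval.lua -model /path/to/model.t7 -split val -num_images 5000 -language_eval 1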


YanShuo1992 commented 7 years ago

@gujiuxiang Thanks a lot. Although I still don't know how to set the split in the DataLoader, I got the scores. I set the input_h5 parameter for eval.lua. What num_val do you use when generating the h5 file? I just tried 10 and the CIDEr is only 0.16. Maybe the score depends on that number.

YanShuo1992 commented 7 years ago

@gujiuxiang The beam search size also influences the score.
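Both of those can be set on the command line, if I remember the eval.lua flags right (treat the names as assumptions): for example, evaluating on more images with a small beam,

$ th eval.lua -model /path/to/model.t7 -num_images 5000 -beam_size 2 -language_eval 1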

hpts23 commented 6 years ago

Hi, I have a question. I want to evaluate the pretrained model on BLEU score. However, I think we don't know which images karpathy used for training, so we cannot choose test data that the model did not see during training, and therefore we cannot properly evaluate his pretrained model. Is that correct?

mymuli commented 5 years ago

When I finished training, the CIDEr score was only about 0.67. How can I fine-tune it to reach above 0.9? Can you give me some advice? Thank you very much.