microsoft / i-Code

MIT License

Finetuning on InfographicVQA #125

Open Caixin89 opened 9 months ago

Caixin89 commented 9 months ago

I was unable to achieve the result shown in the UDOP paper.

I used the udop-unimodel-large-224 checkpoint.

My ANLS score is 0.407903, nowhere near the 0.461 shown in the table below, taken from the paper.

[screenshot: Table 8 from the UDOP paper]

Since I noticed that the batch size, warmup steps, and weight decay given in https://github.com/microsoft/i-Code/blob/main/i-Code-Doc/scripts/finetune_duebenchmark.sh differ from those reported in the paper, I also tried changing the finetuning script to use the paper's settings.

[screenshot: finetuning hyperparameters from the paper]

Lastly, I also tried adding the task prompt prefix, since the existing code does not add one. I followed the approach from https://github.com/microsoft/i-Code/issues/71#issuecomment-1623201208.
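For illustration, the prefixing I did looks roughly like the sketch below. The exact prefix wording and the `question:` separator here are my assumptions for the example, not the literal strings from issue #71:

```python
# Hedged sketch of prepending a task prompt to each InfographicVQA question.
# TASK_PREFIX and the "question:" separator are assumed wording; the actual
# strings come from the approach described in issue #71.
TASK_PREFIX = "question answering on infographicsvqa"  # assumed wording


def add_task_prefix(question: str, prefix: str = TASK_PREFIX) -> str:
    """Build the encoder input text as '<prefix> question: <question>'."""
    return f"{prefix} question: {question}"
```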

Results of the three finetuning configurations:

| Task prefix | Hyperparameter settings | ANLS score |
| --- | --- | --- |
| No | Unchanged finetuning script | 0.407903 |
| No | Paper's settings | 0.40174 |
| Yes | Unchanged finetuning script | 0.408355 |

Other changes I made:

Please assist

Caixin89 commented 9 months ago

May I know if the results shown in Table 8 above are validation-set or test-set scores?

zinengtang commented 9 months ago

Table 8 shows validation results. May I know how many epochs you ran the model for, and which checkpoint you used?

Caixin89 commented 9 months ago

- 4 epochs for the two runs that used the unchanged finetuning script
- 5 epochs for the run where I changed the finetuning script to the paper's settings

The final epoch count is decided automatically by early stopping, with early_stopping_patience=20.
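For clarity, the stopping rule implied by `early_stopping_patience=20` can be sketched as below. This is a minimal model of the behavior, not the actual Trainer code from the finetuning script:

```python
# Minimal sketch of patience-based early stopping: training stops once the
# validation loss has failed to improve for `patience` consecutive evaluations.
def stopping_epoch(val_losses, patience=20):
    """Return the evaluation index at which training would stop, or None."""
    best = float("inf")
    bad = 0
    for i, loss in enumerate(val_losses):
        if loss < best:
            best, bad = loss, 0  # new best: reset the patience counter
        else:
            bad += 1
            if bad >= patience:
                return i
    return None
```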

zinengtang commented 9 months ago

I am assuming you are using the last checkpoint the run generated rather than an intermediate checkpoint? If so, try using more epochs. If that still doesn't work, I will provide a finetuned checkpoint so we can see whether the issue is in the evaluation script.

Caixin89 commented 9 months ago

Sure, I can try that. In the meantime, could you share how many epochs you used for finetuning?

Caixin89 commented 9 months ago

[screenshot: plot of validation loss against training steps]

The above is a plot of the validation loss against training steps. The validation loss increases consistently across training steps.

Is this expected?

Pietro1999IT commented 8 months ago

> I was unable to achieve the result shown in the UDOP paper. […] My ANLS score is 0.407903. This is nowhere near 0.461 as shown in the table […] Please assist

May I ask how you implemented the ANLS metric for this task?

yuanzheng625 commented 8 months ago

> May I ask how you implemented the ANLS metric for this task?

should be in this repo https://github.com/due-benchmark/evaluator/tree/master

Caixin89 commented 8 months ago

Yes, I used ANLS from https://github.com/due-benchmark/evaluator/tree/master.
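For anyone comparing implementations, here is a minimal sketch of the ANLS metric (Average Normalized Levenshtein Similarity) as commonly defined for DocVQA-style tasks: the per-question score is the best 1 − NL(pred, gt) over the ground truths, zeroed when the normalized distance reaches the 0.5 threshold. The DUE evaluator may differ in details such as answer normalization, so treat this as an approximation:

```python
# Sketch of ANLS. Assumes lowercase/strip normalization and the standard
# 0.5 threshold; the DUE evaluator's exact normalization may differ.
def levenshtein(a: str, b: str) -> int:
    """Plain dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def anls(predictions, references, threshold=0.5):
    """ANLS over parallel lists; each reference is a list of acceptable answers."""
    total = 0.0
    for pred, refs in zip(predictions, references):
        best = 0.0
        for ref in refs:
            p, r = pred.strip().lower(), ref.strip().lower()
            nl = levenshtein(p, r) / max(len(p), len(r), 1)
            # Score 1 - NL, but zero out predictions past the threshold.
            best = max(best, 1 - nl if nl < threshold else 0.0)
        total += best
    return total / len(predictions)
```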

Caixin89 commented 8 months ago

> I am assuming you are using the last checkpoint the run generated rather than an intermediate checkpoint? If so, try using more epochs. If that still doesn't work, I will provide a finetuned checkpoint so we can see whether the issue is in the evaluation script.

I have tried with 10 epochs and my ANLS is still ~0.41. Am I supposed to finetune with even more epochs?

Could you provide me with your finetuned checkpoint?

Caixin89 commented 8 months ago

Also, I would like to double-check that the 46.1 ANLS score is indeed based on finetuning the udop-unimodel-large-224 checkpoint, without additional supervised pre-training.

Correct?

Caixin89 commented 7 months ago

> I have tried with 10 epochs and my ANLS is still ~0.41. […] Could you provide me with your finetuned checkpoint?

@zinengtang Any updates?