maximek3 / e-ViL


VCR training does not converge #8

Open jakobamb opened 2 years ago

jakobamb commented 2 years ago

Hi!

First of all, thank you for the code. I was trying to train e-UG on the VCR dataset, using the VCR-pretrained base and what I believe are the right hyperparameters. Unfortunately, the training does not really converge. I have attached the training log; perhaps you could point me toward what might have gone wrong?

Thank you very much and best regards

Jakob

Attachment: log.log

maximek3 commented 2 years ago

Hi Jakob, from what I can tell your losses are still going down, but very slowly. Perhaps it has something to do with your low max_sequence_length? If I remember correctly, it had to be quite a bit longer for VCR; try 50, for example.
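To illustrate why this parameter matters (this is a standalone sketch, not code from the e-ViL repository): VCR inputs typically concatenate the question with an answer choice, so the combined token sequence is much longer than a question alone, and a low sequence cap silently truncates part of the input.

```python
# Hypothetical illustration of sequence-length truncation.
# The sentences below are made up; the point is only that a cap of 19
# tokens can cut off part of a combined question + answer sequence,
# while a cap of 50 keeps it intact.

def truncate(tokens, max_len):
    """Keep only the first max_len tokens, as a sequence-length cap would."""
    return tokens[:max_len]

question = "Why is [person1] pointing at [person2] while [person3] watches ?".split()
answer = "Because [person1] is accusing [person2] of taking the object".split()
combined = question + ["[SEP]"] + answer  # 20 tokens in this example

short = truncate(combined, 19)   # loses the tail of the answer
longer = truncate(combined, 50)  # keeps the full input

print(len(combined), len(short), len(longer))  # 20 19 20
```

With real VCR examples (and rationale/explanation text appended), the combined sequence is often far longer than this toy case, so the gap between a cap of 19 and 50 grows accordingly.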

jakobamb commented 2 years ago

Hi Maxime, thanks for your answer!

I have now finally gotten around to trying it. The task score is now at 0.6919, slightly lower than the reported 0.698 but definitely comparable. I noticed that my results on the other datasets were also just slightly worse than what was reported in your paper. Do you have any idea why that could be? Did you use the same seed that is provided in the parameter file? The results you report are from a single training/evaluation run, right?

Also, I was confused about this max_sequence_length parameter. I thought it was what you reported in the supplementary material as "Max Question Length", which is listed as 19. There might have been a mixup with "Max Explanation Length", which is set to 51? I also wonder whether you used a higher max_sequence_length on the other datasets.

Anyway, thank you for clearing this up!

Best, Jakob