HenryHZY opened this issue 1 week ago
Hi @HenryHZY, thanks for your interest in our work!
May I know if your training always produces poor results, even across multiple runs? The released configs should be strictly aligned with the settings used for our checkpoints and the results in the paper, and the 0.0 LR for epoch 1 in our log is simply an artifact of the limited precision of the logger. We observed that the performance of almost all models on TVSum is extremely unstable, since we only use 4 videos for training and 1 video for evaluation. The results on the QVHighlights test split should be much more reliable for benchmarking.
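As a quick illustration of the precision point (the LR value is taken from this thread; the formatting widths are assumptions, not the repo's actual logger settings): a nonzero LR like 5e-07 prints as 0.0 when rounded to a few decimal places.

```python
# A tiny but nonzero LR rounds to zero at low logging precision.
lr = 5e-07  # the epoch-1 LR reported in the reproduced run

print(f"fixed-point: {lr:.4f}")   # -> fixed-point: 0.0000
print(f"scientific:  {lr:.2e}")   # -> scientific:  5.00e-07
```

So a log showing 0.0 for epoch 1 and a run showing 5e-07 can describe the same underlying schedule.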
By the way, the performance listed in the repo is different from that shown in the paper. Do you re-train the models?
Yes, we re-trained all the models (with the same settings as in the paper) before releasing the code, which is why the numbers are slightly different.
Thanks for your quick response! I will try it later.
Hi @yeliudev , thanks for your great project!
First, I used your provided checkpoint and obtained the same result as the provided log.
Then I reproduced the training of TVSum-PK:
However, there is a significant performance difference between my training log and the provided log (https://huggingface.co/yeliudev/R2-Tuning/resolve/main/checkpoints/r2_tuning_tvsum_pk.log):
Apart from the random seed, I think the main difference is the LR schedule. For example, my LR in epoch 1 is 5e-07, while in the provided log the LR in epoch 1 is 0.0.
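For reference, a linear-warmup schedule (a common default; the base LR and warmup length below are illustrative assumptions, not this repo's actual config) naturally produces LRs this small during the first epoch:

```python
# Hypothetical linear-warmup schedule, for illustration only.
def warmup_lr(step, base_lr=1e-4, warmup_steps=1000):
    """Linearly ramp the LR from 0 up to base_lr over warmup_steps,
    then hold it at base_lr."""
    return base_lr * min(step / warmup_steps, 1.0)

print(warmup_lr(5))     # roughly 5e-07 early in training
print(warmup_lr(1000))  # base_lr once warmup finishes
```

At such small values, whether the log shows 5e-07 or 0.0 can come down to print precision rather than a real difference in the schedule.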
Do you know how to fix this training issue so that the results can be reproduced normally? Thank you!