Open Kashu7100 opened 3 years ago
Hi, in my training, I found that RL does not improve paragraph generation as significantly as it improves traditional video captioning. The scores increase during the first several epochs and then decrease. My re-training log is attached: log.txt
Thank you for the reply. Do you think this is due to the model architecture (compared with ordinary two-stage models) or to an implementation issue? I will also take a look at the RL part of the code.
[issue] The fine-tuning step doesn't increase the scores; it even decreases them. Please refer to the green line in the chart below.
[How to reproduce] I trained the provided code on the ActivityNet dataset, following the data preparation and training instructions in the README.
[questions] Is this the same behavior observed during your training?