tgc1997 / RMN

IJCAI2020: Learning to Discretely Compose Reasoning Module Networks for Video Captioning

A refinement report #24

Open HXYNODE opened 2 years ago

HXYNODE commented 2 years ago

https://github.com/tgc1997/RMN/blob/14a9eff9a936030bcea104cb2c65f5378136cd87/train.py#L128
Hi, Ganchao. I found that the condition at the line above can miss a case while the project is running. For example, when train_batch_size is set to 2 or 3, the train_loader has 24390 steps (48779/2 = 24389.5) or 16260 steps (48779/3 ≈ 16259.67) respectively, where 48779 is the total number of samples for the MSVD dataset. Because 48779 is not evenly divisible by either batch size, the final step (the 24390th or the 16260th) contains only 1 or 2 samples. That last batch does not meet the condition bsz == opt.train_batch_size, so loss_count is divided by i % 10, which is 0 at that step. Ooops! :(
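The step counts above can be checked with a short standalone sketch. It assumes the loader keeps the final partial batch and that steps are counted from 1, as the i % 10 reasoning above implies. `last_step_info` and `log_every` are hypothetical names used only for illustration; they are not part of the RMN code.

```python
import math


def last_step_info(num_samples, batch_size, log_every=10):
    """Return (steps, last_batch_size, steps % log_every) for one epoch.

    Assumption: the final partial batch is kept and steps are numbered from 1.
    """
    steps = math.ceil(num_samples / batch_size)
    # Samples left over for the final step after all full batches.
    last_batch_size = num_samples - (steps - 1) * batch_size
    return steps, last_batch_size, steps % log_every


for batch_size in (2, 3):
    print(batch_size, last_step_info(48779, batch_size))
```

In both cases the third value is 0, which is the divisor that triggers the error on the last step.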
It could be refined as follows:

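# Full batch: divide by 10 as before.
if bsz == opt.train_batch_size:
    loss_count /= 10
# Partial final batch on a multiple of 10: i % 10 is 0, so divide by 10 instead.
elif bsz < opt.train_batch_size and i % 10 == 0:
    loss_count /= 10
# Partial final batch otherwise: i % 10 is non-zero and safe to divide by.
else:
    loss_count /= i % 10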
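The same choice of divisor can be written as a small pure function, which makes the edge case easy to test outside the training loop. This is only a sketch under the same assumptions as the branch above; `loss_divisor` and `log_every` are hypothetical names, not identifiers from train.py.

```python
def loss_divisor(i, bsz, train_batch_size, log_every=10):
    """Pick the value to divide the accumulated loss by at step i.

    Mirrors the branch above: a full batch, or a partial batch that lands
    exactly on a multiple of log_every, uses log_every; any other partial
    batch uses the non-zero remainder i % log_every.
    """
    remainder = i % log_every
    if bsz == train_batch_size or remainder == 0:
        return log_every
    return remainder


# Last step for train_batch_size = 2 on 48779 samples: one sample, i % 10 == 0.
print(loss_divisor(24390, 1, 2))
# A partial last batch on a step that is not a multiple of 10.
print(loss_divisor(7, 1, 2))
```

Because the function never returns 0, the division is safe for any batch size.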

I have restarted the project on my server. If it still works well after one full epoch, I will come back and report.