tgc1997 / RMN

IJCAI2020: Learning to Discretely Compose Reasoning Module Networks for Video Captioning

a bug report #16

Closed HXYNODE closed 3 years ago

HXYNODE commented 3 years ago

Hi Ganchao, here is a bug report. While debugging and reproducing the project, I found an error that might lead to inaccurate model reproduction or training results: the `att_size=1024` set on the run command line does not take effect. The reason is as follows: although the `att_size` parameter in the `__init__` functions of the `SoftAttention` and `GumbelAttention` classes is nominally `opt.att_size` (=1024), every instantiation actually passes `opt.hidden_size` (=512 for the MSVD dataset / =1300 for the MSR-VTT dataset) instead. Related code lines:

- https://github.com/tgc1997/RMN/blob/14a9eff9a936030bcea104cb2c65f5378136cd87/models/RMN.py#L18
- https://github.com/tgc1997/RMN/blob/14a9eff9a936030bcea104cb2c65f5378136cd87/models/RMN.py#L45
- https://github.com/tgc1997/RMN/blob/14a9eff9a936030bcea104cb2c65f5378136cd87/models/RMN.py#L171
- https://github.com/tgc1997/RMN/blob/14a9eff9a936030bcea104cb2c65f5378136cd87/models/RMN.py#L175
- https://github.com/tgc1997/RMN/blob/14a9eff9a936030bcea104cb2c65f5378136cd87/models/RMN.py#L207
- https://github.com/tgc1997/RMN/blob/14a9eff9a936030bcea104cb2c65f5378136cd87/models/RMN.py#L211
- https://github.com/tgc1997/RMN/blob/14a9eff9a936030bcea104cb2c65f5378136cd87/models/RMN.py#L245
- https://github.com/tgc1997/RMN/blob/14a9eff9a936030bcea104cb2c65f5378136cd87/models/RMN.py#L285
- https://github.com/tgc1997/RMN/blob/14a9eff9a936030bcea104cb2c65f5378136cd87/models/RMN.py#L287

If the `att_size` parameter is supposed to work as expected, `opt.hidden_size` in the above lines should be replaced by `opt.att_size`. Is that right? Thanks!
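The mismatch can be illustrated with a simplified sketch (this is not the repo's exact code; the layer names and the toy forward pass are assumptions for illustration only):

```python
import torch
import torch.nn as nn

class SoftAttention(nn.Module):
    """Simplified stand-in for the SoftAttention class in models/RMN.py."""
    def __init__(self, feat_size, hidden_size, att_size):
        super().__init__()
        self.wh = nn.Linear(hidden_size, att_size)
        self.wv = nn.Linear(feat_size, att_size)
        self.wa = nn.Linear(att_size, 1, bias=False)

    def forward(self, feats, key):
        # feats: (batch, num_feats, feat_size); key: (batch, hidden_size)
        e = self.wa(torch.tanh(self.wv(feats) + self.wh(key).unsqueeze(1))).squeeze(-1)
        alpha = torch.softmax(e, dim=1)                       # attention weights over feats
        return torch.bmm(alpha.unsqueeze(1), feats).squeeze(1)

# The reported issue: the call sites pass opt.hidden_size where att_size is
# expected, so a command-line --att_size of 1024 never reaches the layers.
hidden_size = 512
att = SoftAttention(feat_size=2048, hidden_size=hidden_size, att_size=hidden_size)
```

The sketch only shows the shape of the problem: whatever value is bound to `att_size` inside `__init__` sizes the projection layers, so passing `opt.hidden_size` at the call site silently overrides the command-line setting.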

tgc1997 commented 3 years ago

Thanks for your careful investigation. In the `__init__` functions of the `SoftAttention` and `GumbelAttention` classes we do not use `opt.att_size`; we use `att_size`, which is assigned `opt.hidden_size` at every call site. In fact, there is no `opt.att_size` in our initial implementation; we simply use `opt.hidden_size` as the attention size. We think this has little impact on the results, but if you find anything different, you can report your findings here.

HXYNODE commented 3 years ago

Thank you very much. I changed `att_size` from 512 to 1024 for the MSVD dataset, and the result is shown below:

| B@4 | METEOR | ROUGE_L | CIDEr |
| --- | ------ | ------- | ----- |
| 52.8 | 35.9 | 72.1 | 90.7 |

This suggests the modification causes a deterioration in model performance. Note that, as you mentioned in the README, the results are not stable, and I did not rerun the experiment due to time limits. I also reproduced the original setting (`att_size` equal to `hidden_size`, i.e. 512 for MSVD) three times, with the following results:

| Run | B@4 | METEOR | ROUGE_L | CIDEr |
| --- | --- | ------ | ------- | ----- |
| 1st | 52.0 | 35.8 | 72.0 | 89.1 |
| 2nd | 53.4 | 36.4 | 73.5 | 94.8 |
| 3rd | 55.7 | 37.2 | 73.7 | 101.0 |

Could the unstable results be caused by the initialization? Result stability is indeed another noteworthy and interesting issue: some people prefer to report the best results, while others report the mean. Do you have any suggestions on fair result comparison?
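For reporting, the mean and standard deviation across the three runs above can be computed with a quick sketch:

```python
import statistics

# CIDEr scores from the three MSVD reruns reported above
cider_runs = [89.1, 94.8, 101.0]
mean = statistics.mean(cider_runs)
std = statistics.stdev(cider_runs)  # sample standard deviation
print(f"CIDEr: {mean:.1f} ± {std:.1f} over {len(cider_runs)} runs")
# → CIDEr: 95.0 ± 6.0 over 3 runs
```

Reporting mean ± std over several runs is one common way to make comparisons fair when training is not deterministic.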
I also find that the best-CIDEr/METEOR epoch and the corresponding model weights saved during training are sometimes not actually the best ones: the evaluation results can be better when loading the weight files of other epochs in the evaluate stage. Maybe this is because the best-CIDEr/METEOR epoch refers to the best epoch on the validation data during training rather than the best one on the test data. Is that right?
Anyway, I just wanted to confirm whether the `att_size` parameter works, and to share my findings along with some questions that confused me. I really appreciate your patient reply, thanks again! Looking forward to hearing from you.

tgc1997 commented 3 years ago

If you want to get stable results, you can fix the seed for all random methods by using:

```python
import random

import numpy as np
import torch

def setup_seed(seed):
    """Fix every source of randomness so runs are reproducible."""
    np.random.seed(seed)
    random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # make cuDNN choose deterministic kernels instead of benchmarking
    # for the fastest (possibly non-deterministic) one
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```

But you may need to try several seeds to get a good result. The best-CIDEr/METEOR model on the validation split may not be the best one on the test split.
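The gap between the validation-best and test-best checkpoint can be illustrated with a small sketch (the per-epoch scores here are made up; in practice they would come from evaluating each saved checkpoint on both splits):

```python
# hypothetical per-epoch CIDEr on the validation vs. test split
val_cider  = {10: 90.2, 20: 94.8, 30: 93.1}
test_cider = {10: 88.5, 20: 92.0, 30: 95.3}

best_val_epoch = max(val_cider, key=val_cider.get)     # checkpoint picked during training
best_test_epoch = max(test_cider, key=test_cider.get)  # checkpoint actually best on test
print(best_val_epoch, best_test_epoch)  # → 20 30
```

Because model selection happens on the validation split, the saved "best" checkpoint (epoch 20 here) can be beaten on the test split by a checkpoint from another epoch (epoch 30), which matches the observation above.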

HXYNODE commented 3 years ago

Thanks for your valuable insights and the detailed introduction to the model training phase. The project and its elegant code are very impressive to me. Thanks again!