tgc1997 / RMN

IJCAI2020: Learning to Discretely Compose Reasoning Module Networks for Video Captioning

a bug report #16

Closed HXYNODE closed 3 years ago

HXYNODE commented 3 years ago

Hi Ganchao, here is a bug report. While debugging and reproducing the project, I found an error that might lead to inaccurate model reproduction or training results: the `att_size=1024` set on the run command line does not take effect. The reason is as follows: although the `att_size` parameter in the `__init__` functions of the `SoftAttention` and `GumbelAttention` classes is nominally `opt.att_size` (=1024), every instantiation actually passes `opt.hidden_size` (=512 for the MSVD dataset / =1300 for the MSR-VTT dataset) instead. Related code lines:

- https://github.com/tgc1997/RMN/blob/14a9eff9a936030bcea104cb2c65f5378136cd87/models/RMN.py#L18
- https://github.com/tgc1997/RMN/blob/14a9eff9a936030bcea104cb2c65f5378136cd87/models/RMN.py#L45
- https://github.com/tgc1997/RMN/blob/14a9eff9a936030bcea104cb2c65f5378136cd87/models/RMN.py#L171
- https://github.com/tgc1997/RMN/blob/14a9eff9a936030bcea104cb2c65f5378136cd87/models/RMN.py#L175
- https://github.com/tgc1997/RMN/blob/14a9eff9a936030bcea104cb2c65f5378136cd87/models/RMN.py#L207
- https://github.com/tgc1997/RMN/blob/14a9eff9a936030bcea104cb2c65f5378136cd87/models/RMN.py#L211
- https://github.com/tgc1997/RMN/blob/14a9eff9a936030bcea104cb2c65f5378136cd87/models/RMN.py#L245
- https://github.com/tgc1997/RMN/blob/14a9eff9a936030bcea104cb2c65f5378136cd87/models/RMN.py#L285
- https://github.com/tgc1997/RMN/blob/14a9eff9a936030bcea104cb2c65f5378136cd87/models/RMN.py#L287

If the `att_size` parameter is supposed to work as expected, `opt.hidden_size` in the above lines should be replaced by `opt.att_size`. Is that right? Thanks!
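The mismatch can be illustrated with a simplified sketch (this is not the repo's exact code; the layer names and the toy forward pass are assumptions for illustration only):

```python
import torch
import torch.nn as nn

class SoftAttention(nn.Module):
    """Simplified stand-in for the SoftAttention class in models/RMN.py."""
    def __init__(self, feat_size, hidden_size, att_size):
        super().__init__()
        self.wh = nn.Linear(hidden_size, att_size)
        self.wv = nn.Linear(feat_size, att_size)
        self.wa = nn.Linear(att_size, 1, bias=False)

    def forward(self, feats, key):
        # feats: (batch, num_feats, feat_size); key: (batch, hidden_size)
        e = self.wa(torch.tanh(self.wv(feats) + self.wh(key).unsqueeze(1))).squeeze(-1)
        alpha = torch.softmax(e, dim=1)                       # attention weights over feats
        return torch.bmm(alpha.unsqueeze(1), feats).squeeze(1)

# The reported issue: the call sites pass opt.hidden_size where att_size is
# expected, so a command-line --att_size of 1024 never reaches the layers.
hidden_size = 512
att = SoftAttention(feat_size=2048, hidden_size=hidden_size, att_size=hidden_size)
```

The sketch only shows the shape of the problem: whatever value is bound to `att_size` inside `__init__` sizes the projection layers, so passing `opt.hidden_size` at the call site silently overrides the command-line setting.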

tgc1997 commented 3 years ago

Thanks for your careful investigation. In the `__init__` functions of the `SoftAttention` and `GumbelAttention` classes we do not use `opt.att_size`; we use `att_size`, which is assigned `opt.hidden_size` at every call site. In fact, there is no `opt.att_size` in our initial implementation; we simply use `opt.hidden_size` as the attention size. We think this has little impact on the results, but if you find anything different, you can report your findings here.

HXYNODE commented 3 years ago

Thank you very much. I changed `att_size` from 512 to 1024 for the MSVD dataset, and the result is shown below:

| B@4 | METEOR | ROUGE_L | CIDEr |
| --- | ------ | ------- | ----- |
| 52.8 | 35.9 | 72.1 | 90.7 |

This suggests the modification causes a deterioration in model performance. Note that, as you mentioned in the README, the results are not stable, and I did not rerun the experiment due to time limits. I also reproduced the original setting (`att_size` equal to `hidden_size`, i.e. 512 for MSVD) three times, with the following results:

| Run | B@4 | METEOR | ROUGE_L | CIDEr |
| --- | --- | ------ | ------- | ----- |
| 1st | 52.0 | 35.8 | 72.0 | 89.1 |
| 2nd | 53.4 | 36.4 | 73.5 | 94.8 |
| 3rd | 55.7 | 37.2 | 73.7 | 101.0 |

Could the unstable results be caused by the initialization? Result stability is indeed another noteworthy and interesting issue: some people prefer to report the best results, while others report the mean. Do you have any suggestions on fair result comparison?
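For reporting, the mean and standard deviation across the three runs above can be computed with a quick sketch:

```python
import statistics

# CIDEr scores from the three MSVD reruns reported above
cider_runs = [89.1, 94.8, 101.0]
mean = statistics.mean(cider_runs)
std = statistics.stdev(cider_runs)  # sample standard deviation
print(f"CIDEr: {mean:.1f} ± {std:.1f} over {len(cider_runs)} runs")
# → CIDEr: 95.0 ± 6.0 over 3 runs
```

Reporting mean ± std over several runs is one common way to make comparisons fair when training is not deterministic.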
I also find that the best-CIDEr/METEOR epoch and the corresponding model weights saved during training are sometimes not actually the best ones: the evaluation results can be better when loading the weight files of other epochs in the evaluate stage. Maybe this is because the best-CIDEr/METEOR epoch refers to the best epoch on the validation data during training rather than the best one on the test data. Is that right?
Anyway, I just wanted to confirm whether the `att_size` parameter works, and to share my findings along with some questions that confused me. I really appreciate your patient reply, thanks again! Looking forward to hearing from you.

tgc1997 commented 3 years ago

If you want to get stable results, you can fix the seed for all random methods by using:

```python
import random

import numpy as np
import torch

def setup_seed(seed):
    """Fix every source of randomness so runs are reproducible."""
    np.random.seed(seed)
    random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # make cuDNN choose deterministic kernels instead of benchmarking
    # for the fastest (possibly non-deterministic) one
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```

But you may need to try several seeds to get a good result. The best-CIDEr/METEOR model on the validation split may not be the best one on the test split.
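The gap between the validation-best and test-best checkpoint can be illustrated with a small sketch (the per-epoch scores here are made up; in practice they would come from evaluating each saved checkpoint on both splits):

```python
# hypothetical per-epoch CIDEr on the validation vs. test split
val_cider  = {10: 90.2, 20: 94.8, 30: 93.1}
test_cider = {10: 88.5, 20: 92.0, 30: 95.3}

best_val_epoch = max(val_cider, key=val_cider.get)     # checkpoint picked during training
best_test_epoch = max(test_cider, key=test_cider.get)  # checkpoint actually best on test
print(best_val_epoch, best_test_epoch)  # → 20 30
```

Because model selection happens on the validation split, the saved "best" checkpoint (epoch 20 here) can be beaten on the test split by a checkpoint from another epoch (epoch 30), which matches the observation above.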

HXYNODE commented 3 years ago

Thanks for your valuable insights and the detailed introduction to the model training phase. The project and its elegant code are very impressive to me. Thanks again!