Open yashu-seth opened 6 years ago
I spotted the same problem. Is it possible to have any clarification about it? Thanks
Hi! The paper has a typo: instead of comparing against the last n intervals, those intervals are masked out and the current loss is compared against the remaining ones. Check #13 for confirmation. Hope this helps.
@angeliand Thanks a lot!!
Hello Everyone,
I went through the paper Regularizing and Optimizing LSTM Language Models. In the algorithm (NT-ASGD) discussed in the paper, for the switch to take place, the current validation loss should be greater than the minimum validation loss over the last n intervals, where n is the non-monotone interval.
In the code, however, the implementation suggests that the current validation loss should be greater than the minimum over all but the last n intervals.
I think the condition in the code should be
min(best_val_loss[-args.nonmono:])
instead of
min(best_val_loss[:-args.nonmono])
Please correct me if I am missing something.
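To make the difference between the two slices concrete, here is a minimal sketch that evaluates both conditions on the same loss history. The function names, the length guard, and the sample numbers are assumptions made up for illustration; they are not taken from the repository.

```python
def switch_all_but_last_n(val_loss, history, nonmono):
    """Condition as the code appears to implement it: compare against
    every recorded loss except the last `nonmono` ones.

    Assumes nonmono >= 1; the length guard is an assumption added so the
    slice is never empty.
    """
    return len(history) > nonmono and val_loss > min(history[:-nonmono])


def switch_last_n(val_loss, history, nonmono):
    """Condition as I read the paper: compare against only the last
    `nonmono` recorded losses."""
    return len(history) > nonmono and val_loss > min(history[-nonmono:])


# Hypothetical validation losses, oldest first.
history = [5.0, 4.5, 4.8, 4.9]
nonmono = 2
val_loss = 4.7

# history[:-2] is [5.0, 4.5], so the comparison is 4.7 > 4.5.
print(switch_all_but_last_n(val_loss, history, nonmono))
# history[-2:] is [4.8, 4.9], so the comparison is 4.7 > 4.8.
print(switch_last_n(val_loss, history, nonmono))
```

With these numbers the two conditions disagree: the first triggers the switch because the run has not beaten its older best of 4.5, while the second does not because 4.7 is still below both recent losses.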
Thanks.