Switching criteria with non-monotone interval has difference in paper and code?

yashu-seth commented 6 years ago

Hello Everyone,

I went through the paper "Regularizing and Optimizing LSTM Language Models". In the NT-ASGD algorithm described there, the switch takes place when the current validation loss is greater than the minimum loss over the last n intervals, where n is the non-monotone interval.

In the code, however, the implementation suggests that the current validation loss should be greater than the minimum loss over all but the last n intervals.

if (args.optimizer == 'sgd' and 't0' not in optimizer.param_groups[0]
        and len(best_val_loss) > args.nonmono
        and val_loss > min(best_val_loss[:-args.nonmono])):
    print('Switching to ASGD')
    optimizer = torch.optim.ASGD(model.parameters(), lr=args.lr, t0=0, lambd=0., weight_decay=args.wdecay)

I think the condition in the code should use min(best_val_loss[-args.nonmono:]) instead of min(best_val_loss[:-args.nonmono]).
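To make the difference between the two slices concrete, here is a small sketch. The loss values and the plain variable `nonmono` (standing in for `args.nonmono`) are made up for illustration, and it assumes `best_val_loss` holds only the losses from earlier evaluations, not the current one.

```python
# Hypothetical validation-loss history, oldest first (values are made up).
best_val_loss = [5.0, 4.6, 4.4, 4.3, 4.35, 4.32]
nonmono = 2        # stand-in for args.nonmono
val_loss = 4.31    # hypothetical current validation loss

earlier = best_val_loss[:-nonmono]  # everything except the last n entries
recent = best_val_loss[-nonmono:]   # only the last n entries

# Condition as written in the code: compare against the older entries.
switch_as_in_code = len(best_val_loss) > nonmono and val_loss > min(earlier)
# Condition as proposed above: compare against the last n entries.
switch_as_proposed = len(best_val_loss) > nonmono and val_loss > min(recent)

print(earlier, recent)
print(switch_as_in_code, switch_as_proposed)
```

With these numbers the two conditions disagree, so the choice of slice does change when the switch happens.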

Please correct me if I am missing something.

Thanks.

christian-5-28 commented 6 years ago

I spotted the same problem. Is it possible to have any clarification about it? Thanks

angeliand commented 5 years ago

Hi! The paper has a typo: rather than comparing against the last n intervals, the code masks them out and compares against the remaining, earlier ones. Check #13 for confirmation. Hope this helps.
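A rough sketch of that reading, in case it helps. The helper names `should_switch_to_asgd` and `first_switch_epoch` are hypothetical and not part of the repository, and the sketch assumes each loss is appended to the history only after the check.

```python
def should_switch_to_asgd(history, val_loss, nonmono):
    """True when val_loss is worse than the best loss recorded more than
    `nonmono` checks ago; the last `nonmono` entries are masked out."""
    if len(history) <= nonmono:
        return False
    return val_loss > min(history[:-nonmono])


def first_switch_epoch(losses, nonmono):
    """Index of the first evaluation that triggers the switch, or None."""
    history = []
    for epoch, val_loss in enumerate(losses):
        if should_switch_to_asgd(history, val_loss, nonmono):
            return epoch
        history.append(val_loss)  # assumed: appended after the check
    return None


# Made-up loss curve that improves and then degrades.
print(first_switch_epoch([5.0, 4.5, 4.2, 4.1, 4.15, 4.3, 4.6], nonmono=2))
```

Under this reading, a loss that keeps improving never triggers the switch, while one that stays above an older best for long enough does.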

yashu-seth commented 5 years ago

@angeliand Thanks a lot!!

salesforce / awd-lstm-lm

Switching criteria with non-monotone interval has difference in paper and code? #73