Hello, I think that with the suggested placement of the scheduler calls in the training loop, the first batch gets counted twice: first at the top of the epoch, where scheduler.step() is called (step() itself calls batch_step(), which increments iteration), and then again at the scheduler.batch_step() call inside the batch loop. This eventually leads to an index-out-of-bounds error in batch_step() at t_cur = self.t_epoch + self.batch_increments[self.iteration].
I resolved it by commenting out self.batch_step() inside step(). That worked for me.
Did anyone make it work otherwise? My instantiation parameters are batch_size=16 and epoch_size=len(train_loader.dataset) = 1152.
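To make the double counting concrete, here is a minimal, self-contained sketch. The class below is only a stand-in, not the real scheduler code; it just mimics the step()/batch_step() relationship I'm describing (step() calling batch_step()), and the loop mirrors the suggested placement with my sizes (batch_size=16, epoch_size=1152, i.e. 72 batches per epoch). Running it reproduces the same IndexError on the last batch of the first epoch:

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

class StandInScheduler:
    """Stand-in with the same step()/batch_step() shape as the real scheduler."""
    def __init__(self, batch_size, epoch_size):
        n_batches = -(-epoch_size // batch_size)   # ceil(1152 / 16) = 72
        self.batch_increments = [i / n_batches for i in range(n_batches)]
        self.iteration = 0
        self.t_epoch = 0

    def batch_step(self):
        # raises IndexError once iteration reaches len(batch_increments)
        t_cur = self.t_epoch + self.batch_increments[self.iteration]
        self.iteration += 1
        return t_cur

    def step(self):
        self.t_epoch += 1
        self.iteration = 0
        self.batch_step()        # <-- the call I ended up commenting out

train_loader = DataLoader(
    TensorDataset(torch.randn(1152, 10), torch.randint(0, 2, (1152,))),
    batch_size=16,
)
model = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = StandInScheduler(batch_size=16, epoch_size=len(train_loader.dataset))

try:
    for epoch in range(2):
        scheduler.step()             # counts the first batch of the epoch here...
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
            scheduler.batch_step()   # ...and every batch again, so the 73rd call
                                     # indexes batch_increments[72] out of range
except IndexError as e:
    print("reproduced the off-by-one:", e)
```

With self.batch_step() removed from step() (the workaround above), the counts line up again: exactly 72 batch_step() calls per epoch for 72 entries in batch_increments.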