floor() fixes a problem that occurs when the dataset length is not divisible by the number of steps. Without it, the batch size ended up equal to the dataset length, which caused lots of CUDA memory issues for me!
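To illustrate the floor() fix, here is a minimal sketch of deriving an integer batch size from the dataset length and the number of steps. The helper name `batch_size_for` and its parameters are hypothetical and are not taken from the PR's code; the clamp to 1 is also my own assumption.

```python
import math


def batch_size_for(dataset_len: int, num_steps: int) -> int:
    """Hypothetical helper: integer batch size for a given number of steps."""
    if num_steps <= 0:
        raise ValueError("num_steps must be positive")
    # floor() keeps the result an integer even when dataset_len
    # is not evenly divisible by num_steps.
    # Clamp to 1 (an assumption) so a tiny dataset never yields batch size 0.
    return max(1, math.floor(dataset_len / num_steps))


# 1000 samples over 300 steps is not evenly divisible.
print(batch_size_for(1000, 300))
```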
My original PR accidentally annealed from high LR to low LR. It should be the other way around (low to high). Now fixed.
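For reference, a low-to-high schedule could look roughly like the sketch below. The function name `lr_ramp`, its parameters, and the exponential spacing are assumptions for illustration; the PR itself may use a different annealing function.

```python
def lr_ramp(lr_min: float, lr_max: float, num_steps: int) -> list[float]:
    """Hypothetical helper: learning rates rising from lr_min to lr_max."""
    if lr_min <= 0 or lr_max <= lr_min:
        raise ValueError("expected 0 < lr_min < lr_max")
    if num_steps < 2:
        return [lr_min]
    ratio = lr_max / lr_min
    # Exponential spacing (an assumption): step 0 gives lr_min,
    # the last step gives lr_max, and every step in between is larger
    # than the one before it.
    return [lr_min * ratio ** (i / (num_steps - 1)) for i in range(num_steps)]


schedule = lr_ramp(1e-5, 1.0, 5)
# The schedule starts low and ends high.
print(schedule[0] < schedule[-1])
```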