michelwi opened this issue 2 weeks ago
This would be backend-dependent. I guess it is fine if this is only for PyTorch for now?
Btw, also note that I don't really expect much more throughput from this. It will also increase the amount of zero padding. E.g. when I was doing beam search, at a certain batch size it actually became slower, because the GPU could not really compute more in parallel anyway. For a matmul, the amount of work available in parallel scales with batch size × beam size × feature dimension, and once this number already exceeds the number of CUDA threads that can run concurrently (which is on the order of thousands), you cannot really gain more speed by increasing the batch size further. The additional zero padding, however, will degrade the throughput.
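To illustrate the padding point above, here is a minimal sketch (the sequence lengths and the `padding_fraction` helper are hypothetical, just for illustration): when variable-length sequences are grouped into batches and padded to the longest sequence in each batch, the fraction of wasted (zero-padded) frames grows with the batch size.

```python
# Hypothetical sequence lengths, assumed sorted as a bucketing batcher would do.
lengths = [37, 41, 44, 52, 58, 63, 71, 90]

def padding_fraction(lengths, batch_size):
    """Fraction of the padded tensor that is zero padding when
    consecutive sequences are grouped into batches of `batch_size`
    and each batch is padded to its longest sequence."""
    padded = real = 0
    for i in range(0, len(lengths), batch_size):
        batch = lengths[i:i + batch_size]
        padded += max(batch) * len(batch)  # allocated frames incl. padding
        real += sum(batch)                 # frames with actual content
    return 1 - real / padded

print(padding_fraction(lengths, 2))  # small batches: little padding
print(padding_fraction(lengths, 8))  # one big batch: noticeably more padding
```

So even ignoring the CUDA-thread saturation argument, a larger batch size only helps if the extra padded frames do not eat up the gain.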
There is currently one batch_size defined for both training and cross validation. Since we do not have to keep track of gradients etc. during CV, we could increase the batch_size there for higher throughput.
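A minimal PyTorch sketch of the idea: the model, the data, and the `cv_batch_size` name are all placeholders, not the project's actual config keys. Running the CV loop under `torch.no_grad()` means no activations are stored for backprop, which frees the memory that would otherwise limit the batch size.

```python
import torch

model = torch.nn.Linear(16, 4)  # stand-in for the real model

train_batch_size = 32
cv_batch_size = 128  # hypothetical: can be larger, since no gradients are kept

def cross_validate(model, data):
    """Average loss over CV batches, without tracking gradients."""
    model.eval()
    losses = []
    with torch.no_grad():  # no autograd graph -> much lower memory per batch
        for x, y in data:
            loss = torch.nn.functional.mse_loss(model(x), y)
            losses.append(loss.item())
    return sum(losses) / len(losses)

# Dummy CV data at the larger batch size, just to show the shape of the loop.
data = [(torch.randn(cv_batch_size, 16), torch.randn(cv_batch_size, 4))]
print(cross_validate(model, data))
```

How much larger the CV batch can actually be depends on the model, since the forward activations of the current batch still have to fit in memory.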