michelwi opened this issue 2 weeks ago
This would be backend-dependent. I guess it is fine if this is only for PyTorch for now?
Btw, also note that I don't really expect much more throughput from this. It will also increase the amount of zero padding. E.g. when I was doing beam search, at a certain batch size it actually became slower, because the GPU could not really compute more in parallel anyway. For a matmul, the amount of work available in parallel scales with batch size × beam size × feature dimension, and once this number already exceeds the number of CUDA threads that can run concurrently (which is on the order of thousands), you cannot really gain more speed by increasing the batch size further. The additional zero padding, however, will degrade the throughput.
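To illustrate the padding point above, here is a minimal sketch (the sequence lengths and the `padding_fraction` helper are hypothetical, just for illustration): when variable-length sequences are grouped into batches and padded to the longest sequence in each batch, the fraction of wasted (zero-padded) frames grows with the batch size.

```python
# Hypothetical sequence lengths, assumed sorted as a bucketing batcher would do.
lengths = [37, 41, 44, 52, 58, 63, 71, 90]

def padding_fraction(lengths, batch_size):
    """Fraction of the padded tensor that is zero padding when
    consecutive sequences are grouped into batches of `batch_size`
    and each batch is padded to its longest sequence."""
    padded = real = 0
    for i in range(0, len(lengths), batch_size):
        batch = lengths[i:i + batch_size]
        padded += max(batch) * len(batch)  # allocated frames incl. padding
        real += sum(batch)                 # frames with actual content
    return 1 - real / padded

print(padding_fraction(lengths, 2))  # small batches: little padding
print(padding_fraction(lengths, 8))  # one big batch: noticeably more padding
```

So even ignoring the CUDA-thread saturation argument, a larger batch size only helps if the extra padded frames do not eat up the gain.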
There is currently one batch_size defined for both training and cross validation. Since we do not have to keep track of gradients etc. during CV, we could increase the batch_size there for higher throughput.
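A minimal PyTorch sketch of the idea: the model, the data, and the `cv_batch_size` name are all placeholders, not the project's actual config keys. Running the CV loop under `torch.no_grad()` means no activations are stored for backprop, which frees the memory that would otherwise limit the batch size.

```python
import torch

model = torch.nn.Linear(16, 4)  # stand-in for the real model

train_batch_size = 32
cv_batch_size = 128  # hypothetical: can be larger, since no gradients are kept

def cross_validate(model, data):
    """Average loss over CV batches, without tracking gradients."""
    model.eval()
    losses = []
    with torch.no_grad():  # no autograd graph -> much lower memory per batch
        for x, y in data:
            loss = torch.nn.functional.mse_loss(model(x), y)
            losses.append(loss.item())
    return sum(losses) / len(losses)

# Dummy CV data at the larger batch size, just to show the shape of the loop.
data = [(torch.randn(cv_batch_size, 16), torch.randn(cv_batch_size, 4))]
print(cross_validate(model, data))
```

How much larger the CV batch can actually be depends on the model, since the forward activations of the current batch still have to fit in memory.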