Hi, thanks for the interesting work.
I'm confused about the batch size in the paper. It says "We train with batch size 1 for 1M iterations".
Is there any reason not to increase the batch size? Maybe GPU memory is the limit, but I saw which cards you used in another issue.
And with batch size 1, how long did the 1M iterations take to train?
Thanks for any help.
Hi,
the reason is memory: we used batch size 1. We opted to increase the network size rather than slightly increase the batch size. Training speed was roughly 200K-300K iterations per day.
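For anyone wanting the implied wall-clock time, a quick back-of-the-envelope calculation from the numbers above (1M iterations at roughly 200K-300K iterations per day) gives about 3-5 days; the figures in the sketch are just those stated estimates, not measured values:

```python
# Rough training-time estimate from the throughput range quoted above.
ITERATIONS = 1_000_000
ITERS_PER_DAY_LOW = 200_000   # slower end of the reported range
ITERS_PER_DAY_HIGH = 300_000  # faster end of the reported range

days_max = ITERATIONS / ITERS_PER_DAY_LOW   # slowest case
days_min = ITERATIONS / ITERS_PER_DAY_HIGH  # fastest case
print(f"~{days_min:.1f} to {days_max:.1f} days")  # roughly 3.3 to 5 days
```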