Closed: wujianP closed this issue 1 year ago
Hello!
This could be a typo introduced when we switched to A100s for the camera-ready version of the paper. You should be able to reproduce most of the results even with a smaller batch size; what you might lose is a bit of generalization power. If possible, use a larger GPU.
Hello, I'm curious how you train with a batch size of 1024 on a single 2080Ti. When I use a batch size of 256 on a 32GB V100, it already consumes 28GB of GPU memory. Am I missing any details?
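For anyone hitting the same memory limit: one common workaround (not confirmed by the maintainers in this thread) is gradient accumulation, which keeps the effective batch size at 1024 while only fitting a smaller micro-batch in GPU memory at a time. The snippet below is a minimal sketch; the model, optimizer, and data loader are toy placeholders, not code from this repository.

```python
import torch
import torch.nn as nn

# Toy stand-ins; in practice these come from the repo's own training script.
model = nn.Linear(128, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
loader = [(torch.randn(256, 128), torch.randint(0, 10, (256,))) for _ in range(8)]

accum_steps = 4  # 4 micro-batches of 256 -> effective batch size of 1024
optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    # Scale the loss so the accumulated gradient averages over the effective batch.
    loss = loss_fn(model(x), y) / accum_steps
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

Note that accumulation matches a true 1024 batch only approximately when the model uses batch-dependent layers such as BatchNorm, since the normalization statistics are still computed per micro-batch.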