Closed scutfrank closed 2 years ago
Hi, the results of our models may be affected by the global batch size. You can reduce the number of GPUs to 1, but this may lead to slightly lower performance. To alleviate the effect when training on a single GPU, you can adjust the learning rate, or use gradient accumulation and/or fp16.
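For reference, gradient accumulation can keep the effective global batch size unchanged on a single GPU. Below is a minimal PyTorch sketch with a toy model and random data; `accum_steps`, the model, and the batch sizes are illustrative placeholders, not this repo's actual training code:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # toy model for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

accum_steps = 8  # e.g. emulate an 8-GPU global batch on 1 GPU
updates = 0

optimizer.zero_grad()
for step in range(32):
    x = torch.randn(4, 10)            # per-step mini-batch
    y = torch.randint(0, 2, (4,))
    # Scale the loss so accumulated gradients average over accum_steps.
    loss = loss_fn(model(x), y) / accum_steps
    loss.backward()                   # gradients accumulate in .grad
    if (step + 1) % accum_steps == 0:
        optimizer.step()              # one update per accum_steps batches
        optimizer.zero_grad()
        updates += 1
```

With 32 mini-batches and `accum_steps = 8`, the optimizer performs 4 updates, each seeing an effective batch of 32 samples.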
As above