Closed: otpas007 closed this issue 3 years ago
Can you please provide a multi-GPU version of the code?
If you are hitting an OOM (out-of-memory) error, you can use a smaller batch size with more gradient accumulation steps, which keeps the effective batch size the same while lowering peak memory. I don't think multi-GPU is necessary.
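To illustrate why this trade works, here is a minimal sketch in plain Python, not the repository's training loop. It assumes a toy one-parameter least-squares model, and every name in it (`grad_mse`, `accumulated_grad`, `micro_batch_size`) is made up for the example. It shows that averaging gradients over several small micro-batches gives the same gradient as one large batch, provided the micro-batches are equally sized and each contribution is divided by the number of accumulation steps.

```python
# Sketch only: a toy model y ~ w * x, with hypothetical helper names.
# It is not the repository's code; it just demonstrates the arithmetic.

def grad_mse(w, xs, ys):
    """Gradient of the mean squared error (1/n) * sum((w*x - y)^2) w.r.t. w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Process the data one micro-batch at a time and accumulate the gradient."""
    # Assumes the data splits evenly into micro-batches.
    assert len(xs) % micro_batch_size == 0
    accum_steps = len(xs) // micro_batch_size
    total = 0.0
    for i in range(accum_steps):
        lo = i * micro_batch_size
        hi = lo + micro_batch_size
        # Scale each micro-batch gradient by 1/accum_steps so the running
        # sum ends up as a mean over the whole effective batch.
        total += grad_mse(w, xs[lo:hi], ys[lo:hi]) / accum_steps
    return total


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
w = 0.5

full = grad_mse(w, xs, ys)               # one batch of 8 samples
accum = accumulated_grad(w, xs, ys, 2)   # 4 micro-batches of 2 samples
print(full, accum)
```

Only one micro-batch has to fit in memory at a time, so the effective batch size (micro-batch size times accumulation steps) stays at 8 here while the per-step memory cost is that of 2 samples. In a real training loop the same idea usually means scaling the loss by the number of accumulation steps and calling the optimizer step only after the last micro-batch.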