weiddeng opened this issue 1 year ago
A stupid question: I think I know what batch_size is, but what is micro_batch_size, and what is it for? As in

gradient_accumulation_steps = batch_size // micro_batch_size

Thanks!

If you are running on Colab you must set it to 4; on higher-end GPUs I think you can go up to 8. I tried this on Colab Pro and used 4; otherwise I was getting an OOM error.

I think it depends. I tried 16, and it worked.
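For anyone else landing here: micro_batch_size is the number of examples actually pushed through the GPU per forward/backward pass, while batch_size is the effective batch size per optimizer step. Gradients from several micro-batches are accumulated before a single update, which is why a smaller micro_batch_size avoids OOM errors at the cost of more passes per step. Here is a minimal PyTorch sketch of that idea; the model, data, and numbers are hypothetical stand-ins, not the repo's actual training loop:

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only.
batch_size = 16        # effective batch per optimizer step
micro_batch_size = 4   # what actually fits in GPU memory at once
gradient_accumulation_steps = batch_size // micro_batch_size  # = 4

model = nn.Linear(32, 2)  # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# Fake data: 16 samples, processed 4 at a time.
inputs = torch.randn(batch_size, 32)
targets = torch.randint(0, 2, (batch_size,))

optimizer.zero_grad()
for step in range(gradient_accumulation_steps):
    lo = step * micro_batch_size
    hi = lo + micro_batch_size
    loss = loss_fn(model(inputs[lo:hi]), targets[lo:hi])
    # Scale the loss so the accumulated gradient matches what a
    # single backward pass over the full batch_size would produce.
    (loss / gradient_accumulation_steps).backward()

optimizer.step()       # one update from all accumulated micro-batches
optimizer.zero_grad()
```

So the optimization math is (approximately) the same as training with the full batch_size; micro_batch_size only controls how much of that batch sits on the GPU at one time.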