mscherrmann closed this issue 10 months ago
Hey @FinTexIFB, where is your C4 data stored? Were there any other changes you made to the environment setup etc.?
Hi,
the C4 data is stored in the default location (./my-copy-c4). I did not change anything in the environment setup.
@mscherrmann Did you find a solution for this? I am getting the same issue.
Hi,
I tried to replicate the mosaic-BERT training on the C4 dataset, following your guidelines step by step. The dataset preparation worked well. However, during BERT training with the main.py file, I got a CUDA out-of-memory error. I did not change any hyperparameters in the respective yaml (mosaic-bert-base-uncased.yaml), except for the data path. I trained the model on 8 A100 GPUs with 80 GB each.
Here is the trace:
Furthermore, the training up to that point took quite a long time:
I am a bit confused as you said that a key feature of mosaic-BERT is its training speed. Do you have any idea what I am doing wrong?
Thank you in advance for your help!
Update: I saw that the out-of-memory error occurs when the model is evaluated (after 2000 batches by default). I already tried reducing global_train_batch_size from 4096 to 2048, without success.
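One possible reason the smaller global batch did not help is sketched below. It assumes the trainer splits the global batch evenly across the 8 GPUs and processes each GPU's share in fixed-size microbatches with gradient accumulation. The function name `per_device_batching` and the microbatch size of 128 are hypothetical, chosen for illustration and not taken from the yaml.

```python
def per_device_batching(global_batch_size, num_gpus, device_microbatch_size):
    """Return (samples per GPU per step, gradient-accumulation steps).

    Assumes an even split of the global batch across GPUs and
    fixed-size microbatches on each GPU (hypothetical setup).
    """
    if global_batch_size % num_gpus != 0:
        raise ValueError("global batch size must be divisible by the GPU count")
    per_device = global_batch_size // num_gpus
    # Ceiling division: a partial final microbatch still needs its own pass.
    accumulation_steps = -(-per_device // device_microbatch_size)
    return per_device, accumulation_steps


if __name__ == "__main__":
    for global_batch_size in (4096, 2048):
        per_device, steps = per_device_batching(
            global_batch_size, num_gpus=8, device_microbatch_size=128  # 128 is an assumed value
        )
        print(f"global={global_batch_size}: {per_device} samples/GPU, "
              f"{steps} microbatches of up to 128")
```

Under these assumptions, halving the global batch only halves the number of accumulation steps. The number of samples resident on a GPU at one time stays at the microbatch size, so peak training memory would be roughly unchanged. Memory use during evaluation would instead depend on the evaluation batch size, if that is configured separately.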