neuralmind-ai / portuguese-bert

Portuguese pre-trained BERT models

Preferred hardware configuration for batch size = 16 #42

Closed monilouise closed 2 years ago

monilouise commented 2 years ago

Hi, Your paper says you used a batch size of 16 to train the NER model, but the example commands use a batch size of only 2. I managed to train it with batch size = 2 on an 8GB GTX 1080 GPU, but I can't run it with batch size = 16, even in a Google Colab TPU environment.

Did you use any other special on-premise hardware configuration (e.g., a TPU cluster)?

monilouise commented 2 years ago

Sorry, I've just read the following explanation in the repository:

_"The commands below set the batch size to 16 considering a BERT Base model and an 8GB GPU. The parameters per_gpu_train_batch_size and gradient_accumulation_steps can be changed to use less or more available memory and produce the same results, as long as per_gpu_train_batch_size * gradient_accumulationsteps == 16."

So does the default command in fact use a batch size of 16? The name "per_gpu_train_batch_size" leads one to think it is the batch size...

fabiocapsouza commented 2 years ago

Yes, the example commands set an effective batch size of 16. The batch size argument names are a bit confusing because the effective batch size is the product of 1) the number of GPUs, 2) the instantaneous batch size per GPU (per_gpu_train_batch_size), and 3) the number of gradient accumulation steps (gradient_accumulation_steps). These arguments can be tweaked to match your hardware.
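
To make the relationship concrete, here is a minimal sketch of the arithmetic. The variable names mirror the training script arguments mentioned above, but the helper function itself is hypothetical and not part of this repository:

```python
def effective_batch_size(n_gpus: int,
                         per_gpu_train_batch_size: int,
                         gradient_accumulation_steps: int) -> int:
    """Effective batch size = number of GPUs x per-GPU batch x accumulation steps."""
    return n_gpus * per_gpu_train_batch_size * gradient_accumulation_steps

# The example commands' setting: a single 8GB GPU holding 2 examples at a time,
# accumulating gradients over 8 steps before each optimizer update.
assert effective_batch_size(n_gpus=1,
                            per_gpu_train_batch_size=2,
                            gradient_accumulation_steps=8) == 16

# A larger GPU that fits 8 examples at once reaches the same effective batch size
# with fewer accumulation steps, producing the same results.
assert effective_batch_size(n_gpus=1,
                            per_gpu_train_batch_size=8,
                            gradient_accumulation_steps=2) == 16
```

So on a single GPU, per_gpu_train_batch_size = 2 with gradient_accumulation_steps = 8 reproduces the paper's batch size of 16; the trade-off is simply more accumulation steps per optimizer update when less memory is available.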