huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Possible bug in `train_batch_size` #9340

Closed · EmilyAlsentzer closed this issue 3 years ago

EmilyAlsentzer commented 3 years ago

Environment info

Who can help

Trainer: @sgugger

Information

Model I am using (Bert, XLNet ...):

BERT

The problem arises when using:

I'm running a model on a toy dataset with only 2 examples and a batch size of 2. In the Trainer, `num_examples` is 2, but `total_train_batch_size` is 12, even though I do not have the `model_parallel` flag set to True (note that I do have 6 GPUs available on the machine). This doesn't seem to affect my run because `train_dataset_is_sized` is True, but it seems strange.
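For reference, here is a minimal sketch (not the exact Trainer source) of the arithmetic that produces 12 in this setup: the per-device batch size is multiplied by the number of visible GPUs and by the gradient accumulation steps.

```python
# Minimal sketch of the effective-batch-size arithmetic, assuming the
# usual DataParallel behaviour: the per-device batch is replicated on
# every visible GPU. The values below mirror this issue's setup.
per_device_train_batch_size = 2
n_gpu = 6                        # GPUs visible on the machine
gradient_accumulation_steps = 1

# Each optimizer step consumes per_device * n_gpu * accumulation examples.
train_batch_size = per_device_train_batch_size * max(1, n_gpu)
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)    # 12, even though the dataset has only 2 examples
```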

The task I am working on is:

a toy classification JSONL dataset with 2 examples

To reproduce

I think that this line has an unnecessary `not`. Should it be `if self.model_parallel` instead of `if not self.model_parallel`? Thanks!
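For context, here is a hypothetical reconstruction of the condition being questioned (the exact source line is linked in the original issue but not quoted here). The `not` is in fact intentional: the batch only multiplies across GPUs when the model is replicated, not when it is split.

```python
# Hypothetical reconstruction of the questioned condition, for
# illustration only; this is not the actual Trainer source.
def train_batch_size(per_device_batch_size: int, n_gpu: int,
                     model_parallel: bool) -> int:
    if not model_parallel:
        # Data parallelism: the model is replicated, so every GPU
        # processes its own per-device batch in the same step.
        return per_device_batch_size * max(1, n_gpu)
    # Model parallelism: one model split across GPUs, one batch total.
    return per_device_batch_size
```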

patrickvonplaten commented 3 years ago

I think @sgugger can best answer here when he's back from holiday :-)

sgugger commented 3 years ago

You're misunderstanding the `model_parallel` flag: it's not there to enable the use of several GPUs, as that is done automatically by the Trainer (you have to set `CUDA_VISIBLE_DEVICES` to just one GPU if you don't want the Trainer to use them all). That flag is there to split the model's layers across the various GPUs available (only supported for a few models).
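As a usage note, here is a minimal sketch of restricting the Trainer to one GPU via device visibility (the `output_dir` value is just a placeholder). The environment variable must be set before CUDA is initialized:

```python
import os

# Expose only GPU 0; this must happen before torch initializes CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",               # placeholder path
    per_device_train_batch_size=2,
)
# With a single visible device there is no replication:
print(args.train_batch_size)        # 2 instead of 12
```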

EmilyAlsentzer commented 3 years ago

Got it. I didn't realize that the Trainer automatically uses multiple GPUs when they're visible. Thanks!