Closed: EmilyAlsentzer closed this issue 3 years ago.
I think @sgugger can best answer here when he's back from holiday :-)
You misunderstand the `model_parallel` flag: it's not there to enable the use of several GPUs, as this is done automatically by the Trainer (you have to set `CUDA_VISIBLE_DEVICES` to just one GPU if you don't want the Trainer to use them all). That flag is there to split the model layers across the various GPUs available (only available for a few models).
Got it, I didn't realize that the Trainer automatically uses multiple GPUs if visible. Thanks!
Environment info
transformers version: 4.1.1
Who can help
Trainer: @sgugger
Information
Model I am using (Bert, XLNet ...): BERT
The problem arises when using:
I'm running a model on a toy dataset with only 2 examples and a batch size of 2. In the Trainer, `num_examples` is 2, but `total_train_batch_size` is 12, even though I do not have the `model_parallel` flag set to `True` (note I do have 6 GPUs available on the machine). This doesn't seem to impact my code because `train_dataset_is_sized=True`, but it seems strange.
The tasks I am working on is:
toy classification jsonl dataset with 2 examples
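The value of 12 is consistent with the Trainer multiplying the per-device batch size by the number of visible GPUs. A hedged sketch of that arithmetic (not the actual Trainer internals; the function name and `grad_accum_steps` default are illustrative):

```python
def total_train_batch_size(per_device_batch_size, n_gpus, grad_accum_steps=1):
    # Effective batch size per optimizer step: each visible GPU processes
    # per_device_batch_size examples, and gradients may be accumulated
    # over several forward passes before an update.
    return per_device_batch_size * max(n_gpus, 1) * grad_accum_steps

# With a per-device batch size of 2 and 6 visible GPUs:
print(total_train_batch_size(2, 6))  # -> 12
```

This is why restricting `CUDA_VISIBLE_DEVICES` to one GPU would bring the reported total back down to 2.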
To reproduce
I think that this line has an unnecessary `not`. Should this be `if self.model_parallel` instead of `if not self.model_parallel`? Thanks!
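Per the maintainer's reply at the top of the thread, the `not` is intentional: data parallelism (replicating the whole model on each GPU) only applies when the model is *not* already split across devices by model parallelism. A hedged sketch of that logic (not the actual Trainer source; the function name is illustrative):

```python
def choose_strategy(model_parallel: bool, n_gpu: int) -> str:
    # model_parallel=True means the layers are already spread across GPUs,
    # so the model must not additionally be replicated per device.
    if model_parallel:
        return "model layers split across GPUs"
    # Hence the `if not self.model_parallel` guard: only in this branch
    # does the Trainer wrap the model for multi-GPU data parallelism.
    if n_gpu > 1:
        return "model replicated on each GPU (data parallel)"
    return "single device"

print(choose_strategy(False, 6))  # the situation in this issue
```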