Describe the bug
For multi-GPU training, the number of batches per epoch does not reduce by the same factor as the number of GPUs.
To Reproduce
For the configuration below, when using a dataset with 1 million samples and 4 GPUs, the number of batches (as obtained from the training dataloader length) is 62,500 (=1M/16) instead of 250,000 (=1M/4).
"train_batch_size": 4, "train_micro_batch_size_per_gpu": 1, "gradient_accumulation_steps": 1,
Expected behavior
The number of batches per epoch for a multi-GPU setting should be (training data size) / num_gpus, but it is not.
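For reference, the expected behavior matches what a standard PyTorch DistributedSampler produces: each rank's dataloader covers dataset_size / num_replicas samples, batched by the per-GPU batch size. A small self-contained sketch (using a 1,000-sample stand-in dataset instead of 1M):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

dataset = TensorDataset(torch.zeros(1000, 1))  # stand-in for the 1M-sample dataset
world_size = 4
micro_batch_per_gpu = 1

# Passing num_replicas/rank explicitly avoids needing torch.distributed init.
sampler = DistributedSampler(dataset, num_replicas=world_size, rank=0)
loader = DataLoader(dataset, batch_size=micro_batch_per_gpu, sampler=sampler)

# Each rank sees dataset_size / world_size = 250 batches at micro batch 1,
# i.e. the per-epoch batch count scales down by the number of GPUs.
print(len(loader))
```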