Accelerate's dataloader works very differently: we specifically do not duplicate data across ranks. See this visualization: https://www.youtube.com/watch?v=9Vfauv4ErwA&pp=ygUhaHVnZ2luZ2ZhY2UgYWNjZWxlcmF0ZSBkYXRhbG9hZGVy
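As a concrete illustration of that point (a minimal sketch I constructed, not code from the issue thread; the file name `shard_demo.py` is made up), launching the script below with `torchrun --nproc_per_node=2` shows each rank receiving a disjoint shard of the batches once the dataloader goes through `accelerator.prepare()`:

```python
# Minimal sketch (not from the issue thread) of Accelerate's sharding behavior.
# Launch with: torchrun --nproc_per_node=2 shard_demo.py
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()
dataset = TensorDataset(torch.arange(8))
loader = DataLoader(dataset, batch_size=2, shuffle=False)
loader = accelerator.prepare(loader)

for (batch,) in loader:
    # Each rank prints a *different* slice of the dataset: Accelerate splits
    # the batches across processes instead of duplicating them.
    print(f"rank {accelerator.process_index}: {batch.tolist()}")
```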
### System Info

### Information

### Tasks

- `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)

### Reproduction
Run with `torchrun --nproc_per_node=2 --master_port=1234 sample.py`
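The issue does not include `sample.py`, so the following is a hypothetical reconstruction consistent with the description under "Expected behavior" below; the `DistributedSampler` pinned to `num_replicas=1, rank=0` stands in for the custom sampler, and the two `run(...)` calls stand in for the `test_torch_dataloader`/`test_accelerator_dataloader` tests mentioned there:

```python
# Hypothetical reconstruction of sample.py (not included in the issue);
# the dataset shape, batch size, and helper names are assumptions.
import torch
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler
from accelerate import Accelerator

accelerator = Accelerator()
dataset = TensorDataset(torch.arange(16))
# world_size=2 but data_parallel_size=1 / data_parallel_rank=0: all ranks
# belong to the same data replica, so the sampler should yield identical
# indices on every rank.
sampler = DistributedSampler(dataset, num_replicas=1, rank=0, shuffle=True, seed=0)

def run(loader, name):
    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle per epoch, identically on all ranks
        for (batch,) in loader:
            print(f"{name} rank {accelerator.process_index} epoch {epoch}: {batch.tolist()}")

# Like test_torch_dataloader: the plain DataLoader prints the same batches on all ranks.
run(DataLoader(dataset, batch_size=4, sampler=sampler), "torch")

# Like test_accelerator_dataloader: after prepare(), the ranks print different
# batches, because Accelerate shards them across processes.
run(accelerator.prepare(DataLoader(dataset, batch_size=4, sampler=sampler)), "accel")
```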
### Expected behavior
During the pre-training stage, the model is parallelized with both tensor parallelism and sequence parallelism. Under model parallelism, every rank within the same data-parallel replica must receive identical data, so a sampler is passed to the torch DataLoader to guarantee that the same data is generated within the same data replica. The plain torch DataLoader test (`test_torch_dataloader`) behaves as expected: with `world_size=2`, `data_parallel_size=1`, and `data_parallel_rank=0`, all ranks see the same data in every epoch. But with the `accelerator.prepare` test (`test_accelerator_dataloader`), the data obtained on each rank differs across ranks in every epoch.
[Screenshots: "torch dataloader" output (identical batches on both ranks) vs. "accelerator dataloader" output (different batches per rank)]
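If identical data on every rank is required (e.g., for tensor parallelism), one possible workaround (my assumption, not a recommendation from the thread) is to let Accelerate prepare the model but keep the raw DataLoader, handling device placement manually:

```python
# Possible workaround (an assumption, not from the issue thread): prepare the
# model with Accelerate but leave the DataLoader unprepared, so every rank
# iterates over the identical sampler output.
import torch
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler
from accelerate import Accelerator

accelerator = Accelerator()
model = accelerator.prepare(torch.nn.Linear(1, 1))

dataset = TensorDataset(torch.arange(16, dtype=torch.float32).unsqueeze(1))
sampler = DistributedSampler(dataset, num_replicas=1, rank=0, shuffle=True, seed=0)
loader = DataLoader(dataset, batch_size=4, sampler=sampler)  # NOT passed to prepare()

for (batch,) in loader:
    batch = batch.to(accelerator.device)  # manual device placement
    out = model(batch)  # every rank computes on the same batch
```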