Closed wenlai-lavine closed 2 years ago
Hi @lavine-lmu,
Your query needs to be asked on the https://github.com/microsoft/DeepSpeed side, since you're not using the HF/DS integration but writing your own training loop.
Also, you don't need HfDeepSpeedConfig unless you use ZeRO-3. That's the only time its functionality is used, to tell from_pretrained to load the model directly on the GPUs.
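If you do end up on ZeRO-3, a rough sketch of that pattern (the model name and config values below are just placeholders, not taken from your script):

```python
from transformers import AutoModel
from transformers.deepspeed import HfDeepSpeedConfig  # transformers.integrations in newer versions

# Placeholder ZeRO-3 config; a real one also needs optimizer/fp16/etc. sections.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "zero_optimization": {"stage": 3},
}

# Must be created *before* from_pretrained and kept alive (it keeps global
# state via a weakref); only then does from_pretrained load the weights
# directly onto the GPUs in ZeRO-3 partitioned form.
dschf = HfDeepSpeedConfig(ds_config)

model = AutoModel.from_pretrained("bert-base-uncased")
```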
To give you a hint, your dataloader is unaware of DDP, so you either need to use a DeepSpeed dataloader or code it properly for DDP (see the sketch below). You can see how this is done in the HF Trainer here:
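For the second option, a minimal sketch assuming a toy dataset of 100 examples and that torch.distributed has already been initialized by the launcher:

```python
import torch
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

# 100 toy examples, matching the numbers in the report below.
dataset = TensorDataset(torch.randn(100, 8))

# DistributedSampler splits the indices across ranks: with 5 GPUs each rank
# gets 20 examples, i.e. 5 batches of 4 instead of all 25.
sampler = DistributedSampler(dataset)
dataloader = DataLoader(dataset, batch_size=4, sampler=sampler)

for epoch in range(3):
    sampler.set_epoch(epoch)  # so the shuffle differs between epochs
    for (batch,) in dataloader:
        pass  # forward/backward/step on the deepspeed engine goes here
```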
@stas00 Thanks, the problem was solved after I switched to the DeepSpeed dataloader.
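For anyone hitting the same thing: the change boils down to handing the dataset to deepspeed.initialize via training_data and iterating over the dataloader it returns. A rough sketch (the model, data and config below are stand-ins, not the actual script):

```python
import deepspeed
import torch
from torch.utils.data import TensorDataset

# Toy stand-ins for the real model/data.
dataset = TensorDataset(torch.randn(100, 8), torch.randint(0, 2, (100,)))
model = torch.nn.Linear(8, 2)

ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

# Passing training_data makes deepspeed.initialize return a dataloader that is
# already sharded across ranks, so each of the 5 GPUs only sees 5 batches.
engine, optimizer, train_loader, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    training_data=dataset,
    config=ds_config,
)

for x, y in train_loader:
    x, y = x.to(engine.device), y.to(engine.device)
    loss = torch.nn.functional.cross_entropy(engine(x), y)
    engine.backward(loss)
    engine.step()
```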
Hi, I am trying to use DeepSpeed to finetune a model, but it seems the data are not parallelized when training with DeepSpeed.
I wrote a toy script to reproduce it, using 100 sentences with batch_size=4, so the dataloader has 25 batches when using one GPU; when I use multiple GPUs, the dataloader size is still 25, which means we still loop 25 times. Shouldn't the data be split across the GPUs in parallel? For example, with 5 GPUs here, shouldn't each GPU only need to loop 5 times, not 25 times?
I've only recently started using DeepSpeed and am not very familiar with it; sorry for the easy question, I hope someone can give me some suggestions. @stas00
See as follows:
python test.py when using one GPU.
deepspeed --num_gpus=5 test.py and set train_batch_size to 20 when using 5 GPUs.
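A stripped-down sketch of that setup (toy tensors stand in for the 100 sentences; the plain, non-distributed DataLoader is what keeps the loop at 25 batches on every rank):

```python
# Launched as: deepspeed --num_gpus=5 test.py
import deepspeed
import torch
from torch.utils.data import DataLoader, TensorDataset

# 100 toy "sentences", batch_size=4 -> 25 batches
dataset = TensorDataset(torch.randn(100, 8), torch.randint(0, 2, (100,)))
dataloader = DataLoader(dataset, batch_size=4)  # not DDP-aware: 25 batches on every rank

model = torch.nn.Linear(8, 2)
engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config={
        "train_batch_size": 20,  # 5 GPUs x micro-batch 4
        "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    },
)

# prints 25 on every rank -> each GPU loops 25 times, not 5
print(f"rank {engine.global_rank}: {len(dataloader)} batches")
```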