vatsal-kr opened this issue 10 months ago
How exactly are you running the fine_tune.py script for distributed training, and what type of distributed training do you want to achieve?
Opacus doesn't support model sharding via DeepSpeed or FSDP. It does support DDP, but the model would still need to fit on each individual GPU. Furthermore, the code needs to be adapted to use DDP through dp_transformers. See the `args.parallel_mode` argument in https://github.com/microsoft/dp-transformers/blob/main/src/dp_transformers/dp_utils.py#L171.
Hello, when I run the code with two GPUs, I get the following error.
This works fine with a single GPU, though. Any suggestions?