huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
https://huggingface.co/docs/accelerate
Apache License 2.0

deepspeed zero3 save model #3238

Closed: Reginald-L closed this issue 1 week ago

Reginald-L commented 1 week ago

Hi, I am using DeepSpeed ZeRO-3 to fine-tune the Flux model with the kohya scripts.

# after training, unwrap the model and inspect one parameter
flux = accelerator.unwrap_model(flux)
print(f"flux - {flux.state_dict()['single_blocks.7.linear1.weight'].shape}")
print(f"flux - {flux.state_dict()['single_blocks.7.linear1.weight'].device}")

I got the below result: [screenshot]

And when I save the trained model, I get this: [screenshot]

Here is my ZeRO config: [screenshot]
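(The config screenshot did not survive the export. For reference, a typical DeepSpeed ZeRO stage-3 config looks like the following; the values are illustrative, not the author's actual settings. The `stage3_gather_16bit_weights_on_model_save` option is directly relevant here, since it tells DeepSpeed to consolidate the sharded weights when saving.)

```json
{
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "bf16": { "enabled": true },
  "train_micro_batch_size_per_gpu": "auto"
}
```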

BenjaminBossan commented 1 week ago

Could you try the `with deepspeed.zero.GatheredParameters(params)` context?
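(Under ZeRO-3, each rank only holds a partition of every parameter, so `state_dict()` on a single rank returns empty tensors such as `torch.Size([0])`. A minimal sketch of the suggestion, assuming `flux` is the unwrapped model and `accelerator` is the `Accelerator` instance from the training script:)

```python
import torch
import deepspeed

# Gather the ZeRO-3 sharded parameters so they are fully materialized
# inside this context; outside it, each rank sees only its partition.
with deepspeed.zero.GatheredParameters(list(flux.parameters())):
    if accelerator.is_main_process:
        w = flux.state_dict()["single_blocks.7.linear1.weight"]
        print(w.shape)  # full shape now, not torch.Size([0])
        torch.save(flux.state_dict(), "flux_finetuned.pt")
```

Alternatively, `accelerator.get_state_dict(flux)` performs the ZeRO-3 consolidation for you and returns a full state dict on the main process.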

Reginald-L commented 1 week ago

> Could you try the `with deepspeed.zero.GatheredParameters(params)` context?

Cool, thanks very much, your solution solved it.