huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
https://huggingface.co/docs/accelerate
Apache License 2.0

deepspeed zero3 save model #3238

Closed: Reginald-L closed this issue 1 week ago

Reginald-L commented 1 week ago

Hi, I am using DeepSpeed ZeRO-3 to fine-tune the Flux model with the kohya scripts.

# after training, unwrap the model and inspect one parameter
flux = accelerator.unwrap_model(flux)
print(f"flux - {flux.state_dict()['single_blocks.7.linear1.weight'].shape}")
print(f"flux - {flux.state_dict()['single_blocks.7.linear1.weight'].device}")

I got the below result: [screenshot]

And when I save the trained model, I get this: [screenshot]

Here is my ZeRO config: [screenshot]
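(The config screenshot did not survive the export. For reference, a typical DeepSpeed ZeRO stage-3 config looks like the following; the values are illustrative, not the author's actual settings. The `stage3_gather_16bit_weights_on_model_save` option is directly relevant here, since it tells DeepSpeed to consolidate the sharded weights when saving.)

```json
{
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "bf16": { "enabled": true },
  "train_micro_batch_size_per_gpu": "auto"
}
```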

BenjaminBossan commented 1 week ago

Could you try the `with deepspeed.zero.GatheredParameters(params)` context?
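(Under ZeRO-3, each rank only holds a partition of every parameter, so `state_dict()` on a single rank returns empty tensors such as `torch.Size([0])`. A minimal sketch of the suggestion, assuming `flux` is the unwrapped model and `accelerator` is the `Accelerator` instance from the training script:)

```python
import torch
import deepspeed

# Gather the ZeRO-3 sharded parameters so they are fully materialized
# inside this context; outside it, each rank sees only its partition.
with deepspeed.zero.GatheredParameters(list(flux.parameters())):
    if accelerator.is_main_process:
        w = flux.state_dict()["single_blocks.7.linear1.weight"]
        print(w.shape)  # full shape now, not torch.Size([0])
        torch.save(flux.state_dict(), "flux_finetuned.pt")
```

Alternatively, `accelerator.get_state_dict(flux)` performs the ZeRO-3 consolidation for you and returns a full state dict on the main process.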

Reginald-L commented 1 week ago

> Could you try the `with deepspeed.zero.GatheredParameters(params)` context?

Cool, thanks very much, your solution solved it.