zero3 train flux, accelerate unwrap_model got tensor(size(0))

kohya-ss / sd-scripts

Apache License 2.0


zero3 train flux, accelerate unwrap_model got tensor(size(0)) #1782

Open Reginald-L opened 1 week ago

Reginald-L commented 1 week ago

Hi, I am using DeepSpeed ZeRO-3 to fine-tune the FLUX model with the script flux_train_network.py.

flux = accelerator.unwrap_model(flux)
print(f"flux - {flux.state_dict()['single_blocks.7.linear1.weight'].shape}")
print(f"flux - {flux.state_dict()['single_blocks.7.linear1.weight'].device}")

I got the result below (the weight is reported as a tensor of size 0, as in the title):

When I save the trained model, I get this:

Here is my zero config:

kohya-ss commented 3 days ago

DeepSpeed has not been tested yet for FLUX.1 training. We plan to support it in the future, but it may take some time.
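
For context on the size-0 tensors: under ZeRO-3 the parameters are partitioned across ranks, so calling state_dict() on the unwrapped module can return empty placeholders instead of the full weights. A minimal sketch of one possible workaround follows, assuming the model was passed through accelerator.prepare() and that the DeepSpeed config sets stage3_gather_16bit_weights_on_model_save to true. The helper name gather_full_state_dict is hypothetical, and this has not been tested with FLUX.1 in sd-scripts.

```python
def gather_full_state_dict(accelerator, model):
    """Hypothetical helper: fetch a consolidated state dict through accelerate.

    `accelerator` is assumed to be an `accelerate.Accelerator`, and `model` the
    object returned by `accelerator.prepare(...)`, not the unwrapped module.
    """
    # Under DeepSpeed this is a collective operation, so every rank should call it,
    # not only the main process.
    state_dict = accelerator.get_state_dict(model)

    # Assumption: depending on library versions, some ranks may receive None.
    if state_dict is None:
        return None

    # Size-0 entries mean the weights were not gathered (still ZeRO-3 placeholders).
    empty = [name for name, tensor in state_dict.items() if tensor.numel() == 0]
    if empty:
        raise RuntimeError(
            f"{len(empty)} parameters are still empty, e.g. {empty[0]!r}; "
            "the ZeRO-3 shards were not gathered"
        )
    return state_dict
```

In a training script this would be called on all ranks before saving, with only the main process writing the returned dict to disk. Raising on empty entries is a design choice meant to avoid silently saving a checkpoint full of size-0 tensors.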