Closed nrailg closed 8 hours ago
@lewtun @stevhliu @BenjaminBossan @stas00 @muellerzr
I saw your commits in the git history, so I @-mentioned you. Could you kindly help?
I believe using multiple models with ZeRO-3 in diffusion training is quite important (for larger models like FLUX); it's a critical capability for Accelerate.
@muellerzr Could you please kindly take some time and have a look at this case? Thank you!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
Information
Tasks
no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
Reproduction
The code below is copied from https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image.py
I made several modifications following https://github.com/huggingface/accelerate/blob/main/docs/source/usage_guides/deepspeed_multiple_model.md
My configs
script/train-t2i-sd-ds-config.json
script/train-t2i-sd-ds-config-infer-only.json
How I ran the code
Expected behavior
What I got
If I change ZeRO-3 to ZeRO-1 in script/train-t2i-sd-ds-config.json, it works well.
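For anyone reproducing the comparison, the stage switch can be sketched roughly as below. This is only an illustration: the actual contents of script/train-t2i-sd-ds-config.json are not shown in this issue, so `example_config` is a hypothetical stand-in and `with_zero_stage` is a helper name invented here. The only thing assumed about the real file is that it uses the standard DeepSpeed `zero_optimization.stage` key.

```python
import json


def with_zero_stage(config: dict, stage: int) -> dict:
    """Return a copy of a DeepSpeed config dict with zero_optimization.stage set to `stage`."""
    if stage not in (0, 1, 2, 3):
        raise ValueError(f"ZeRO stage must be 0, 1, 2 or 3, got {stage!r}")
    # Round-trip through JSON to deep-copy the (JSON-compatible) config.
    updated = json.loads(json.dumps(config))
    updated.setdefault("zero_optimization", {})["stage"] = stage
    return updated


# Hypothetical stand-in for the training config; the real file is not reproduced here.
example_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "zero_optimization": {"stage": 3},
}

# Same config with ZeRO-1 instead of ZeRO-3, leaving the original untouched.
zero1_config = with_zero_stage(example_config, 1)
print(json.dumps(zero1_config, indent=2))
```

Keeping both variants side by side makes it easier to confirm that the ZeRO stage is the only difference between the failing and the working run.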