Closed — Nipi64310 closed this issue 1 year ago.
cc @patil-suraj @williamberman @pcuenca
I don't really know much about DeepSpeed; maybe @williamberman knows more here. As far as I have tried, the script works well with stage 2 using CPU offloading, which should help fit the model on a 2080 Ti.
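For reference, the stage-2-with-CPU-offloading setup mentioned above corresponds to a `zero_optimization` block in the DeepSpeed config roughly like the following. This is an illustrative sketch, not the exact config anyone in this thread used:

```json
{
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}
```

Stage 2 shards optimizer states and gradients across GPUs (parameters stay replicated), and `offload_optimizer` moves the optimizer states to CPU RAM, which is usually what makes the difference on an 11 GB card.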
I don't know off the top of my head. Someone opened an issue with the same error message using DeepSpeed stage 3 with transformers: https://github.com/microsoft/DeepSpeed/issues/2746. It's probably best to see if the DeepSpeed team knows before digging in :)
Fixed in https://github.com/huggingface/diffusers/pull/3076 (note that ZeRO-3 support is only partial).
Please carefully read the OP of the PR for details.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Describe the bug
An error is reported when using DeepSpeed ZeRO stage 3 to finetune with the diffusers/examples/text_to_image/train_text_to_image.py script. My machine has 4× 2080 Ti GPUs, and because a single GPU cannot hold all the SD2 parameters, the DeepSpeed ZeRO stage 3 strategy must be used.
Reproduction
accelerate.yaml
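The actual `accelerate.yaml` was not captured in this thread. A typical Accelerate config for this setup (4 processes, DeepSpeed via an external config file) might look like the following — the values here are assumptions for illustration, not the reporter's config:

```yaml
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  deepspeed_config_file: /home/kas/zero_stage3_offload_config.json
  zero3_init_flag: true
machine_rank: 0
num_machines: 1
num_processes: 4
mixed_precision: fp16
use_cpu: false
```

`zero3_init_flag: true` enables `deepspeed.zero.Init`, which constructs the model with parameters already partitioned across ranks instead of materializing the full model on each GPU first.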
/home/kas/zero_stage3_offload_config.json
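The contents of that JSON file were likewise not captured. A representative ZeRO stage-3 config with CPU offloading of both parameters and optimizer states (illustrative values, not the reporter's exact file) might be:

```json
{
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "offload_param": { "device": "cpu", "pin_memory": true },
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "gradient_accumulation_steps": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "fp16": { "enabled": true }
}
```

The `"auto"` values are filled in by Accelerate from its own launch settings, so they don't have to be duplicated here.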
launch script
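The launch script itself is missing from the report. A plausible invocation of the example script under this config would look like the following; the model and dataset names are placeholders commonly used in the diffusers docs, not necessarily what the reporter ran:

```shell
accelerate launch --config_file accelerate.yaml \
  examples/text_to_image/train_text_to_image.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-2" \
  --dataset_name="lambdalabs/pokemon-blip-captions" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --mixed_precision=fp16 \
  --learning_rate=1e-05 \
  --max_train_steps=15000 \
  --output_dir="sd2-finetuned"
```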
Logs
System Info
diffusers version: 0.11.1