crj1998 closed this issue 6 months ago
I have a similar issue. In multi-GPU training there is a problem with the loss/gradient updates. I have modified the set of parameters the optimizer updates, but VRAM usage is still very high. I suspect it may be necessary to enable `enable_xformers_memory_efficient_attention`, and perhaps to add a dedicated `XFormersAttnProcessor` for the IP-Adapter.
I used DeepSpeed to train. You can also use the xformers or torch 2.0 attention processor.
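For context on why the torch 2.0 attention path saves memory: `F.scaled_dot_product_attention` can dispatch to a fused memory-efficient kernel instead of materializing the full attention matrix. A minimal pure-PyTorch sketch (illustration only, not the repo's training code) comparing it with naive attention:

```python
import torch
import torch.nn.functional as F

def naive_attention(q, k, v):
    # Materializes the full (seq, seq) score matrix: O(seq^2) activation memory.
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v

# (batch, heads, seq_len, head_dim) -- small dummy tensors for illustration
q = torch.randn(2, 8, 64, 32)
k = torch.randn(2, 8, 64, 32)
v = torch.randn(2, 8, 64, 32)

out_naive = naive_attention(q, k, v)
# torch >= 2.0: fused SDPA can use a memory-efficient / flash kernel under the hood
out_sdpa = F.scaled_dot_product_attention(q, k, v)
assert torch.allclose(out_naive, out_sdpa, atol=1e-5)
```

In diffusers this is what `AttnProcessor2_0` (the default on torch >= 2.0) calls internally, which is why upgrading to torch 2.0 or enabling xformers reduces peak VRAM during training.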
According to https://huggingface.co/docs/diffusers/optimization/torch2.0, memory-efficient attention is enabled by default in torch 2.0. Did you also turn on offload_param and offload_optimizer to fit batch=8 on a V100? Thanks.
I don't use offload_param and offload_optimizer
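For anyone who does want to try offloading to fit a larger batch, these flags live in the DeepSpeed ZeRO section of the config file. A hedged sketch (note `offload_param` requires ZeRO stage 3; exact buffer sizes and devices depend on your setup):

```json
{
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    },
    "offload_param": {
      "device": "cpu",
      "pin_memory": true
    }
  }
}
```

This trades GPU VRAM for host RAM and PCIe traffic, so expect slower steps in exchange for the smaller memory footprint.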
Thanks, author, for sharing this awesome answer. I have read all the issues about training and know that you use 8x V100 GPUs (32 GB each) with batch=8, which means 32 GB of VRAM should be enough. However, when I run tutorial_train_faceid.py with batch size 8, it uses almost double 32 GB. So I wonder if there is something I missed? I use