tencent-ailab / IP-Adapter

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.
Apache License 2.0
4.52k stars · 298 forks

Training vram memory usage? #216

Closed crj1998 closed 6 months ago

crj1998 commented 6 months ago

Thanks to the author for sharing this awesome work. I have read all the issues about training and know that you trained on 8 V100 GPUs (32 GB) with batch size 8, which implies 32 GB of VRAM is enough. However, when I run tutorial_train_faceid.py with batch size 8, it uses almost double 32 GB. So I wonder, is there something I missed? I use:

torch==2.0.0+cu118
diffusers==0.22.1
accelerate==0.23.0

[screenshot: VRAM usage]

Ted-developer commented 6 months ago

I also have a similar issue. In multi-GPU training, there's a problem with the loss gradient updates. I have modified the parameters that the optimizer needs to update. However, the VRAM usage is indeed very high. I suspect that it might be necessary to enable enable_xformers_memory_efficient_attention, and perhaps add a dedicated XFormersAttnProcessor for the IP Adapter.
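The point of a memory-efficient attention processor (whether xFormers or torch 2.0's fused kernel) is to avoid materializing the full (batch, heads, N, N) attention matrix. A minimal sketch of the difference, using plain torch (function names here are illustrative, not IP-Adapter's actual processor code):

```python
import torch
import torch.nn.functional as F

def naive_attention(q, k, v):
    # Materializes the full (batch, heads, N, N) score matrix --
    # this is what drives VRAM up at large batch/sequence sizes.
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    return scores.softmax(dim=-1) @ v

def fused_attention(q, k, v):
    # torch >= 2.0: fused kernel; on CUDA the flash / memory-efficient
    # backends never materialize the N x N matrix.
    return F.scaled_dot_product_attention(q, k, v)

# (batch, heads, seq_len, head_dim)
q = torch.randn(2, 8, 64, 32)
k = torch.randn(2, 8, 64, 32)
v = torch.randn(2, 8, 64, 32)

out_naive = naive_attention(q, k, v)
out_fused = fused_attention(q, k, v)
assert torch.allclose(out_naive, out_fused, atol=1e-5)
```

Both paths compute the same result; only the peak memory differs, which is why swapping the attention processor changes VRAM usage without changing training behavior.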

xiaohu2015 commented 6 months ago

I used DeepSpeed to train. You can also use the xFormers or torch 2.0 attention processor.
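The exact DeepSpeed config isn't shown in the thread; a typical ZeRO stage-2 setup (shards optimizer state and gradients across the 8 GPUs, which alone cuts per-GPU VRAM substantially) might look like this sketch:

```json
{
  "train_micro_batch_size_per_gpu": 8,
  "gradient_accumulation_steps": 1,
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true
  }
}
```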

fengyang0317 commented 6 months ago

According to https://huggingface.co/docs/diffusers/optimization/torch2.0, memory-efficient attention is enabled by default in torch 2.0. Did you also turn on offload_param and offload_optimizer to allow batch=8 on a V100? Thanks.
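For reference, those two options live under zero_optimization in a DeepSpeed config (stage 3 for parameter offload, stage 2 or 3 for optimizer offload); a fragment would look like:

```json
"zero_optimization": {
  "stage": 3,
  "offload_optimizer": { "device": "cpu", "pin_memory": true },
  "offload_param": { "device": "cpu", "pin_memory": true }
}
```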

xiaohu2015 commented 6 months ago

I don't use offload_param or offload_optimizer.