levihsu / OOTDiffusion

Official implementation of OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on

torch.cuda.OutOfMemoryError: CUDA out of memory. #173

Open georgegeorgevan opened 7 months ago

georgegeorgevan commented 7 months ago

Hi everyone, I ran this project on both a 4090 and an A100 (40 GB version) and hit the same error on both, saying there is not enough GPU memory:

```
text_config_dict is provided which will be used to initialize CLIPTextConfig. The value text_config["id2label"] will be overriden.
text_config_dict is provided which will be used to initialize CLIPTextConfig. The value text_config["bos_token_id"] will be overriden.
text_config_dict is provided which will be used to initialize CLIPTextConfig. The value text_config["eos_token_id"] will be overriden.
100%|████████████████████████████████| 1/1 [00:04<00:00,  4.62s/it]
100%|████████████████████████████████| 1/1 [00:06<00:00,  6.23s/it]
Initial seed: 1536610237
  0%|                                | 0/20 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/root/autodl-tmp/.tr/ot/run/run_ootd.py", line 71, in <module>
    images = model(
  File "/root/autodl-tmp/.tr/ot/ootd/inference_ootd_hd.py", line 121, in __call__
    images = self.pipe(prompt_embeds=prompt_embeds,
  File "/root/autodl-tmp/conda3/ot/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/autodl-tmp/.tr/ot/ootd/pipelines_ootd/pipeline_ootd.py", line 373, in __call__
    noise_pred = self.unet_vton(
  File "/root/autodl-tmp/conda3/ot/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/autodl-tmp/.tr/ot/ootd/pipelines_ootd/unet_vton_2d_condition.py", line 1080, in forward
    sample, res_samples, spatial_attn_inputs, spatial_attn_idx = downsample_block(
  File "/root/autodl-tmp/conda3/ot/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/autodl-tmp/.tr/ot/ootd/pipelines_ootd/unet_vton_2d_blocks.py", line 1177, in forward
    hidden_states, spatial_attn_inputs, spatial_attn_idx = attn(
  File "/root/autodl-tmp/conda3/ot/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/autodl-tmp/.tr/ot/ootd/pipelines_ootd/transformer_vton_2d.py", line 383, in forward
    hidden_states, spatial_attn_inputs, spatial_attn_idx = block(
  File "/root/autodl-tmp/conda3/ot/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/autodl-tmp/.tr/ot/ootd/pipelines_ootd/attention_vton.py", line 266, in forward
    attn_output = self.attn1(
  File "/root/autodl-tmp/conda3/ot/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/autodl-tmp/conda3/ot/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 522, in forward
    return self.processor(
  File "/root/autodl-tmp/conda3/ot/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 1231, in __call__
    hidden_states = F.scaled_dot_product_attention(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.81 GiB (GPU 0; 39.59 GiB total capacity; 36.51 GiB already allocated; 1.81 GiB free; 37.26 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

What could be causing this? Also, I'd like to ask everyone: how much GPU memory are you running this with?
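The error message itself suggests one first thing to try: setting `max_split_size_mb` via `PYTORCH_CUDA_ALLOC_CONF` to reduce allocator fragmentation. A minimal sketch (the value 128 is an arbitrary starting point, and this only helps when reserved memory greatly exceeds allocated memory, so it may not be enough on its own):

```python
import os

# Must be set before torch initializes CUDA (ideally before `import torch`),
# otherwise the allocator configuration is ignored.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

# import torch  # import torch and the OOTDiffusion pipeline only after this point
```

Equivalently, it can be set on the command line: `PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python run/run_ootd.py ...`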

BigManstrj commented 7 months ago

Same problem.

PraNavKumAr01 commented 7 months ago

I used to encounter this issue; it seems that loading and processing PNG files is computationally expensive. Try using JPG or JPEG files instead and it should be sorted.

ixarchakos commented 6 months ago

Despite the resizing in run_ootd.py, the model and cloth images need to be 768x1024. Arbitrary input resolutions can cause CUDA out-of-memory errors.
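To make the point above concrete, here is a hedged sketch of preparing inputs at the expected 3:4 aspect ratio before resizing to 768x1024. The helper `center_crop_box` is hypothetical (not part of the repo); the Pillow calls in the comment assume the usual `Image.crop`/`Image.resize` API:

```python
# Target resolution the pipeline expects for both model and cloth images.
TARGET_W, TARGET_H = 768, 1024

def center_crop_box(width, height, target_w=TARGET_W, target_h=TARGET_H):
    """Return a (left, top, right, bottom) box that center-crops a
    width x height image to the target 3:4 aspect ratio, so the
    subsequent resize to 768x1024 neither distorts nor upsamples oddly."""
    target_ratio = target_w / target_h
    if width / height > target_ratio:
        # Image is too wide: crop the sides.
        new_w = int(height * target_ratio)
        left = (width - new_w) // 2
        return (left, 0, left + new_w, height)
    else:
        # Image is too tall: crop top and bottom.
        new_h = int(width / target_ratio)
        top = (height - new_h) // 2
        return (0, top, width, top + new_h)

# With Pillow, applied before passing the image to the pipeline:
#   img = Image.open("model.jpg")
#   img = img.crop(center_crop_box(*img.size)).resize((TARGET_W, TARGET_H))
```

An image already at 768x1024 passes through unchanged, since the computed box is the full image.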

XinZhang0526 commented 6 months ago

Same problem with an A100 (40 GB). I tried using xformers to solve it, and it works. But my question is why the authors can run it successfully without xformers.

Borismartirosyan commented 5 months ago

@XinZhang0526 Can you please share your Python version and requirements.txt? I installed xformers but encounter the same error. Thanks.

Nomination-NRB commented 5 months ago

> Same problem with an A100 (40 GB). I tried using xformers to solve it, and it works. But my question is why the authors can run it successfully without xformers.

How do you add xformers in the code? Could you show the specific code?

nitinmukesh commented 4 months ago

@XinZhang0526 Could you please share your `pip list`? I need to see which version of xformers works.

XinZhang0526 commented 4 months ago

@nitinmukesh @Borismartirosyan In fact, here are my versions:

```
xformers == 0.0.22
torch == 1.13.1+cu116
```

```python
unet_vton.enable_xformers_memory_efficient_attention()
unet_garm.enable_xformers_memory_efficient_attention()
```

By the way, torch >= 2.0 is recommended.
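To answer the "where does this go" questions above: the two calls belong right after `unet_garm` and `unet_vton` are loaded in the inference script. A hedged sketch of a small wrapper (the helper function is my own, not part of the repo; the fallback assumes the diffusers UNet also exposes `set_attention_slice`):

```python
def enable_memory_savers(*models):
    """Try to enable xformers memory-efficient attention on each model;
    if xformers is missing or incompatible, fall back to attention slicing,
    which also reduces peak attention memory at some speed cost."""
    enabled = []
    for model in models:
        try:
            model.enable_xformers_memory_efficient_attention()
            enabled.append("xformers")
        except Exception:
            # diffusers UNets accept "auto" to pick a slice size automatically.
            model.set_attention_slice("auto")
            enabled.append("slicing")
    return enabled

# Hypothetical usage, e.g. after the UNets are created in inference_ootd_hd.py:
#   enable_memory_savers(unet_garm, unet_vton)
```

With torch >= 2.0 the pipeline already uses `F.scaled_dot_product_attention` (visible in the traceback above), so xformers may matter less there, but the slicing fallback can still lower peak usage.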

2681248863 commented 1 month ago

> In fact, here are my versions:
>
> ```
> xformers == 0.0.22
> torch == 1.13.1+cu116
> ```
>
> ```python
> unet_vton.enable_xformers_memory_efficient_attention()
> unet_garm.enable_xformers_memory_efficient_attention()
> ```
>
> By the way, torch >= 2.0 is recommended.

Which file should these two lines of code be added to?