xavihart / PDM-Pure

PDM-based Purifier

How much GPU memory is needed? #2

Open llstela opened 4 months ago

llstela commented 4 months ago

I tried running your demo code on 32GB and 24GB GPUs, but both runs failed with CUDA out of memory.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 GiB (GPU 0; 31.74 GiB total capacity; 21.30 GiB already allocated; 9.10 GiB free; 21.46 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
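
For readers trying to interpret this message: the max_split_size_mb hint targets fragmentation, which only matters when reserved memory is much larger than allocated memory. A minimal sketch, assuming the message layout shown above (the regular expression and the helper name `parse_oom` are illustrative, not part of PyTorch), pulls the numbers out and compares them:

```python
import re

# Pattern written against the message format quoted in this thread;
# other PyTorch versions may word the error differently.
OOM_PATTERN = re.compile(
    r"Tried to allocate (?P<requested>[\d.]+) GiB .*?"
    r"(?P<total>[\d.]+) GiB total capacity; "
    r"(?P<allocated>[\d.]+) GiB already allocated; "
    r"(?P<free>[\d.]+) GiB free; "
    r"(?P<reserved>[\d.]+) GiB reserved"
)


def parse_oom(message):
    """Return the GiB figures from a CUDA OOM message as floats."""
    match = OOM_PATTERN.search(message)
    if match is None:
        raise ValueError("message does not look like the expected OOM format")
    return {name: float(value) for name, value in match.groupdict().items()}


message = (
    "torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate "
    "16.00 GiB (GPU 0; 31.74 GiB total capacity; 21.30 GiB already allocated; "
    "9.10 GiB free; 21.46 GiB reserved in total by PyTorch)"
)
stats = parse_oom(message)

# How far the single request exceeds what is currently free.
shortfall = stats["requested"] - stats["free"]
# Memory held by the caching allocator but not in use (fragmentation slack).
slack = stats["reserved"] - stats["allocated"]

print(f"shortfall: {shortfall:.2f} GiB, allocator slack: {slack:.2f} GiB")
```

Here the allocator slack is small compared with the shortfall, so fragmentation is unlikely to be the main cause. The single 16.00 GiB request is simply larger than the free memory.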

xavihart commented 4 months ago

I used ~20 GB.

xavihart commented 4 months ago

Can you post the complete output from the terminal?

llstela commented 4 months ago

This is the output from my run on a 3090 (24GB):

(base) root@a83b401f11b6:/gdata/cold1/shengxuhan/codes/AIGC/PDM-Pure# python pdm_pure.py --image demo/advdm/original.png --save_path demo/advdm/ --device 1  
FORCE_MEM_EFFICIENT_ATTN= 0 @UNET:QKVATTENTION
/opt/conda/lib/python3.9/site-packages/huggingface_hub/file_download.py:1125: FutureWarning: The `force_filename` parameter is deprecated as a new caching system, which keeps the filenames as they are on the Hub, is now in place.
  warnings.warn(
/opt/conda/lib/python3.9/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Keyword arguments {'token': None} are not expected by StableDiffusionUpscalePipeline and will be ignored.
Begin to purify demo/advdm/original.png----------
100%|██████████| 50/50 [00:18<00:00,  2.65it/s]
100%|██████████| 50/50 [00:15<00:00,  3.30it/s]
Traceback (most recent call last):
  File "/gdata/cold1/shengxuhan/codes/AIGC/PDM-Pure/pdm_pure.py", line 65, in <module>
    main()
  File "/gdata/cold1/shengxuhan/codes/AIGC/PDM-Pure/pdm_pure.py", line 41, in main
    result = style_transfer(
  File "/opt/conda/lib/python3.9/site-packages/deepfloyd_if/pipelines/style_transfer.py", line 123, in style_transfer
    _stageIII_generations, _meta = if_III.embeddings_to_image(**if_III_kwargs)
  File "/opt/conda/lib/python3.9/site-packages/deepfloyd_if/modules/stage_III_sd_x4.py", line 80, in embeddings_to_image
    images = self.model(**metadata).images
  File "/opt/conda/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_upscale.py", line 727, in __call__
    image = self.vae.decode(latents).sample
  File "/opt/conda/lib/python3.9/site-packages/diffusers/models/autoencoder_kl.py", line 191, in decode
    decoded = self._decode(z).sample
  File "/opt/conda/lib/python3.9/site-packages/diffusers/models/autoencoder_kl.py", line 178, in _decode
    dec = self.decoder(z)
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/diffusers/models/vae.py", line 233, in forward
    sample = self.mid_block(sample)
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/diffusers/models/unet_2d_blocks.py", line 463, in forward
    hidden_states = attn(hidden_states)
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/diffusers/models/attention.py", line 168, in forward
    torch.empty(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.00 GiB (GPU 1; 23.69 GiB total capacity; 13.30 GiB already allocated; 7.71 GiB free; 15.27 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
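
The traceback ends inside the attention layer of the VAE decoder's mid block, where a full attention-score matrix is allocated. As a rough back-of-the-envelope check, the sketch below estimates the size of that matrix. The 256x256 latent size, the single head, and the batch size of 1 are assumptions for illustration; none of them is stated in the logs.

```python
GIB = 1024 ** 3


def attention_scores_bytes(height, width, bytes_per_element, batch=1, heads=1):
    """Bytes needed for a dense (tokens x tokens) attention-score matrix."""
    tokens = height * width  # one token per spatial position of the latent
    return batch * heads * tokens * tokens * bytes_per_element


# Assumed latent size: 256x256 (hypothetical, chosen for illustration).
fp16_bytes = attention_scores_bytes(256, 256, bytes_per_element=2)
fp32_bytes = attention_scores_bytes(256, 256, bytes_per_element=4)

print(fp16_bytes / GIB, fp32_bytes / GIB)  # prints: 8.0 16.0
```

Under these assumptions the dense matrix alone needs 8 GiB in 16-bit precision and 16 GiB in 32-bit precision, which would line up with the 8.00 GiB and 16.00 GiB requests in the two error messages. Memory-efficient attention avoids materializing this matrix in full.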

yuangan commented 1 month ago

Hi, thank you for your excellent work. I'm facing a similar issue—running out of memory with the A40 GPU, which has 46GB of memory, when executing pdm_pure.py at a 512x512 resolution. Do you have any suggestions?

Update: Solved by using xformers: FORCE_MEM_EFFICIENT_ATTN=1 python xxx
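
For anyone who prefers to launch the script from Python, here is a minimal sketch of the same idea. It assumes the variable must be set in the environment before the process starts, and it reuses the command line from the log earlier in this thread. The helper names are hypothetical.

```python
import os
import subprocess
import sys


def env_with_efficient_attention(base_env=None):
    """Copy an environment mapping and switch on FORCE_MEM_EFFICIENT_ATTN."""
    env = dict(os.environ if base_env is None else base_env)
    env["FORCE_MEM_EFFICIENT_ATTN"] = "1"
    return env


def build_purify_command(image, save_path, device):
    """Build the pdm_pure.py command line used earlier in this thread."""
    return [
        sys.executable, "pdm_pure.py",
        "--image", image,
        "--save_path", save_path,
        "--device", str(device),
    ]


def run_purifier(image, save_path, device):
    """Run pdm_pure.py in a child process with the variable set."""
    command = build_purify_command(image, save_path, device)
    return subprocess.run(command, env=env_with_efficient_attention(), check=True)


# Example call (run from the PDM-Pure checkout):
# run_purifier("demo/advdm/original.png", "demo/advdm/", 1)
```

If the variable is picked up, the first line of the log should presumably read FORCE_MEM_EFFICIENT_ATTN= 1 instead of 0.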

llstela commented 3 weeks ago

Still not solved. I have given up.

xavihart commented 3 weeks ago

I think you need to use memory-efficient attention; otherwise it will be too costly.

yuangan commented 3 weeks ago

You also need to run pip install xformers==0.0.16. You can find it in the DeepFloyd IF instructions.
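
A quick way to confirm which xformers version the environment actually has is sketched below. The helper names are hypothetical, and the expected version is simply the one mentioned in this thread.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_xformers(expected="0.0.16"):
    """Describe whether xformers matches the version suggested in this thread."""
    found = installed_version("xformers")
    if found is None:
        return "xformers is not installed"
    if found != expected:
        return f"xformers {found} is installed, but {expected} was suggested"
    return f"xformers {found} is installed"


print(check_xformers())
```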