Open max-fofanov opened 6 months ago
I would think that would be the case because the modules are loaded onto the GPU and offloaded to the CPU as and when needed. Cc'ing @pcuenca @SunMarc too, as they might have additional insights into this.
I should also mention that after the first few calls the RAM starts rising very slowly, but it still fails after 30-40 calls.
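For context, here is a minimal sketch of the kind of loop behind logs like the ones below. The checkpoint name, prompt, step count, and the psutil-based system-memory readout are assumptions, not the actual test script:

import psutil
import torch
from diffusers import StableDiffusionXLPipeline

def used_gb() -> float:
    # System-wide used RAM in GB; exactly how "Memory usage" is measured is an assumption
    return psutil.virtual_memory().used / 1024**3

print(f"Before loading pipeline - Memory usage: {used_gb():.2f} GB")

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # assumed checkpoint
    torch_dtype=torch.float16,
)
print(f"After loading pipeline - Memory usage: {used_gb():.2f} GB")

# The offload hooks move each submodule to the GPU only for its forward pass
pipe.enable_model_cpu_offload()

for i in range(50):
    pipe("an astronaut riding a horse", num_inference_steps=10)
    print(f"After generating {i + 1} - Memory usage: {used_gb():.2f} GB")

del pipe
print(f"After deleting pipeline - Memory usage: {used_gb():.2f} GB")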
On a shared machine and A100, I get this:
(diffusers) sayak@hf-dgx-01:~/diffusers$ CUDA_VISIBLE_DEVICES=2 python test_mco.py
Before loading pipeline - Memory usage: 254.10 GB
Loading pipeline components...: 100%|██████████| 7/7 [00:01<00:00, 6.05it/s]
After loading pipeline - Memory usage: 253.66 GB
After moving model to CPU - Memory usage: 253.66 GB
After generating 1 - Memory usage: 261.04 GB
After generating 2 - Memory usage: 263.83 GB
After generating 3 - Memory usage: 264.49 GB
After generating 4 - Memory usage: 265.72 GB
After generating 5 - Memory usage: 98.67 GB
After generating 6 - Memory usage: 57.82 GB
After generating 7 - Memory usage: 57.52 GB
After generating 8 - Memory usage: 60.57 GB
After generating 9 - Memory usage: 59.72 GB
After generating 10 - Memory usage: 60.33 GB
After generating 11 - Memory usage: 60.98 GB
After generating 12 - Memory usage: 58.98 GB
After generating 13 - Memory usage: 60.24 GB
After generating 14 - Memory usage: 60.52 GB
After generating 15 - Memory usage: 61.35 GB
After generating 16 - Memory usage: 60.92 GB
After generating 17 - Memory usage: 60.86 GB
After generating 18 - Memory usage: 60.40 GB
After generating 19 - Memory usage: 60.92 GB
After generating 20 - Memory usage: 61.55 GB
After generating 21 - Memory usage: 62.41 GB
After generating 22 - Memory usage: 64.44 GB
After generating 23 - Memory usage: 63.87 GB
After generating 24 - Memory usage: 64.40 GB
After generating 25 - Memory usage: 64.65 GB
After generating 26 - Memory usage: 61.38 GB
After generating 27 - Memory usage: 61.91 GB
After generating 28 - Memory usage: 61.99 GB
After generating 29 - Memory usage: 62.49 GB
After generating 30 - Memory usage: 63.00 GB
After generating 31 - Memory usage: 61.95 GB
After generating 32 - Memory usage: 61.87 GB
After generating 33 - Memory usage: 62.37 GB
After generating 34 - Memory usage: 60.64 GB
After generating 35 - Memory usage: 65.77 GB
After generating 36 - Memory usage: 65.86 GB
After generating 37 - Memory usage: 65.34 GB
After generating 38 - Memory usage: 63.55 GB
After generating 39 - Memory usage: 62.52 GB
After generating 40 - Memory usage: 62.46 GB
After generating 41 - Memory usage: 61.85 GB
After generating 42 - Memory usage: 62.65 GB
After generating 43 - Memory usage: 64.70 GB
After generating 44 - Memory usage: 63.54 GB
After generating 45 - Memory usage: 61.94 GB
After generating 46 - Memory usage: 61.52 GB
After generating 47 - Memory usage: 62.03 GB
After generating 48 - Memory usage: 63.48 GB
After generating 49 - Memory usage: 66.18 GB
After generating 50 - Memory usage: 61.92 GB
After deleting pipeline - Memory usage: 61.55 GB
After inference - Memory usage: 61.56 GB
I extended the number of runs to 50 to get a more reasonable estimate and also commented out the to("cpu") call (not sure why it's there). The numbers seem reasonable to me. The initial spike we see in the logs above could very likely be due to the shared nature of the machine I am using.
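For clarity, a sketch of the pattern without that extra call (checkpoint name assumed): enable_model_cpu_offload() installs accelerate hooks that manage device placement on their own, so an explicit pipe.to("cpu") or pipe.to("cuda") should not be needed.

import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # assumed checkpoint
    torch_dtype=torch.float16,
)
# The hooks stream each submodule to the GPU for its forward pass and park it
# back on the CPU afterwards, so no manual .to() calls are required.
pipe.enable_model_cpu_offload()
image = pipe("a prompt", num_inference_steps=10).images[0]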
Also, after deleting the pipe, not all memory is freed
This is interesting. The increase is small, but it is nonetheless there, and it has no reason to be.
The numbers seem reasonable to me.
So is this intended behavior? I can see that you've also experienced an increase, and although it looks small on an A100, it is very painful when using a single T4 with 16 GB of RAM.
I tried on the free Colab T4. ~1 GB of fluctuation over 50 inference runs :thinking:
Before loading pipeline - Memory usage: 1.30 GB
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:88: UserWarning:
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
warnings.warn(
Loading pipeline components...: 100% 7/7 [00:02<00:00, 1.77it/s]
After loading pipeline - Memory usage: 1.46 GB
(per-call tqdm progress bars omitted; each call completed 10/10 steps at roughly 1 it/s)
After generating 1 - Memory usage: 8.51 GB
After generating 2 - Memory usage: 9.16 GB
After generating 3 - Memory usage: 9.12 GB
After generating 4 - Memory usage: 9.14 GB
After generating 5 - Memory usage: 9.11 GB
After generating 6 - Memory usage: 9.12 GB
After generating 7 - Memory usage: 9.16 GB
After generating 8 - Memory usage: 9.18 GB
After generating 9 - Memory usage: 9.20 GB
After generating 10 - Memory usage: 9.09 GB
After generating 11 - Memory usage: 9.16 GB
After generating 12 - Memory usage: 9.14 GB
After generating 13 - Memory usage: 9.23 GB
After generating 14 - Memory usage: 9.11 GB
After generating 15 - Memory usage: 9.12 GB
After generating 16 - Memory usage: 9.15 GB
After generating 17 - Memory usage: 9.26 GB
After generating 18 - Memory usage: 9.25 GB
After generating 19 - Memory usage: 9.27 GB
After generating 20 - Memory usage: 9.28 GB
After generating 21 - Memory usage: 9.35 GB
After generating 22 - Memory usage: 9.20 GB
After generating 23 - Memory usage: 9.23 GB
After generating 24 - Memory usage: 9.40 GB
After generating 25 - Memory usage: 9.23 GB
After generating 26 - Memory usage: 9.23 GB
After generating 27 - Memory usage: 9.32 GB
After generating 28 - Memory usage: 9.34 GB
After generating 29 - Memory usage: 9.23 GB
After generating 30 - Memory usage: 9.28 GB
After generating 31 - Memory usage: 9.17 GB
After generating 32 - Memory usage: 9.24 GB
After generating 33 - Memory usage: 9.33 GB
After generating 34 - Memory usage: 9.24 GB
After generating 35 - Memory usage: 9.29 GB
After generating 36 - Memory usage: 9.27 GB
After generating 37 - Memory usage: 9.46 GB
After generating 38 - Memory usage: 9.39 GB
After generating 39 - Memory usage: 9.29 GB
After generating 40 - Memory usage: 9.27 GB
After generating 41 - Memory usage: 9.30 GB
After generating 42 - Memory usage: 9.29 GB
After generating 43 - Memory usage: 9.19 GB
After generating 44 - Memory usage: 9.21 GB
After generating 45 - Memory usage: 9.32 GB
After generating 46 - Memory usage: 9.32 GB
After generating 47 - Memory usage: 9.30 GB
After generating 48 - Memory usage: 9.24 GB
After generating 49 - Memory usage: 9.24 GB
After generating 50 - Memory usage: 9.38 GB
After deleting pipeline - Memory usage: 8.92 GB
After inference - Memory usage: 8.94 GB
Also, I couldn't clear all the RAM either at the end.
I couldn't clear all the RAM either at the end.
Is there any workaround for that by any chance?
As a workaround, you can try this before torch.cuda.empty_cache():
import ctypes
# Ask glibc to return freed heap memory to the OS
libc = ctypes.CDLL("libc.so.6")
libc.malloc_trim(0)
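For context, here is a sketch of how this can slot into the full cleanup sequence when dropping the pipeline; the pipe variable is assumed, and malloc_trim is specific to Linux/glibc:

import ctypes
import gc

import torch

# Drop the last Python reference so the pipeline's tensors become collectable
del pipe
gc.collect()

# Ask glibc to hand freed heap pages back to the OS (Linux/glibc only)
libc = ctypes.CDLL("libc.so.6")
libc.malloc_trim(0)

# Release cached GPU memory held by PyTorch's caching allocator
torch.cuda.empty_cache()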
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Describe the bug
When using enable_model_cpu_offload on StableDiffusionXLPipeline, each consecutive call takes more and more RAM. Also, after deleting the pipe, not all memory is freed.
Reproduction
Logs
System Info
Who can help?
@yiyixuxu @sayakpaul