Closed jhemley closed 5 months ago
As you can see in https://github.com/lllyasviel/Fooocus/blob/main/ldm_patched/modules/model_management.py#L429-L430 the trigger for lowvram mode is model_size > (current_free_mem - inference_memory).
Please check the model size and debug the other parameters in given code by adding a breakpoint and using python debugger or by prompting the values to further debug. Thanks!
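As a rough illustration of the comparison quoted above, the check can be reproduced outside Fooocus. This is only a sketch: the function name would_use_lowvram is hypothetical, and the byte counts below are made-up placeholders rather than values measured on any card. The real values have to come from printing model_size, current_free_mem and inference_memory at the linked lines.

```python
MIB = 1024 * 1024


def would_use_lowvram(model_size: int, current_free_mem: int, inference_memory: int) -> bool:
    """Same comparison as the quoted condition; all arguments are in bytes."""
    return model_size > (current_free_mem - inference_memory)


def describe(model_size: int, current_free_mem: int, inference_memory: int) -> str:
    """Format the three inputs and the resulting decision for a debug print."""
    budget = current_free_mem - inference_memory
    decision = "lowvram" if would_use_lowvram(model_size, current_free_mem, inference_memory) else "normal"
    return (
        f"model={model_size / MIB:.0f} MiB, "
        f"free={current_free_mem / MIB:.0f} MiB, "
        f"reserved_for_inference={inference_memory / MIB:.0f} MiB, "
        f"budget={budget / MIB:.0f} MiB -> {decision}"
    )


# Placeholder numbers, NOT measurements from the cards in this issue.
print(describe(5000 * MIB, 5500 * MIB, 1024 * MIB))
print(describe(5000 * MIB, 7000 * MIB, 1024 * MIB))
```

Printing the same three values inside the Fooocus code at the moment of the check would show which term pushes the card over the threshold.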
Is there any way to simply force the mode, because I believe I have enough VRAM? I tried the force normal and high VRAM options, but they didn't work.
Please debug this yourself and provide further information.
I think I located the problem. I think the Tesla driver limits the VRAM usage to 8102 MiB instead of the 8192 MiB on the card. I found this by disabling lowvram mode, changing the condition to model_size > (99999999999999). Now it outputs this:
python entry_with_update.py --listen
Already up-to-date
Update succeeded.
[System ARGV] ['entry_with_update.py', '--listen']
Python 3.10.14 (main, Mar 21 2024, 16:24:04) [GCC 11.2.0]
Fooocus version: 2.3.1
[Cleanup] Attempting to delete content of temp dir /tmp/fooocus
[Cleanup] Cleanup successful
Total VRAM 8123 MB, total RAM 32100 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: cuda:0 Tesla M60 : native VAE dtype: torch.float32
Using pytorch cross attention
Refiner unloaded.
Running on local URL: http://0.0.0.0:7865
Thanks for being a Gradio user! If you have questions or feedback, please join our Discord server and chat with us: https://discord.gg/feTf9x3ZSB
To create a public link, set share=True in launch().
model_type EPS
UNet ADM Dimension 2816
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.logit_scale'}
Base model loaded: /home/jared/Fooocus/models/checkpoints/juggernautXL_v8Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [/home/jared/Fooocus/models/checkpoints/juggernautXL_v8Rundiffusion.safetensors].
Loaded LoRA [/home/jared/Fooocus/models/loras/sd_xl_offset_example-lora_1.0.safetensors] for UNet [/home/jared/Fooocus/models/checkpoints/juggernautXL_v8Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cpu, use_fp16 = False.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
Started worker with PID 1658
App started successful. Use the app with http://localhost:7865/ or 0.0.0.0:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ControlNet Softness = 0.25
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 291429156536229784
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] make a car, bright colors, elegant, highly detailed, sharp focus, beautiful, intricate, cinematic, new classic, sunny, shining, deep aesthetic, appealing, artistic, fine detail, awesome color, dynamic light, great composition, clear professional background, creative, innocent, scenic, positive, unique, attractive, cute, perfect, focused, vibrant, epic, best
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] make a car, expressive, dynamic composition, dramatic, elegant, highly detailed, sharp focus, beautiful, perfect light, attractive, innocent, divine, sublime, epic, stunning, inspired, vibrant, intricate, brilliant, thought, cinematic, background, illuminated, professional, best, creative, winning, romantic, fantastic, scenic, artistic, fabulous, bright, hopeful, cute
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (896, 1152)
Preparation time: 12.90 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
ERROR diffusion_model.output_blocks.1.1.transformer_blocks.9.ff.net.0.proj.weight CUDA out of memory. Tried to allocate 50.00 MiB. GPU 0 has a total capacity of 7.93 GiB of which 17.62 MiB is free. Including non-PyTorch memory, this process has 7.91 GiB memory in use. Of the allocated memory 7.53 GiB is allocated by PyTorch, and 306.44 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Traceback (most recent call last):
File "/home/jared/Fooocus/modules/async_worker.py", line 913, in worker
handler(task)
File "/home/jared/anaconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/jared/anaconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/jared/Fooocus/modules/async_worker.py", line 816, in handler
imgs = pipeline.process_diffusion(
File "/home/jared/anaconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/jared/anaconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/jared/Fooocus/modules/default_pipeline.py", line 362, in process_diffusion
sampled_latent = core.ksampler(
File "/home/jared/anaconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/jared/anaconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/jared/Fooocus/modules/core.py", line 308, in ksampler
samples = ldm_patched.modules.sample.sample(model,
File "/home/jared/Fooocus/ldm_patched/modules/sample.py", line 93, in sample
real_model, positive_copy, negative_copy, noise_mask, models = prepare_sampling(model, noise.shape, positive, negative, noise_mask)
File "/home/jared/Fooocus/ldm_patched/modules/sample.py", line 86, in prepare_sampling
ldm_patched.modules.model_management.load_models_gpu([model] + models, model.memory_required([noise_shape[0] * 2] + list(noise_shape[1:])) + inference_memory)
File "/home/jared/Fooocus/modules/patch.py", line 447, in patched_load_models_gpu
y = ldm_patched.modules.model_management.load_models_gpu_origin(*args, **kwargs)
File "/home/jared/Fooocus/ldm_patched/modules/model_management.py", line 437, in load_models_gpu
cur_loaded_model = loaded_model.model_load(lowvram_model_memory)
File "/home/jared/Fooocus/ldm_patched/modules/model_management.py", line 304, in model_load
raise e
File "/home/jared/Fooocus/ldm_patched/modules/model_management.py", line 300, in model_load
self.real_model = self.model.patch_model(device_to=patch_model_to) #TODO: do something with loras and offloading to CPU
File "/home/jared/Fooocus/ldm_patched/modules/model_patcher.py", line 199, in patch_model
temp_weight = ldm_patched.modules.model_management.cast_to_device(weight, device_to, torch.float32, copy=True)
File "/home/jared/Fooocus/ldm_patched/modules/model_management.py", line 615, in cast_to_device
return tensor.to(device, copy=copy, non_blocking=non_blocking).to(dtype, non_blocking=non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.00 MiB. GPU 0 has a total capacity of 7.93 GiB of which 17.62 MiB is free. Including non-PyTorch memory, this process has 7.91 GiB memory in use. Of the allocated memory 7.53 GiB is allocated by PyTorch, and 306.79 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Total time: 16.25 seconds
nvidia-smi shows this:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05 Driver Version: 525.147.05 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla M60 Off | 00000000:0B:00.0 Off | Off |
| N/A 43C P0 39W / 150W | 8105MiB / 8192MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1658 C python 8102MiB |
+-----------------------------------------------------------------------------+
But I still don't understand why it works on my 2080 Max-Q, which only uses about 6785 MiB and runs just fine. Here's that report:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 537.13 Driver Version: 537.13 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 2080 ... WDDM | 00000000:01:00.0 On | N/A |
| N/A 52C P0 81W / 80W | 6785MiB / 8192MiB | 98% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
Already up-to-date
Update succeeded.
[System ARGV] ['Fooocus\entry_with_update.py']
Python 3.10.9 (tags/v3.10.9:1dd9be6, Dec 6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)]
Fooocus version: 2.3.1
[Cleanup] Attempting to delete content of temp dir C:\Users\hemle\AppData\Local\Temp\fooocus
[Cleanup] Cleanup successful
Total VRAM 8192 MB, total RAM 65397 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: cuda:0 NVIDIA GeForce RTX 2080 with Max-Q Design : native VAE dtype: torch.float32
Using pytorch cross attention
Refiner unloaded.
Running on local URL: http://127.0.0.1:7865
To create a public link, set share=True in launch().
model_type EPS
UNet ADM Dimension 2816
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
extra {'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_l.text_projection'}
Base model loaded: C:\Users\hemle\Downloads\Fooocus_win64_2-1-831\Fooocus\models\checkpoints\juggernautXL_v8Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [C:\Users\hemle\Downloads\Fooocus_win64_2-1-831\Fooocus\models\checkpoints\juggernautXL_v8Rundiffusion.safetensors].
Loaded LoRA [C:\Users\hemle\Downloads\Fooocus_win64_2-1-831\Fooocus\models\loras\sd_xl_offset_example-lora_1.0.safetensors] for UNet [C:\Users\hemle\Downloads\Fooocus_win64_2-1-831\Fooocus\models\checkpoints\juggernautXL_v8Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 0.62 seconds
Started worker with PID 12820
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ControlNet Softness = 0.25
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 3296201712917260942
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] make a car, cinematic, dynamic, dramatic ambient light, detailed, intricate, elegant, highly saturated colors, strong, epic, stunning, heroic, amazing detail, creative, positive, attractive, cute, beautiful, confident, inspired, pretty, perfect, coherent, trendy, best, awesome, futuristic, cool, inspirational, vibrant, loving, full, color, complex
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] make a car, colorful, vivid, detailed, breathtaking, beautiful, emotional, shiny, shining, highly detail, amazing, flowing, light, complex, color, surreal, ambient, pristine, dynamic, symmetry, sharp focus, epic, fine, very strong, winning, perfect, artistic, innocent, confident, attractive, incredible, creative, positive, unique, loving
[Fooocus] Encoding positive #1 ...
[Fooocus Model Management] Moving model(s) has taken 0.15 seconds
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (896, 1152)
Preparation time: 3.90 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 2.06 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:23<00:00, 1.30it/s]
Requested to load AutoencoderKL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 0.20 seconds
Image generated with private log at: C:\Users\hemle\Downloads\Fooocus_win64_2-1-831\Fooocus\outputs\2024-03-31\log.html
Generating and saving time: 27.49 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 1.37 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:23<00:00, 1.30it/s]
Requested to load AutoencoderKL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 0.18 seconds
Image generated with private log at: C:\Users\hemle\Downloads\Fooocus_win64_2-1-831\Fooocus\outputs\2024-03-31\log.html
Generating and saving time: 26.77 seconds
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
Total time: 63.45 seconds
[Fooocus Model Management] Moving model(s) has taken 0.59 seconds
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ControlNet Softness = 0.25
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 208600173302938237
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] make a car, cool color, perfect shiny deep background, sharp focus, intricate, elegant, highly detailed, dramatic light, professional still, dynamic composition, ambient atmosphere, vivid colors, beautiful, epic, stunning, creative, cinematic, fine detail, full clear, great quality, attractive, cheerful, novel, romantic, scenic, rich, hopeful, cute, radiant, colorful
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] make a car, colorful, shiny, vivid, detailed, amazing, flowing, infinite, light, color, epic, atmosphere, new, dynamic, ambient, cinematic, elegant, intricate, highly focused, creative, pure, artistic, romantic, sunny, beautiful, deep, unique, vibrant, coherent, colors, perfect, illuminated, pretty, clear, shining, flawless
[Fooocus] Encoding positive #1 ...
[Fooocus Model Management] Moving model(s) has taken 0.13 seconds
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (896, 1152)
Preparation time: 3.24 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 2.20 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:22<00:00, 1.33it/s]
Requested to load AutoencoderKL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 0.23 seconds
Image generated with private log at: C:\Users\hemle\Downloads\Fooocus_win64_2-1-831\Fooocus\outputs\2024-03-31\log.html
Generating and saving time: 27.06 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 1.36 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:23<00:00, 1.30it/s]
Requested to load AutoencoderKL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 0.18 seconds
Image generated with private log at: C:\Users\hemle\Downloads\Fooocus_win64_2-1-831\Fooocus\outputs\2024-03-31\log.html
Generating and saving time: 26.80 seconds
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
Total time: 57.17 seconds
[Fooocus Model Management] Moving model(s) has taken 0.58 seconds
It looks like I need to somehow shave 100 MiB of VRAM off the program. Is there any way to run the GPT-2 part on the CPU?
In general yes, but please check first whether it works with the Fooocus V2 style disabled.
OK, I'll try that.
I disabled the Fooocus V2 style, but the same error still occurred.
From my testing, I believe the Tesla drivers for the M60 and P4 limit the max VRAM to 8094 MiB instead of 8192 MiB.
mashb1t commented Mar 31, 2024
So this issue can be closed as this is a driver issue with your cards?
Is there a way to make the GPT-2 model run on the CPU or on another GPU to limit VRAM use? Also, it could still be a bug; I am not sure, because the behavior is odd. It runs on my 2080 Max-Q without ever filling the GPU beyond 7 GiB, but on the Teslas it initially tries to fill the VRAM to 8 GiB, which fails because I believe they are limited to 8100 MiB.
You can force it to be on CPU by setting https://github.com/lllyasviel/Fooocus/blob/e2f9bcb11d06216d6800676c48d8d74d6fd77a4b/extras/expansion.py#L65 to torch.device("cpu")
or add a line in https://github.com/lllyasviel/Fooocus/blob/e2f9bcb11d06216d6800676c48d8d74d6fd77a4b/ldm_patched/modules/model_management.py#L526-L537 to always return torch.device("cpu")
But keep in mind that prompt expansion is only used when setting style Fooocus V2, so this might not be the right place to begin with.
This would only make the text model run on the CPU, not the image model, correct?
Yes.
I tried that, but it didn't really work. Now that you are aware of this problem, are there any plans to trim the VRAM requirements by about 200 MiB so that Fooocus can run on 8 GB Tesla GPUs?
No plans for in-depth testing on P4 and M60 cards. It works on 4GB VRAM, so this must be an issue with your driver reporting wrong numbers.
Yeah, that sucks. But I did just buy a Tesla M40, which has 24 GB of VRAM, so hopefully that works. Last question: is there any possibility of adding multi-GPU support like what Ollama has?
See https://github.com/lllyasviel/Fooocus/discussions/2292
What you can do is to start multiple instances of Fooocus instead.
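A minimal sketch of the multiple-instance idea, pinning one process to each GPU. It relies on the CUDA_VISIBLE_DEVICES environment variable, which CUDA applications honor. The --port flag and the one-port-per-GPU scheme are assumptions for illustration (only --listen and port 7865 appear in the logs above), and build_instance is a hypothetical helper name.

```python
import os
import sys


def build_instance(gpu_index: int, base_port: int = 7865):
    """Return (command, environment) for one Fooocus process bound to one GPU."""
    # Restrict the child process to a single physical GPU.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_index))
    # Assumption: the launcher accepts --port; verify against its --help output.
    cmd = [
        sys.executable,
        "entry_with_update.py",
        "--listen",
        "--port",
        str(base_port + gpu_index),
    ]
    return cmd, env


for index in range(2):
    cmd, env = build_instance(index)
    print(env["CUDA_VISIBLE_DEVICES"], " ".join(cmd[1:]))
    # To actually start the process from the Fooocus directory:
    # subprocess.Popen(cmd, env=env)
```

Each instance loads its own copy of the models, so this spreads separate jobs across cards rather than pooling their VRAM.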
This is a weird driver setting issue with P4s. By default, it runs ECC memory. Disable the ECC RAM with "nvidia-smi -e 0". That should release the full 8GB of VRAM.
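Before and after changing the ECC setting, it may help to confirm what the driver reports. The sketch below wraps the nvidia-smi query interface and parses its CSV output; parse_gpu_rows and query_gpus are hypothetical helper names, and the sample line in the usage part is invented to show the format, not output captured from the cards in this issue.

```python
import subprocess

# Query fields supported by nvidia-smi's --query-gpu option.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,ecc.mode.current,memory.total",
    "--format=csv,noheader",
]


def parse_gpu_rows(csv_text: str):
    """Turn 'name, ecc mode, total memory' CSV lines into dictionaries."""
    rows = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        name, ecc, memory_total = [field.strip() for field in line.split(",")]
        rows.append({"name": name, "ecc": ecc, "memory_total": memory_total})
    return rows


def query_gpus():
    """Run nvidia-smi and parse its output; requires the NVIDIA driver tools."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_rows(result.stdout)


# Invented sample line, for illustration only.
sample = "Tesla P4, Enabled, 7611 MiB\n"
print(parse_gpu_rows(sample))
```

If the ECC mode reads as enabled, comparing memory_total before and after disabling it would show whether the setting accounts for the missing VRAM.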
Checklist
What happened?
I have a Tesla M60 and a P4 running in a Linux VM (the same problem occurred on Windows). I've tried running Fooocus on them, but it always runs in low VRAM mode.
Steps to reproduce the problem
Run:
conda activate fooocus
python entry_with_update.py --listen
What should have happened?
I think it shouldn't run in low VRAM mode (correct me if I'm wrong). It runs just fine on my 2080 Max-Q but has these lowvram problems on the Tesla cards I have tested with.
What browsers do you use to access Fooocus?
Mozilla Firefox
Where are you running Fooocus?
Locally with virtualization (e.g. Docker)
What operating system are you using?
Ubuntu 20.04 and Windows 10
Console logs
Additional information
current version
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05 Driver Version: 525.147.05 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla P4 Off | 00000000:0B:00.0 Off | Off |
| N/A 75C P0 41W / 75W | 6458MiB / 8192MiB | 47% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 2184 C python 6456MiB |
+-----------------------------------------------------------------------------+
I have also tried 550.