lllyasviel / Fooocus

Focus on prompting and generating
GNU General Public License v3.0

[Bug]: Tesla P4 and M60 forced into low VRAM mode #2661

Closed: jhemley closed this issue 5 months ago

jhemley commented 5 months ago

Checklist

What happened?

I have a Tesla M60 and P4 running in a Linux VM (the same problem occurred on Windows). I've tried running them, but Fooocus always runs in low VRAM mode.

Steps to reproduce the problem

Run:

conda activate fooocus
python entry_with_update.py --listen

What should have happened?

I think it shouldn't run in low VRAM mode (correct me if I'm wrong). It runs just fine on my 2080 Max-Q but has these low VRAM problems on the Tesla cards I have tested.

What browsers do you use to access Fooocus?

Mozilla Firefox

Where are you running Fooocus?

Locally with virtualization (e.g. Docker)

What operating system are you using?

Ubuntu 20.04 and Windows 10

Console logs

python entry_with_update.py --always-normal-vram --listen
Already up-to-date
Update succeeded.
[System ARGV] ['entry_with_update.py', '--always-normal-vram', '--listen']
Python 3.10.14 (main, Mar 21 2024, 16:24:04) [GCC 11.2.0]
Fooocus version: 2.3.1
[Cleanup] Attempting to delete content of temp dir /tmp/fooocus
[Cleanup] Cleanup successful
Total VRAM 8116 MB, total RAM 64308 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: cuda:0 Tesla P4 : native
VAE dtype: torch.float32
Using pytorch cross attention
Refiner unloaded.
Running on local URL:  http://0.0.0.0:7865

To create a public link, set `share=True` in `launch()`.
model_type EPS
UNet ADM Dimension 2816
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids'}
Base model loaded: /home/jared/Fooocus/models/checkpoints/juggernautXL_v8Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [/home/jared/Fooocus/models/checkpoints/juggernautXL_v8Rundiffusion.safetensors].
Loaded LoRA [/home/jared/Fooocus/models/loras/sd_xl_offset_example-lora_1.0.safetensors] for UNet [/home/jared/Fooocus/models/checkpoints/juggernautXL_v8Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cpu, use_fp16 = False.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
Started worker with PID 2184
App started successful. Use the app with http://localhost:7865/ or 0.0.0.0:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ControlNet Softness = 0.25
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 59858353226061117
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] make a picture of a lambo, cinematic, phenomenal, creative, dynamic, dramatic, thought, epic, elegant, intricate, detailed, extremely light, shining, complimentary colors, shiny, glowing, winning, grand elaborate complex, highly decorated, open flowing, deep color, very beautiful, symmetry, great composition, atmosphere, perfect, artistic, innocent, inspiring, unique
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] make a picture of a lambo, detailed, elegant, holy, impressive, noble, gorgeous, amazing, fancy, dramatic, colorful, very inspirational, beautiful, illuminated background, epic composition, magical atmosphere, cinematic, symmetry, pure, solid colors, extremely, highly complex, determined, imposing, futuristic, professional, artistic, creative, vibrant, fine detail, color
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (896, 1152)
Preparation time: 9.55 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
loading in lowvram mode 5363.8427734375
[Fooocus Model Management] Moving model(s) has taken 9.02 seconds
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [04:40<00:00,  9.35s/it]
Requested to load AutoencoderKL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 1.39 seconds
Image generated with private log at: /home/jared/Fooocus/outputs/2024-03-29/log.html
Generating and saving time: 294.14 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
loading in lowvram mode 5331.772085189819
[Fooocus Model Management] Moving model(s) has taken 8.64 seconds
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [04:44<00:00,  9.49s/it]
Requested to load AutoencoderKL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 1.33 seconds
Image generated with private log at: /home/jared/Fooocus/outputs/2024-03-29/log.html
Generating and saving time: 297.74 seconds
Total time: 601.53 seconds

Additional information

Current driver version:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05    Driver Version: 525.147.05    CUDA Version: 12.0   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 00000000:0B:00.0 Off |                  Off |
| N/A   75C    P0    41W /  75W |   6458MiB /  8192MiB |     47%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2184      C   python                           6456MiB |
+-----------------------------------------------------------------------------+

I have also tried driver 550.

mashb1t commented 5 months ago

As you can see in https://github.com/lllyasviel/Fooocus/blob/main/ldm_patched/modules/model_management.py#L429-L430, the trigger for lowvram mode is `model_size > (current_free_mem - inference_memory)`. Please check the model size and debug the other parameters in the given code by adding a breakpoint and using the Python debugger, or by printing the values. Thanks!
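
For anyone following along, a minimal sketch of how those values could be printed (assuming PyTorch; `model_size_bytes` and `inference_memory_bytes` are stand-ins for the values Fooocus computes around the linked lines):

```python
import torch

def debug_lowvram_trigger(model_size_bytes: int, inference_memory_bytes: int, device: int = 0) -> None:
    # torch.cuda.mem_get_info returns (free, total) in bytes for the given device
    free_mem, total_mem = torch.cuda.mem_get_info(device)
    print(f"total VRAM:       {total_mem / 1024**2:.0f} MiB")
    print(f"free VRAM:        {free_mem / 1024**2:.0f} MiB")
    print(f"inference memory: {inference_memory_bytes / 1024**2:.0f} MiB")
    print(f"model size:       {model_size_bytes / 1024**2:.0f} MiB")
    # Same comparison as the lowvram trigger in model_management.py
    print("lowvram triggered:", model_size_bytes > (free_mem - inference_memory_bytes))
```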

jhemley commented 5 months ago

Is there any way to simply force the model to load normally? I believe I have enough VRAM. I tried forcing normal and high VRAM, but they didn't work.

mashb1t commented 5 months ago

> Please check the model size and debug the other parameters in the given code by adding a breakpoint and using the Python debugger, or by printing the values. Thanks!

Please debug this yourself and provide further information.

jhemley commented 5 months ago

I think I located the problem: the Tesla driver seems to limit usable VRAM to 8102 MiB instead of the card's full 8192 MiB. I found this by disabling lowvram mode, changing the condition to `model_size > (99999999999999)`. Now it outputs this:

python entry_with_update.py --listen
Already up-to-date
Update succeeded.
[System ARGV] ['entry_with_update.py', '--listen']
Python 3.10.14 (main, Mar 21 2024, 16:24:04) [GCC 11.2.0]
Fooocus version: 2.3.1
[Cleanup] Attempting to delete content of temp dir /tmp/fooocus
[Cleanup] Cleanup successful
Total VRAM 8123 MB, total RAM 32100 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: cuda:0 Tesla M60 : native
VAE dtype: torch.float32
Using pytorch cross attention
Refiner unloaded.
Running on local URL:  http://0.0.0.0:7865

Thanks for being a Gradio user! If you have questions or feedback, please join our Discord server and chat with us: https://discord.gg/feTf9x3ZSB

To create a public link, set `share=True` in `launch()`.
model_type EPS
UNet ADM Dimension 2816
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.logit_scale'}
Base model loaded: /home/jared/Fooocus/models/checkpoints/juggernautXL_v8Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [/home/jared/Fooocus/models/checkpoints/juggernautXL_v8Rundiffusion.safetensors].
Loaded LoRA [/home/jared/Fooocus/models/loras/sd_xl_offset_example-lora_1.0.safetensors] for UNet [/home/jared/Fooocus/models/checkpoints/juggernautXL_v8Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cpu, use_fp16 = False.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
Started worker with PID 1658
App started successful. Use the app with http://localhost:7865/ or 0.0.0.0:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ControlNet Softness = 0.25
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 291429156536229784
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] make a car, bright colors, elegant, highly detailed, sharp focus, beautiful, intricate, cinematic, new classic, sunny, shining, deep aesthetic, appealing, artistic, fine detail, awesome color, dynamic light, great composition, clear professional background, creative, innocent, scenic, positive, unique, attractive, cute, perfect, focused, vibrant, epic, best
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] make a car, expressive, dynamic composition, dramatic, elegant, highly detailed, sharp focus, beautiful, perfect light, attractive, innocent, divine, sublime, epic, stunning, inspired, vibrant, intricate, brilliant, thought, cinematic, background, illuminated, professional, best, creative, winning, romantic, fantastic, scenic, artistic, fabulous, bright, hopeful, cute
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (896, 1152)
Preparation time: 12.90 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
ERROR diffusion_model.output_blocks.1.1.transformer_blocks.9.ff.net.0.proj.weight CUDA out of memory. Tried to allocate 50.00 MiB. GPU 0 has a total capacity of 7.93 GiB of which 17.62 MiB is free. Including non-PyTorch memory, this process has 7.91 GiB memory in use. Of the allocated memory 7.53 GiB is allocated by PyTorch, and 306.44 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Traceback (most recent call last):
  File "/home/jared/Fooocus/modules/async_worker.py", line 913, in worker
    handler(task)
  File "/home/jared/anaconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/jared/anaconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/jared/Fooocus/modules/async_worker.py", line 816, in handler
    imgs = pipeline.process_diffusion(
  File "/home/jared/anaconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/jared/anaconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/jared/Fooocus/modules/default_pipeline.py", line 362, in process_diffusion
    sampled_latent = core.ksampler(
  File "/home/jared/anaconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/jared/anaconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/jared/Fooocus/modules/core.py", line 308, in ksampler
    samples = ldm_patched.modules.sample.sample(model,
  File "/home/jared/Fooocus/ldm_patched/modules/sample.py", line 93, in sample
    real_model, positive_copy, negative_copy, noise_mask, models = prepare_sampling(model, noise.shape, positive, negative, noise_mask)
  File "/home/jared/Fooocus/ldm_patched/modules/sample.py", line 86, in prepare_sampling
    ldm_patched.modules.model_management.load_models_gpu([model] + models, model.memory_required([noise_shape[0] * 2] + list(noise_shape[1:])) + inference_memory)
  File "/home/jared/Fooocus/modules/patch.py", line 447, in patched_load_models_gpu
    y = ldm_patched.modules.model_management.load_models_gpu_origin(*args, **kwargs)
  File "/home/jared/Fooocus/ldm_patched/modules/model_management.py", line 437, in load_models_gpu
    cur_loaded_model = loaded_model.model_load(lowvram_model_memory)
  File "/home/jared/Fooocus/ldm_patched/modules/model_management.py", line 304, in model_load
    raise e
  File "/home/jared/Fooocus/ldm_patched/modules/model_management.py", line 300, in model_load
    self.real_model = self.model.patch_model(device_to=patch_model_to) #TODO: do something with loras and offloading to CPU
  File "/home/jared/Fooocus/ldm_patched/modules/model_patcher.py", line 199, in patch_model
    temp_weight = ldm_patched.modules.model_management.cast_to_device(weight, device_to, torch.float32, copy=True)
  File "/home/jared/Fooocus/ldm_patched/modules/model_management.py", line 615, in cast_to_device
    return tensor.to(device, copy=copy, non_blocking=non_blocking).to(dtype, non_blocking=non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.00 MiB. GPU 0 has a total capacity of 7.93 GiB of which 17.62 MiB is free. Including non-PyTorch memory, this process has 7.91 GiB memory in use. Of the allocated memory 7.53 GiB is allocated by PyTorch, and 306.79 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Total time: 16.25 seconds

nvidia-smi shows this:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05    Driver Version: 525.147.05    CUDA Version: 12.0   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla M60           Off  | 00000000:0B:00.0 Off |                  Off |
| N/A   43C    P0    39W / 150W |   8105MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1658      C   python                           8102MiB |
+-----------------------------------------------------------------------------+
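
As an aside, the allocator hint from that traceback can be tried by setting the environment variable before CUDA is initialized (a sketch; it addresses fragmentation, not the driver-reserved capacity, so it may not help here):

```python
import os

# Must be set before torch initializes CUDA, e.g. at the very top of
# entry_with_update.py or exported in the shell before launching Fooocus.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")
```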

jhemley commented 5 months ago

But I still don't understand why it works on my 2080 Max-Q: it only utilizes about 6785 MiB, yet it runs just fine. Here's that report:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 537.13                 Driver Version: 537.13       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 2080 ...  WDDM  | 00000000:01:00.0  On |                  N/A |
| N/A   52C    P0             81W /  80W  |   6785MiB /  8192MiB |     98%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

Already up-to-date
Update succeeded.
[System ARGV] ['Fooocus\entry_with_update.py']
Python 3.10.9 (tags/v3.10.9:1dd9be6, Dec 6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)]
Fooocus version: 2.3.1
[Cleanup] Attempting to delete content of temp dir C:\Users\hemle\AppData\Local\Temp\fooocus
[Cleanup] Cleanup successful
Total VRAM 8192 MB, total RAM 65397 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: cuda:0 NVIDIA GeForce RTX 2080 with Max-Q Design : native
VAE dtype: torch.float32
Using pytorch cross attention
Refiner unloaded.
Running on local URL:  http://127.0.0.1:7865

To create a public link, set `share=True` in `launch()`.
model_type EPS
UNet ADM Dimension 2816
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
extra {'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_l.text_projection'}
Base model loaded: C:\Users\hemle\Downloads\Fooocus_win64_2-1-831\Fooocus\models\checkpoints\juggernautXL_v8Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [C:\Users\hemle\Downloads\Fooocus_win64_2-1-831\Fooocus\models\checkpoints\juggernautXL_v8Rundiffusion.safetensors].
Loaded LoRA [C:\Users\hemle\Downloads\Fooocus_win64_2-1-831\Fooocus\models\loras\sd_xl_offset_example-lora_1.0.safetensors] for UNet [C:\Users\hemle\Downloads\Fooocus_win64_2-1-831\Fooocus\models\checkpoints\juggernautXL_v8Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 0.62 seconds
Started worker with PID 12820
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ControlNet Softness = 0.25
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 3296201712917260942
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] make a car, cinematic, dynamic, dramatic ambient light, detailed, intricate, elegant, highly saturated colors, strong, epic, stunning, heroic, amazing detail, creative, positive, attractive, cute, beautiful, confident, inspired, pretty, perfect, coherent, trendy, best, awesome, futuristic, cool, inspirational, vibrant, loving, full, color, complex
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] make a car, colorful, vivid, detailed, breathtaking, beautiful, emotional, shiny, shining, highly detail, amazing, flowing, light, complex, color, surreal, ambient, pristine, dynamic, symmetry, sharp focus, epic, fine, very strong, winning, perfect, artistic, innocent, confident, attractive, incredible, creative, positive, unique, loving
[Fooocus] Encoding positive #1 ...
[Fooocus Model Management] Moving model(s) has taken 0.15 seconds
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (896, 1152)
Preparation time: 3.90 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 2.06 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:23<00:00,  1.30it/s]
Requested to load AutoencoderKL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 0.20 seconds
Image generated with private log at: C:\Users\hemle\Downloads\Fooocus_win64_2-1-831\Fooocus\outputs\2024-03-31\log.html
Generating and saving time: 27.49 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 1.37 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:23<00:00,  1.30it/s]
Requested to load AutoencoderKL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 0.18 seconds
Image generated with private log at: C:\Users\hemle\Downloads\Fooocus_win64_2-1-831\Fooocus\outputs\2024-03-31\log.html
Generating and saving time: 26.77 seconds
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
Total time: 63.45 seconds
[Fooocus Model Management] Moving model(s) has taken 0.59 seconds
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ControlNet Softness = 0.25
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 208600173302938237
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] make a car, cool color, perfect shiny deep background, sharp focus, intricate, elegant, highly detailed, dramatic light, professional still, dynamic composition, ambient atmosphere, vivid colors, beautiful, epic, stunning, creative, cinematic, fine detail, full clear, great quality, attractive, cheerful, novel, romantic, scenic, rich, hopeful, cute, radiant, colorful
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] make a car, colorful, shiny, vivid, detailed, amazing, flowing, infinite, light, color, epic, atmosphere, new, dynamic, ambient, cinematic, elegant, intricate, highly focused, creative, pure, artistic, romantic, sunny, beautiful, deep, unique, vibrant, coherent, colors, perfect, illuminated, pretty, clear, shining, flawless
[Fooocus] Encoding positive #1 ...
[Fooocus Model Management] Moving model(s) has taken 0.13 seconds
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (896, 1152)
Preparation time: 3.24 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 2.20 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:22<00:00,  1.33it/s]
Requested to load AutoencoderKL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 0.23 seconds
Image generated with private log at: C:\Users\hemle\Downloads\Fooocus_win64_2-1-831\Fooocus\outputs\2024-03-31\log.html
Generating and saving time: 27.06 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 1.36 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:23<00:00,  1.30it/s]
Requested to load AutoencoderKL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 0.18 seconds
Image generated with private log at: C:\Users\hemle\Downloads\Fooocus_win64_2-1-831\Fooocus\outputs\2024-03-31\log.html
Generating and saving time: 26.80 seconds
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
Total time: 57.17 seconds
[Fooocus Model Management] Moving model(s) has taken 0.58 seconds

jhemley commented 5 months ago

It looks like I need to somehow shave ~100 MiB of VRAM off the program. Is there any way to run the GPT-2 part on the CPU?

mashb1t commented 5 months ago

In general yes, but please first check whether it works with the Fooocus V2 style disabled.

jhemley commented 5 months ago

OK, I'll try that.

jhemley commented 5 months ago

I disabled the Fooocus V2 style, but the same error still occurred.

jhemley commented 5 months ago

From my testing, I believe the Tesla drivers for the M60 and P4 limit the max VRAM to 8094 MiB instead of 8192 MiB.
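
For what it's worth, a quick way to compare what the driver exposes to CUDA against the card's nominal capacity (a sketch using standard PyTorch calls):

```python
import torch

free, total = torch.cuda.mem_get_info(0)      # bytes free / total as exposed by the driver
props = torch.cuda.get_device_properties(0)   # includes total_memory as seen by PyTorch
print(f"{props.name}: PyTorch total={props.total_memory / 1024**2:.0f} MiB, "
      f"CUDA total={total / 1024**2:.0f} MiB, free={free / 1024**2:.0f} MiB")
```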

mashb1t commented 5 months ago

So this issue can be closed as this is a driver issue with your cards?

jhemley commented 5 months ago

Is there a way to change the GPT-2 model to run on the CPU or another GPU to limit VRAM usage? Also, it could still be a bug; I'm not sure, because the behavior is odd. It runs on my 2080 Max-Q without ever filling the GPU to more than 7 GiB, but on the Teslas it initially tries to fill the VRAM to 8 GiB, which fails, as I believe they are limited to about 8100 MiB.

mashb1t commented 5 months ago

You can force it to run on the CPU by setting https://github.com/lllyasviel/Fooocus/blob/e2f9bcb11d06216d6800676c48d8d74d6fd77a4b/extras/expansion.py#L65 to `torch.device("cpu")`, or by adding a line in https://github.com/lllyasviel/Fooocus/blob/e2f9bcb11d06216d6800676c48d8d74d6fd77a4b/ldm_patched/modules/model_management.py#L526-L537 to always return `torch.device("cpu")`.

But keep in mind that prompt expansion is only used when setting style Fooocus V2, so this might not be the right place to begin with.
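
A minimal sketch of the second option (assuming the linked lines cover the text-encoder device selection in `model_management.py`; the override simply pins it to CPU):

```python
import torch

# Replace the body of the device-selection function at the linked lines in
# ldm_patched/modules/model_management.py so the text encoder (CLIP /
# prompt expansion) never competes with the UNet for VRAM:
def text_encoder_device():
    return torch.device("cpu")
```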

jhemley commented 5 months ago

> But keep in mind that prompt expansion is only used when setting style Fooocus V2, so this might not be the right place to begin with.

This would only make the text model run on the CPU, not the image model, correct?

mashb1t commented 5 months ago

yes

jhemley commented 5 months ago

I tried that, but it didn't really work. Now that you are aware of this problem, are there any plans to trim the VRAM requirements by about 200 MiB so Fooocus can run on 8 GB Tesla GPUs?

mashb1t commented 5 months ago

No plans for in-depth testing on P4 and M60 cards. Fooocus works on 4 GB of VRAM, so this must be an issue with your driver reporting wrong numbers.

jhemley commented 5 months ago

Yeah, that sucks. But I did just buy a Tesla M40, which has 24 GB of VRAM, so hopefully that works. Last question: is there any possibility of adding multi-GPU support like what Ollama has?

mashb1t commented 5 months ago

See https://github.com/lllyasviel/Fooocus/discussions/2292

What you can do instead is start multiple instances of Fooocus, as sketched below.
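
A hypothetical launcher along those lines: one Fooocus instance per GPU, each pinned with `CUDA_VISIBLE_DEVICES` and given its own port (the `--port` flag is assumed from Fooocus's launch arguments):

```python
import os
import subprocess

# One Fooocus instance per GPU: each process only sees its own card via
# CUDA_VISIBLE_DEVICES and serves the web UI on its own port.
for gpu_id, port in [(0, 7865), (1, 7866)]:
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    subprocess.Popen(
        ["python", "entry_with_update.py", "--listen", "--port", str(port)],
        env=env,
    )
```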

CultusMechanicus commented 4 months ago

This is a weird driver setting issue with P4s. By default, they run with ECC memory enabled, which reserves some VRAM for ECC bits. Disable ECC with `nvidia-smi -e 0` (this requires a reboot to take effect). That should release the full 8 GB of VRAM.