lllyasviel / Fooocus


I am using Nvidia with 8GB VRAM, I get CUDA Out Of Memory #1966

JoeyGorombey closed this issue 7 months ago

JoeyGorombey commented 7 months ago

Read Troubleshoot

[x] I admit that I have read the Troubleshoot before making this issue.

Per the following, I am making an issue:

I am using Nvidia with 8GB VRAM, I get CUDA Out Of Memory

It is a BUG. Please let us know as soon as possible. Please make an issue. See also minimal requirements.

Describe the problem

I am running into the issue described here: I am using Nvidia with 8GB VRAM, I get CUDA Out Of Memory.

Steps to reproduce:

  1. Launch Fooocus locally by running run.bat
  2. Check "Input Image"
  3. Check "Image Prompt"
  4. Provide an image
  5. Check "Advanced"
  6. Choose "Face Swap"
  7. Provide a text prompt
  8. Click "Generate"
  9. The exception occurs when loading the models

Full console log attached.

Full Console Log

D:\Foocus>.\python_embeded\python.exe -s Fooocus\entry_with_update.py
Already up-to-date
Update succeeded.
[System ARGV] ['Fooocus\\entry_with_update.py']
Python 3.10.9 (tags/v3.10.9:1dd9be6, Dec  6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)]
Fooocus version: 2.1.862
Running on local URL:  http://127.0.0.1:7865

To create a public link, set `share=True` in `launch()`.
Total VRAM 8191 MB, total RAM 16311 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: cuda:0 NVIDIA GeForce RTX 3070 : native
VAE dtype: torch.bfloat16
Using pytorch cross attention
Refiner unloaded.
model_type EPS
UNet ADM Dimension 2816
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
extra {'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale'}
Base model loaded: D:\Foocus\Fooocus\models\checkpoints\juggernautXL_version6Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [D:\Foocus\Fooocus\models\checkpoints\juggernautXL_version6Rundiffusion.safetensors].
Loaded LoRA [D:\Foocus\Fooocus\models\loras\sd_xl_offset_example-lora_1.0.safetensors] for UNet [D:\Foocus\Fooocus\models\checkpoints\juggernautXL_version6Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 0.73 seconds
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 7541541838771336299
[Fooocus] Downloading control models ...
[Fooocus] Loading control models ...
extra clip vision: ['vision_model.embeddings.position_ids']
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] drinking tea at desk in the morning, full color, coherent, symmetry, glowing, gorgeous, perfect detailed, intricate, atmosphere, professional, highly saturated colors, cinematic, dramatic, sharp focus, fine detail, open composition, artistic, innocent, beautiful, enhanced, light, cozy, creative, positive, cute, adorable, infinite, elegant, lovely, pure, elaborate
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] drinking tea at desk in the morning, full detail, dynamic composition, dramatic, vivid, beautiful, intricate, elegant, highly detailed, professional winning, light, clear focus, inspired, rich colors, surreal background, artistic, new, color, complex, cool, amazing, symmetry, enhanced, cute, perfect, awesome, creative, positive, neat, focused, friendly
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
[Fooocus] Image processing ...
Detected 1 faces
Requested to load CLIPVisionModelWithProjection
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 1.40 seconds
Requested to load Resampler
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 0.39 seconds
Requested to load To_KV
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 0.12 seconds
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (896, 1152)
Preparation time: 36.54 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
ERROR diffusion_model.output_blocks.2.1.transformer_blocks.7.ff.net.0.proj.weight CUDA out of memory. Tried to allocate 50.00 MiB. GPU 0 has a total capacty of 8.00 GiB of which 2.30 GiB is free. Of the allocated memory 4.45 GiB is allocated by PyTorch, and 151.61 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Traceback (most recent call last):
  File "D:\Foocus\Fooocus\modules\async_worker.py", line 823, in worker
    handler(task)
  File "D:\Foocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Foocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Foocus\Fooocus\modules\async_worker.py", line 754, in handler
    imgs = pipeline.process_diffusion(
  File "D:\Foocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Foocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Foocus\Fooocus\modules\default_pipeline.py", line 361, in process_diffusion
    sampled_latent = core.ksampler(
  File "D:\Foocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Foocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Foocus\Fooocus\modules\core.py", line 313, in ksampler
    samples = ldm_patched.modules.sample.sample(model,
  File "D:\Foocus\Fooocus\ldm_patched\modules\sample.py", line 94, in sample
    real_model, positive_copy, negative_copy, noise_mask, models = prepare_sampling(model, noise.shape, positive, negative, noise_mask)
  File "D:\Foocus\Fooocus\ldm_patched\modules\sample.py", line 87, in prepare_sampling
    ldm_patched.modules.model_management.load_models_gpu([model] + models, model.memory_required([noise_shape[0] * 2] + list(noise_shape[1:])) + inference_memory)
  File "D:\Foocus\Fooocus\modules\patch.py", line 441, in patched_load_models_gpu
    y = ldm_patched.modules.model_management.load_models_gpu_origin(*args, **kwargs)
  File "D:\Foocus\Fooocus\ldm_patched\modules\model_management.py", line 434, in load_models_gpu
    cur_loaded_model = loaded_model.model_load(lowvram_model_memory)
  File "D:\Foocus\Fooocus\ldm_patched\modules\model_management.py", line 301, in model_load
    raise e
  File "D:\Foocus\Fooocus\ldm_patched\modules\model_management.py", line 297, in model_load
    self.real_model = self.model.patch_model(device_to=patch_model_to) #TODO: do something with loras and offloading to CPU
  File "D:\Foocus\Fooocus\ldm_patched\modules\model_patcher.py", line 198, in patch_model
    temp_weight = ldm_patched.modules.model_management.cast_to_device(weight, device_to, torch.float32, copy=True)
  File "D:\Foocus\Fooocus\ldm_patched\modules\model_management.py", line 612, in cast_to_device
    return tensor.to(device, copy=copy, non_blocking=non_blocking).to(dtype, non_blocking=non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 50.00 MiB. GPU 0 has a total capacty of 8.00 GiB of which 2.30 GiB is free. Of the allocated memory 4.44 GiB is allocated by PyTorch, and 160.71 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Total time: 103.98 seconds

JoeyGorombey commented 7 months ago

Dxdiag attached: 8 GB VRAM, 16 GB system RAM, RTX 3060 on Windows 10 Home 64-bit, which I believe meets the minimum requirements.

eddyizm commented 7 months ago

Did you enable the system swap?

mashb1t commented 7 months ago

@JoeyGorombey is this still relevant and do you require further assistance?

thiner commented 7 months ago

It seems only about 2.3 GB of VRAM was free. Were you running other programs that were using the GPU as well?
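To put numbers on that question, the figures in the out-of-memory message from the console log can be broken down. The sketch below is only an illustration: the helper names (`parse_oom_message`, `outside_pytorch_mib`) are made up for this example and are not part of Fooocus or PyTorch, and it assumes that whatever is neither free nor held by PyTorch's allocator belongs to something else (the CUDA context, driver overhead, or other programs).

```python
import re

# Message text copied from the traceback in the console log above.
MESSAGE = (
    "CUDA out of memory. Tried to allocate 50.00 MiB. GPU 0 has a total "
    "capacty of 8.00 GiB of which 2.30 GiB is free. Of the allocated memory "
    "4.44 GiB is allocated by PyTorch, and 160.71 MiB is reserved by PyTorch "
    "but unallocated."
)

# Units that appear in the message, expressed in MiB.
UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}

# One pattern per figure. "capac\w*" tolerates the "capacty" spelling in the log.
PATTERNS = {
    "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
    "total": r"total capac\w* of ([\d.]+) (MiB|GiB)",
    "free": r"of which ([\d.]+) (MiB|GiB) is free",
    "allocated": r"([\d.]+) (MiB|GiB) is allocated by PyTorch",
    "reserved_unallocated": r"([\d.]+) (MiB|GiB) is reserved by PyTorch",
}


def parse_oom_message(message: str) -> dict[str, float]:
    """Extract the memory figures from an OOM message, all converted to MiB."""
    figures = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"could not find '{name}' in message")
        figures[name] = float(match.group(1)) * UNIT_TO_MIB[match.group(2)]
    return figures


def outside_pytorch_mib(figures: dict[str, float]) -> float:
    """Memory that is neither free nor held by PyTorch's allocator (assumption:
    this remainder is the CUDA context, driver overhead, or other programs)."""
    return (
        figures["total"]
        - figures["free"]
        - figures["allocated"]
        - figures["reserved_unallocated"]
    )


if __name__ == "__main__":
    figures = parse_oom_message(MESSAGE)
    for name, mib in figures.items():
        print(f"{name:>22}: {mib:8.2f} MiB")
    print(f"{'outside PyTorch':>22}: {outside_pytorch_mib(figures):8.2f} MiB")
```

For the message in this log, the remainder comes out at roughly 1.1 GiB. Under the assumption above, that is memory in use on the card that PyTorch's allocator does not account for. The reserved-but-unallocated figure is small (about 160 MiB), so the fragmentation hint in the message is probably not the main factor here.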

JoeyGorombey commented 7 months ago

I managed to allow the system to manage my VRAM. This ameliorated the issue but did not entirely solve it. I think the best course is probably to get more physical RAM for my system, as I only have 16 GB.
