lllyasviel / Fooocus

Focus on prompting and generating
GNU General Public License v3.0

CUDA Out of Memory #1984

Closed fedyfausto closed 10 months ago

fedyfausto commented 10 months ago

Read Troubleshoot

[x] I admit that I have read the Troubleshoot before making this issue.

Describe the problem
Hello, I am trying to run Fooocus on an Ubuntu Server 22.04 machine with two NVIDIA Tesla K80 GPUs (each with 12 GB of VRAM). When I launch a prompt, Fooocus crashes saying there is no VRAM left, because PyTorch is already using about 10 GB of VRAM on its own (why?). How can I solve this problem? I tried --lowvram but it does not work. The PyTorch version is 2.0.1.

(Screenshot attached: photo_2024-01-19_13-45-22)

Full Console Log

(fooocus_env) fedyfausto@fedyfausto-unict:~/Fooocus$ python entry_with_update.py --lowvram --port 8001 --listen 0.0.0.0
Update failed.
'refs/heads/HEAD'
Update succeeded.
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
Fooocus version: 2.1.703
Running on local URL:  http://0.0.0.0:8001

To create a public link, set `share=True` in `launch()`.
Total VRAM 11441 MB, total RAM 64300 MB
Set vram state to: LOW_VRAM
Device: cuda:0 Tesla K80 : native
VAE dtype: torch.float32
Using pytorch cross attention
[Fooocus] Disabling smart memory
model_type EPS
adm 2560
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
missing {'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids'}
Refiner model loaded: /home/fedyfausto/Fooocus/models/checkpoints/sd_xl_refiner_1.0_0.9vae.safetensors
model_type EPS
adm 2816
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
missing {'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids'}
Base model loaded: /home/fedyfausto/Fooocus/models/checkpoints/sd_xl_base_1.0_0.9vae.safetensors

LoRAs loaded: [('sd_xl_offset_example-lora_1.0.safetensors', 0.5), ('None', 0.5), ('None', 0.5), ('None', 0.5), ('None', 0.5)]
Fooocus Expansion engine loaded for cuda:0, use_fp16 = False.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 1.34 seconds
App started successful. Use the app with http://localhost:8001/ or 0.0.0.0:8001
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 7.0
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 20
[Fooocus] Initializing ...
[Fooocus] Loading models ...
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] New suffix: intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, artgerm, tomasz alen kopera, peter mohrbacher, donato giancola, joseph christian leyendecker, wlop, boris vallejo
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] New suffix: extremely detailed oil painting, unreal 5 render, rhads, Bruce Pennington, Studio Ghibli, tim hildebrandt, digital art, octane render, beautiful composition, trending on artstation, award-winning photograph, masterpiece
[Fooocus] Encoding positive #1 ...
[Fooocus Model Management] Moving model(s) has taken 0.37 seconds
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
Preparation time: 4.04 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.02916753850877285, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 6.68 seconds
[Sampler] Fooocus sampler is activated.
  0%|                                                                                                                     | 0/30 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/fedyfausto/Fooocus/modules/async_worker.py", line 583, in worker
    handler(task)
  File "/home/fedyfausto/Fooocus/fooocus_env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/fedyfausto/Fooocus/fooocus_env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/fedyfausto/Fooocus/modules/async_worker.py", line 516, in handler
    imgs = pipeline.process_diffusion(
  File "/home/fedyfausto/Fooocus/fooocus_env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/fedyfausto/Fooocus/fooocus_env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/fedyfausto/Fooocus/modules/default_pipeline.py", line 358, in process_diffusion
    sampled_latent = core.ksampler(
  File "/home/fedyfausto/Fooocus/fooocus_env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/fedyfausto/Fooocus/fooocus_env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/fedyfausto/Fooocus/modules/core.py", line 261, in ksampler
    samples = fcbh.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
  File "/home/fedyfausto/Fooocus/backend/headless/fcbh/sample.py", line 97, in sample
    samples = sampler.sample(noise, positive_copy, negative_copy, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "/home/fedyfausto/Fooocus/backend/headless/fcbh/samplers.py", line 785, in sample
    return sample(self.model, noise, positive, negative, cfg, self.device, sampler(), sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "/home/fedyfausto/Fooocus/fooocus_env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/fedyfausto/Fooocus/fooocus_env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/fedyfausto/Fooocus/modules/sample_hijack.py", line 144, in sample_hacked
    samples = sampler.sample(model_wrap, sigmas, extra_args, callback_wrap, noise, latent_image, denoise_mask, disable_pbar)
  File "/home/fedyfausto/Fooocus/backend/headless/fcbh/samplers.py", line 630, in sample
    samples = getattr(k_diffusion_sampling, "sample_{}".format(sampler_name))(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **extra_options)
  File "/home/fedyfausto/Fooocus/fooocus_env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/fedyfausto/Fooocus/modules/patch.py", line 316, in sample_dpmpp_fooocus_2m_sde_inpaint_seamless
    denoised = model(x, sigmas[i] * s_in, **extra_args)
  File "/home/fedyfausto/Fooocus/fooocus_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/fedyfausto/Fooocus/backend/headless/fcbh/samplers.py", line 323, in forward
    out = self.inner_model(x, sigma, cond=cond, uncond=uncond, cond_scale=cond_scale, cond_concat=cond_concat, model_options=model_options, seed=seed)
  File "/home/fedyfausto/Fooocus/fooocus_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/fedyfausto/Fooocus/modules/patch.py", line 198, in patched_discrete_eps_ddpm_denoiser_forward
    eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
  File "/home/fedyfausto/Fooocus/backend/headless/fcbh/k_diffusion/external.py", line 151, in get_eps
    return self.inner_model.apply_model(*args, **kwargs)
  File "/home/fedyfausto/Fooocus/backend/headless/fcbh/samplers.py", line 311, in apply_model
    out = sampling_function(self.inner_model.apply_model, x, timestep, uncond, cond, cond_scale, cond_concat, model_options=model_options, seed=seed)
  File "/home/fedyfausto/Fooocus/backend/headless/fcbh/samplers.py", line 289, in sampling_function
    cond, uncond = calc_cond_uncond_batch(model_function, cond, uncond, x, timestep, max_total_area, cond_concat, model_options)
  File "/home/fedyfausto/Fooocus/backend/headless/fcbh/samplers.py", line 263, in calc_cond_uncond_batch
    output = model_options['model_function_wrapper'](model_function, {"input": input_x, "timestep": timestep_, "c": c, "cond_or_uncond": cond_or_uncond}).chunk(batch_chunks)
  File "/home/fedyfausto/Fooocus/modules/patch.py", line 206, in patched_model_function_wrapper
    return func(x, t, **c)
  File "/home/fedyfausto/Fooocus/backend/headless/fcbh/model_base.py", line 63, in apply_model
    return self.diffusion_model(xc, t, context=context, y=c_adm, control=control, transformer_options=transformer_options).float()
  File "/home/fedyfausto/Fooocus/fooocus_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/fedyfausto/Fooocus/modules/patch.py", line 414, in patched_unet_forward
    h = forward_timestep_embed(module, h, emb, context, transformer_options)
  File "/home/fedyfausto/Fooocus/backend/headless/fcbh/ldm/modules/diffusionmodules/openaimodel.py", line 56, in forward_timestep_embed
    x = layer(x, context, transformer_options)
  File "/home/fedyfausto/Fooocus/fooocus_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/fedyfausto/Fooocus/backend/headless/fcbh/ldm/modules/attention.py", line 534, in forward
    x = block(x, context=context[i], transformer_options=transformer_options)
  File "/home/fedyfausto/Fooocus/fooocus_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/fedyfausto/Fooocus/backend/headless/fcbh/ldm/modules/attention.py", line 364, in forward
    return checkpoint(self._forward, (x, context, transformer_options), self.parameters(), self.checkpoint)
  File "/home/fedyfausto/Fooocus/backend/headless/fcbh/ldm/modules/diffusionmodules/util.py", line 123, in checkpoint
    return func(*inputs)
  File "/home/fedyfausto/Fooocus/backend/headless/fcbh/ldm/modules/attention.py", line 429, in _forward
    n = self.attn1(n, context=context_attn1, value=value_attn1)
  File "/home/fedyfausto/Fooocus/fooocus_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/fedyfausto/Fooocus/backend/headless/fcbh/ldm/modules/attention.py", line 340, in forward
    out = optimized_attention(q, k, v, self.heads)
  File "/home/fedyfausto/Fooocus/backend/headless/fcbh/ldm/modules/attention.py", line 287, in attention_pytorch
    out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0, is_causal=False)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 622.00 MiB (GPU 0; 11.17 GiB total capacity; 10.36 GiB already allocated; 195.25 MiB free; 10.57 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Total time: 11.47 seconds
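
The allocator hint at the end of the traceback can be acted on by setting PYTORCH_CUDA_ALLOC_CONF before PyTorch makes its first CUDA allocation. Below is a minimal sketch of doing so from Python; the variable name comes from the error message itself, but the value 128 is only an assumed example, not a recommendation from this thread.

# The CUDA caching allocator reads PYTORCH_CUDA_ALLOC_CONF when it is
# initialized, so the variable must be set before the first CUDA allocation.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # assumed example value

import torch
x = torch.randn(1024, 1024, device="cuda")   # first allocation initializes the allocator
print(torch.cuda.memory_summary())           # reserved vs. allocated memory, fragmentation stats

Equivalently, the variable can be exported in the shell before running entry_with_update.py.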
mashb1t commented 10 months ago

Please update Fooocus to the latest version (yours: 2.1.703, latest: 2.1.862, including optimisations for VRAM usage) as you're still using a version with fcbh. After updating, please ensure you've enabled swap for your system, also see https://github.com/lllyasviel/Fooocus/blob/main/troubleshoot.md#system-swap

I'd recommend not using --lowvram (nor the new flag, --always-low-vram), as Fooocus automatically switches to low-VRAM mode when low resource availability is detected.

Please provide your feedback after updating Fooocus, checking swap and removing --lowvram.

fedyfausto commented 10 months ago

The swap is active:

swapon -s
Filename                                Type            Size            Used            Priority
/swap.img                               file            8388604         0               -2

I checked out the main release, added the --attention-split option, and now it works, but why?

mashb1t commented 10 months ago

Splitting attention reduces peak VRAM usage, which is what makes it possible for you to run Fooocus. Please also check https://vaclavkosar.com/ml/cross-attention-in-transformer-architecture for further information on how attention works in SD.
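
To illustrate the idea (this is a minimal sketch, not Fooocus's actual implementation, and the tensor names and sizes are assumptions): processing the queries in chunks means the full (seq × seq) attention score matrix is never materialized at once, only one (chunk × seq) slice at a time, which lowers peak VRAM while producing the same result.

# Minimal sketch (assumed shapes, not Fooocus code): chunked attention over
# the query dimension. Each chunk materializes only a (chunk_size, seq)
# score matrix instead of the full (seq, seq) one.
import torch
import torch.nn.functional as F

def chunked_attention(q, k, v, chunk_size=1024):
    # q, k, v: (batch * heads, seq, dim)
    scale = q.shape[-1] ** -0.5
    out = torch.empty_like(q)
    for start in range(0, q.shape[1], chunk_size):
        q_chunk = q[:, start:start + chunk_size]
        scores = (q_chunk @ k.transpose(-2, -1)) * scale        # (b*h, chunk, seq)
        out[:, start:start + chunk_size] = scores.softmax(dim=-1) @ v
    return out

# Sanity check against full attention on small random tensors (CPU is fine):
q = torch.randn(2, 512, 64)
k = torch.randn(2, 512, 64)
v = torch.randn(2, 512, 64)
ref = F.scaled_dot_product_attention(q, k, v)
assert torch.allclose(chunked_attention(q, k, v, chunk_size=128), ref, atol=1e-4)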

I assume there is still a misconfiguration on the system, so swap isn't used effectively (or at all) for Fooocus, as this behavior has only been reported by you on the latest version and it works on Colab and other cloud providers running Linux. Happy you found a working solution, closing this issue now. Feel free to reopen if you run into additional trouble.

fedyfausto commented 10 months ago

The issue is not resolved: if I try to use the image input, Fooocus crashes with the same errors :< It is also not normal that Fooocus takes this amount of memory on our server. On Google Colab it works well with ONLY 12 GB of RAM and VRAM combined, while our server has 12 GB of VRAM and 64 GB of RAM.

piotr-sikora-v commented 10 months ago

Hi,

I have a P40 with 24 GB of VRAM and also get this error.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 10.00 MiB. GPU 0 has a total capacty of 23.87 GiB of which 3.62 MiB is free. Process 3973655 has 456.00 MiB memory in use. Process 3973917 has 2.97 GiB memory in use. Process 3606741 has 458.00 MiB memory in use. Process 3607780 has 2.82 GiB memory in use. Process 783706 has 17.18 GiB memory in use. Of the allocated memory 16.56 GiB is allocated by PyTorch, and 456.57 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
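
The error above shows several other processes already holding VRAM on the card before Fooocus loads its models. A quick way to check free versus total device memory from Python before starting a generation (a minimal sketch, not part of Fooocus):

import torch

free, total = torch.cuda.mem_get_info()   # bytes free / total on the current CUDA device
print(f"free: {free / 2**20:.0f} MiB of {total / 2**20:.0f} MiB")

If most of the card is already taken by other processes, Fooocus will hit the same OutOfMemoryError regardless of its own settings.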