[Bug]: GPU usage 0% | cuda out of memory

Checklist

[X] The issue has not been resolved by following the troubleshooting guide
[X] The issue exists on a clean installation of Fooocus
[X] The issue exists in the current version of Fooocus
[X] The issue has not been reported before recently
[X] The issue has been reported before but has not been fixed yet

What happened?

When run_realistc.bat is executed without any parameters to entry_with_update.py, fooocus used RAM to generate image and GPU usagew is 0%. I have Asus laptop with NVIDIA gtx 1660 ti 6GB.

When, parameters like --always-gpu used, still does not use GPU. when added --disable-offload-from-vram it gives cuda out of memory. Output error below: `D:\Fooocus_win64_2-1-831>.\python_embeded\python.exe -s Fooocus\entry_with_update.py --preset realistic --always-gpu --disable-offload-from-vram Already up-to-date Update succeeded. [System ARGV] ['Fooocus\entry_with_update.py', '--preset', 'realistic', '--always-gpu', '--disable-offload-from-vram'] Python 3.10.9 (tags/v3.10.9:1dd9be6, Dec 6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)] Fooocus version: 2.3.1 Loaded preset: D:\Fooocus_win64_2-1-831\Fooocus\presets\realistic.json [Cleanup] Attempting to delete content of temp dir C:\Users\xxxxx\AppData\Local\Temp\fooocus [Cleanup] Cleanup successful Total VRAM 6144 MB, total RAM 15790 MB Set vram state to: HIGH_VRAM Device: cuda:0 NVIDIA GeForce GTX 1660 Ti : native VAE dtype: torch.float32 Using pytorch cross attention Refiner unloaded. Running on local URL: http://127.0.0.1:7865

To create a public link, set `share=True` in `launch()`. IMPORTANT: You are using gradio version 3.41.2, however version 4.29.0 is available, please upgrade.

model_type EPS UNet ADM Dimension 2816 Using pytorch attention in VAE Working with z of shape (1, 4, 32, 32) = 4096 dimensions. Using pytorch attention in VAE extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.logit_scale'} loaded straight to GPU Requested to load SDXL Loading 1 new model [Fooocus Model Management] Moving model(s) has taken 0.16 seconds Base model loaded: D:\Fooocus_win64_2-1-831\Fooocus\models\checkpoints\realisticStockPhoto_v20.safetensors Request to load LoRAs [['SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors', 0.25], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [D:\Fooocus_win64_2-1-831\Fooocus\models\checkpoints\realisticStockPhoto_v20.safetensors]. Loaded LoRA [D:\Fooocus_win64_2-1-831\Fooocus\models\loras\SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors] for UNet [D:\Fooocus_win64_2-1-831\Fooocus\models\checkpoints\realisticStockPhoto_v20.safetensors] with 788 keys at weight 0.25. Loaded LoRA [D:\Fooocus_win64_2-1-831\Fooocus\models\loras\SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors] for CLIP [D:\Fooocus_win64_2-1-831\Fooocus\models\checkpoints\realisticStockPhoto_v20.safetensors] with 264 keys at weight 0.25. Fooocus V2 Expansion: Vocab with 642 words. Fooocus Expansion engine loaded for cuda:0, use_fp16 = False. Requested to load SDXLClipModel Requested to load GPT2LMHeadModel Loading 2 new models ERROR clip_g.transformer.text_model.encoder.layers.12.self_attn.out_proj.weight CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacty of 6.00 GiB of which 0 bytes is free. Of the allocated memory 12.25 GiB is allocated by PyTorch, and 230.03 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF Exception in thread Thread-2 (worker): Traceback (most recent call last): File "D:\Fooocus_win64_2-1-831\Fooocus\ldm_patched\modules\model_management.py", line 300, in model_load self.real_model = self.model.patch_model(device_to=patch_model_to) #TODO: do something with loras and offloading to CPU File "D:\Fooocus_win64_2-1-831\Fooocus\ldm_patched\modules\model_patcher.py", line 196, in patch_model self.backup[key] = weight.to(device=self.offload_device, copy=inplace_update) torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.00 MiB. GPU 0 has a total capacty of 6.00 GiB of which 0 bytes is free. Of the allocated memory 12.25 GiB is allocated by PyTorch, and 230.34 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "threading.py", line 1016, in _bootstrap_inner File "threading.py", line 953, in run File "D:\Fooocus_win64_2-1-831\Fooocus\modules\async_worker.py", line 32, in worker import modules.default_pipeline as pipeline File "D:\Fooocus_win64_2-1-831\Fooocus\modules\default_pipeline.py", line 254, in refresh_everything( File "D:\Fooocus_win64_2-1-831\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context return func(*args, kwargs) File "D:\Fooocus_win64_2-1-831\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context return func(*args, *kwargs) File "D:\Fooocus_win64_2-1-831\Fooocus\modules\default_pipeline.py", line 249, in refresh_everything prepare_text_encoder(async_call=True) File "D:\Fooocus_win64_2-1-831\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context return func(args, kwargs) File "D:\Fooocus_win64_2-1-831\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context return func(*args, *kwargs) File "D:\Fooocus_win64_2-1-831\Fooocus\modules\default_pipeline.py", line 212, in prepare_text_encoder ldm_patched.modules.model_management.load_models_gpu([final_clip.patcher, final_expansion.patcher]) File "D:\Fooocus_win64_2-1-831\Fooocus\modules\patch.py", line 447, in patched_load_models_gpu y = ldm_patched.modules.model_management.load_models_gpu_origin(args, **kwargs) File "D:\Fooocus_win64_2-1-831\Fooocus\ldm_patched\modules\model_management.py", line 437, in load_models_gpu cur_loaded_model = loaded_model.model_load(lowvram_model_memory) File "D:\Fooocus_win64_2-1-831\Fooocus\ldm_patched\modules\model_management.py", line 302, in model_load self.model.unpatch_model(self.model.offload_device) File "D:\Fooocus_win64_2-1-831\Fooocus\ldm_patched\modules\model_patcher.py", line 350, in unpatch_model self.model.to(device_to) File "D:\Fooocus_win64_2-1-831\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1160, in to return self._apply(convert) File "D:\Fooocus_win64_2-1-831\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 810, in _apply module._apply(fn) File "D:\Fooocus_win64_2-1-831\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 810, in _apply module._apply(fn) File "D:\Fooocus_win64_2-1-831\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 810, in _apply module._apply(fn) [Previous line repeated 5 more times] File "D:\Fooocus_win64_2-1-831\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 833, in _apply param_applied = fn(param) File "D:\Fooocus_win64_2-1-831\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1158, in convert return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking) torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.00 MiB. GPU 0 has a total capacty of 6.00 GiB of which 0 bytes is free. Of the allocated memory 11.76 GiB is allocated by PyTorch, and 731.59 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF`

Steps to reproduce the problem

execute run_realistic.bat file. give prompt

What should have happened?

Used the GPU instead of RAM and CPU

What browsers do you use to access Fooocus?

Mozilla Firefox

Where are you running Fooocus?

Locally

What operating system are you using?

Windows 11

Console logs

D:\Fooocus_win64_2-1-831>.\python_embeded\python.exe -s Fooocus\entry_with_update.py --preset realistic --always-gpu --disable-offload-from-vram
Already up-to-date
Update succeeded.
[System ARGV] ['Fooocus\\entry_with_update.py', '--preset', 'realistic', '--always-gpu', '--disable-offload-from-vram']
Python 3.10.9 (tags/v3.10.9:1dd9be6, Dec  6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)]
Fooocus version: 2.3.1
Loaded preset: D:\Fooocus_win64_2-1-831\Fooocus\presets\realistic.json
[Cleanup] Attempting to delete content of temp dir C:\Users\divyj\AppData\Local\Temp\fooocus
[Cleanup] Cleanup successful
Total VRAM 6144 MB, total RAM 15790 MB
Set vram state to: HIGH_VRAM
Device: cuda:0 NVIDIA GeForce GTX 1660 Ti : native
VAE dtype: torch.float32
Using pytorch cross attention
Refiner unloaded.
Running on local URL:  http://127.0.0.1:7865

To create a public link, set `share=True` in `launch()`.
IMPORTANT: You are using gradio version 3.41.2, however version 4.29.0 is available, please upgrade.
--------
model_type EPS
UNet ADM Dimension 2816
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.logit_scale'}
loaded straight to GPU
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 0.16 seconds
Base model loaded: D:\Fooocus_win64_2-1-831\Fooocus\models\checkpoints\realisticStockPhoto_v20.safetensors
Request to load LoRAs [['SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors', 0.25], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [D:\Fooocus_win64_2-1-831\Fooocus\models\checkpoints\realisticStockPhoto_v20.safetensors].
Loaded LoRA [D:\Fooocus_win64_2-1-831\Fooocus\models\loras\SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors] for UNet [D:\Fooocus_win64_2-1-831\Fooocus\models\checkpoints\realisticStockPhoto_v20.safetensors] with 788 keys at weight 0.25.
Loaded LoRA [D:\Fooocus_win64_2-1-831\Fooocus\models\loras\SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors] for CLIP [D:\Fooocus_win64_2-1-831\Fooocus\models\checkpoints\realisticStockPhoto_v20.safetensors] with 264 keys at weight 0.25.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = False.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
ERROR clip_g.transformer.text_model.encoder.layers.12.self_attn.out_proj.weight CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacty of 6.00 GiB of which 0 bytes is free. Of the allocated memory 12.25 GiB is allocated by PyTorch, and 230.03 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Exception in thread Thread-2 (worker):
Traceback (most recent call last):
  File "D:\Fooocus_win64_2-1-831\Fooocus\ldm_patched\modules\model_management.py", line 300, in model_load
    self.real_model = self.model.patch_model(device_to=patch_model_to) #TODO: do something with loras and offloading to CPU
  File "D:\Fooocus_win64_2-1-831\Fooocus\ldm_patched\modules\model_patcher.py", line 196, in patch_model
    self.backup[key] = weight.to(device=self.offload_device, copy=inplace_update)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.00 MiB. GPU 0 has a total capacty of 6.00 GiB of which 0 bytes is free. Of the allocated memory 12.25 GiB is allocated by PyTorch, and 230.34 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "threading.py", line 1016, in _bootstrap_inner
  File "threading.py", line 953, in run
  File "D:\Fooocus_win64_2-1-831\Fooocus\modules\async_worker.py", line 32, in worker
    import modules.default_pipeline as pipeline
  File "D:\Fooocus_win64_2-1-831\Fooocus\modules\default_pipeline.py", line 254, in <module>
    refresh_everything(
  File "D:\Fooocus_win64_2-1-831\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Fooocus_win64_2-1-831\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Fooocus_win64_2-1-831\Fooocus\modules\default_pipeline.py", line 249, in refresh_everything
    prepare_text_encoder(async_call=True)
  File "D:\Fooocus_win64_2-1-831\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Fooocus_win64_2-1-831\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Fooocus_win64_2-1-831\Fooocus\modules\default_pipeline.py", line 212, in prepare_text_encoder
    ldm_patched.modules.model_management.load_models_gpu([final_clip.patcher, final_expansion.patcher])
  File "D:\Fooocus_win64_2-1-831\Fooocus\modules\patch.py", line 447, in patched_load_models_gpu
    y = ldm_patched.modules.model_management.load_models_gpu_origin(*args, **kwargs)
  File "D:\Fooocus_win64_2-1-831\Fooocus\ldm_patched\modules\model_management.py", line 437, in load_models_gpu
    cur_loaded_model = loaded_model.model_load(lowvram_model_memory)
  File "D:\Fooocus_win64_2-1-831\Fooocus\ldm_patched\modules\model_management.py", line 302, in model_load
    self.model.unpatch_model(self.model.offload_device)
  File "D:\Fooocus_win64_2-1-831\Fooocus\ldm_patched\modules\model_patcher.py", line 350, in unpatch_model
    self.model.to(device_to)
  File "D:\Fooocus_win64_2-1-831\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1160, in to
    return self._apply(convert)
  File "D:\Fooocus_win64_2-1-831\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 810, in _apply
    module._apply(fn)
  File "D:\Fooocus_win64_2-1-831\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 810, in _apply
    module._apply(fn)
  File "D:\Fooocus_win64_2-1-831\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 810, in _apply
    module._apply(fn)
  [Previous line repeated 5 more times]
  File "D:\Fooocus_win64_2-1-831\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 833, in _apply
    param_applied = fn(param)
  File "D:\Fooocus_win64_2-1-831\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1158, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.00 MiB. GPU 0 has a total capacty of 6.00 GiB of which 0 bytes is free. Of the allocated memory 11.76 GiB is allocated by PyTorch, and 731.59 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Additional information

everything is up to date

lllyasviel / Fooocus