lllyasviel / stable-diffusion-webui-forge


you may cause NVIDIA GPU degradation... 😦 #1655

Open ZeroCool22 opened 2 weeks ago

ZeroCool22 commented 2 weeks ago

Screenshot_3

Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: f2.0.1v1.10.1-previous-468-g3d62fa95
Commit hash: 3d62fa959883bd7358cc548bb945113b8a7f323f
CUDA 12.4
Launching Web UI with arguments: --cuda-malloc --xformers --forge-ref-a1111-home C:/Users/ZeroCool22/Desktop/AutoSDXL/stable-diffusion-webui/ --ckpt-dir F:/Stable-diffusion/ --vae-dir 'C:\Users\ZeroCool22\Desktop\AutoSDXL\stable-diffusion-webui\models\VAE' --hypernetwork-dir /models/hypernetworks --embeddings-dir /embeddings --lora-dir C:/Users/ZeroCool22/Desktop/AutoSDXL/stable-diffusion-webui/models/Lora --controlnet-dir 'C:\Users\ZeroCool22\Desktop\AutoSDXL\stable-diffusion-webui\models\ControlNet' --controlnet-preprocessor-models-dir 'C:\Users\ZeroCool22\Desktop\AutoSDXL\stable-diffusion-webui\extensions\sd-webui-controlnet\annotator\downloads'
Using cudaMallocAsync backend.
Total VRAM 16376 MB, total RAM 32680 MB
pytorch version: 2.4.0+cu124
C:\Users\ZeroCool22\Desktop\webui_forge\system\python\lib\site-packages\xformers\ops\fmha\flash.py:210: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_fwd")
C:\Users\ZeroCool22\Desktop\webui_forge\system\python\lib\site-packages\xformers\ops\fmha\flash.py:344: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_bwd")
WARNING:xformers:A matching Triton is not available, some optimizations will not be enabled
Traceback (most recent call last):
  File "C:\Users\ZeroCool22\Desktop\webui_forge\system\python\lib\site-packages\xformers\__init__.py", line 57, in _is_triton_available
    import triton  # noqa
ModuleNotFoundError: No module named 'triton'
xformers version: 0.0.28.dev887
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 4070 Ti SUPER : cudaMallocAsync
VAE dtype preferences: [torch.bfloat16, torch.float32] -> torch.bfloat16
CUDA Using Stream: False
C:\Users\ZeroCool22\Desktop\webui_forge\system\python\lib\site-packages\transformers\utils\hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
Using xformers cross attention
Using xformers attention for VAE
ControlNet preprocessor location: C:\Users\ZeroCool22\Desktop\AutoSDXL\stable-diffusion-webui\extensions\sd-webui-controlnet\annotator\downloads
CHv1.8.11: Get Custom Model Folder
[-] ADetailer initialized. version: 24.8.0, num models: 11
18:44:58 - ReActor - STATUS - Running v0.7.1-a2 on Device: CUDA
Textual inversion embeddings loaded(0):
CHv1.8.11: Set Proxy:
2024-09-01 18:45:00,393 - ControlNet - INFO - ControlNet UI callback registered.
Model selected: {'checkpoint_info': {'filename': 'F:\\Stable-diffusion\\FLUX\\flux1-dev-Q8_0.gguf', 'hash': 'b44b9b8a'}, 'additional_modules': ['C:\\Users\\ZeroCool22\\Desktop\\webui_forge\\webui\\models\\VAE\\ae.safetensors', 'C:\\Users\\ZeroCool22\\Desktop\\webui_forge\\webui\\models\\text_encoder\\clip_l.safetensors', 'C:\\Users\\ZeroCool22\\Desktop\\webui_forge\\webui\\models\\text_encoder\\t5xxl_enconly.safetensors'], 'unet_storage_dtype': None}
Using online LoRAs in FP16: True
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 24.8s (prepare environment: 7.2s, import torch: 8.4s, initialize shared: 0.2s, other imports: 0.6s, load scripts: 3.4s, create ui: 2.7s, gradio launch: 2.2s).
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': False}
[GPU Setting] You will use 93.75% GPU memory (15351.00 MB) to load weights, and use 6.25% GPU memory (1024.00 MB) to do matrix computation.
Model selected: {'checkpoint_info': {'filename': 'F:\\Stable-diffusion\\FLUX\\flux1-dev-Q8_0.gguf', 'hash': 'b44b9b8a'}, 'additional_modules': ['C:\\Users\\ZeroCool22\\Desktop\\webui_forge\\webui\\models\\VAE\\ae.safetensors', 'C:\\Users\\ZeroCool22\\Desktop\\webui_forge\\webui\\models\\text_encoder\\clip_l.safetensors', 'C:\\Users\\ZeroCool22\\Desktop\\webui_forge\\webui\\models\\text_encoder\\t5xxl_enconly.safetensors'], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Model selected: {'checkpoint_info': {'filename': 'F:\\Stable-diffusion\\FLUX\\flux1-dev-Q8_0.gguf', 'hash': 'b44b9b8a'}, 'additional_modules': ['C:\\Users\\ZeroCool22\\Desktop\\webui_forge\\webui\\models\\VAE\\ae.safetensors', 'C:\\Users\\ZeroCool22\\Desktop\\webui_forge\\webui\\models\\text_encoder\\clip_l.safetensors', 'C:\\Users\\ZeroCool22\\Desktop\\webui_forge\\webui\\models\\text_encoder\\t5xxl_enconly.safetensors'], 'unet_storage_dtype': None}
Using online LoRAs in FP16: True
Loading Model: {'checkpoint_info': {'filename': 'F:\\Stable-diffusion\\FLUX\\flux1-dev-Q8_0.gguf', 'hash': 'b44b9b8a'}, 'additional_modules': ['C:\\Users\\ZeroCool22\\Desktop\\webui_forge\\webui\\models\\VAE\\ae.safetensors', 'C:\\Users\\ZeroCool22\\Desktop\\webui_forge\\webui\\models\\text_encoder\\clip_l.safetensors', 'C:\\Users\\ZeroCool22\\Desktop\\webui_forge\\webui\\models\\text_encoder\\t5xxl_enconly.safetensors'], 'unet_storage_dtype': None}
[Unload] Trying to free all memory for cuda:0 with 0 models keep loaded ... Done.
StateDict Keys: {'transformer': 780, 'vae': 244, 'text_encoder': 196, 'text_encoder_2': 220, 'ignore': 0}
Using Default T5 Data Type: torch.float16
Using Detected UNet Type: gguf
Using pre-quant state dict!
GGUF state dict: {'F16': 476, 'Q8_0': 304}
GGUF backed 304 layers.
Working with z of shape (1, 16, 32, 32) = 16384 dimensions.
K-Model Created: {'storage_dtype': 'gguf', 'computation_dtype': torch.bfloat16}
Model loaded in 134.1s (unload existing model: 0.2s, forge model load: 133.8s).
[LORA] Loaded C:\Users\ZeroCool22\Desktop\AutoSDXL\stable-diffusion-webui\models\Lora\Flux\Hyper-FLUX.1-dev-8steps-lora.safetensors for KModel-UNet with 504 keys at weight 0.125 (skipped 0 keys)
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.
[Unload] Trying to free 13464.34 MB for cuda:0 with 0 models keep loaded ... Done.
[Memory Management] Target: ModelPatcher, Free GPU: 15077.00 MB, Model Require: 9569.49 MB, Inference Require: 1024.00 MB, Remaining: 4483.51 MB, All loaded to GPU.
Moving model(s) has taken 24.23 seconds
Distilled CFG Scale: 3.5
[Unload] Trying to free 17816.28 MB for cuda:0 with 0 models keep loaded ... Current free memory is 5376.10 MB ... Unload model JointTextEncoder Done.
[Memory Management] Target: UnetPatcher, Free GPU: 15018.08 MB, Model Require: 12119.55 MB, Inference Require: 1024.00 MB, Remaining: 1874.53 MB, All loaded to GPU.
Patched LoRAs on-the-fly; Moving model(s) has taken 20.96 seconds
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 8/8 [00:07<00:00,  1.01it/s]
[Unload] Trying to free 2658.09 MB for cuda:0 with 0 models keep loaded ... Current free memory is 2889.41 MB ... Done.
[Memory Management] Target: ModelPatcher, Free GPU: 2889.41 MB, Model Require: 159.87 MB, Inference Require: 1024.00 MB, Remaining: 1705.53 MB, All loaded to GPU.
Moving model(s) has taken 0.04 seconds
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.
[Unload] Trying to free 2060.87 MB for cuda:0 with 1 models keep loaded ... Current free memory is 2729.53 MB ... Done.
  0%|                                                                                            | 0/8 [00:00<?, ?it/s]

----------------------
[Low GPU VRAM Warning] Your current GPU free memory is 1394.00 MB for this diffusion iteration.
[Low GPU VRAM Warning] This number is lower than the safe value of 1536.00 MB.
[Low GPU VRAM Warning] If you continue the diffusion process, you may cause NVIDIA GPU degradation, and the speed may be extremely slow (about 10x slower).
[Low GPU VRAM Warning] To solve the problem, you can set the 'GPU Weights' (on the top of page) to a lower value.
[Low GPU VRAM Warning] If you cannot find 'GPU Weights', you can click the 'all' option in the 'UI' area on the left-top corner of the webpage.
[Low GPU VRAM Warning] If you want to take the risk of NVIDIA GPU fallback and test the 10x slower speed, you can (but are highly not recommended to) add '--disable-gpu-warning' to CMD flags to remove this warning.
----------------------

Why am I getting this message now?

I never got it before.

πŸ’‘ UPDATE:

I've noticed it happens when I use the LoRA.
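For reference, the two figures in the warning above are the VRAM left free for the diffusion step (1394 MB here) and the margin Forge treats as safe (1536 MB). Below is a minimal sketch of how you could check the free-VRAM figure yourself with PyTorch, assuming a single CUDA device; the 1536 MB threshold is just the value quoted in the warning, not an API constant.

```python
import torch

# Query free and total memory on the first CUDA device (both values are in bytes).
free_bytes, total_bytes = torch.cuda.mem_get_info(0)
free_mb = free_bytes / (1024 ** 2)
total_mb = total_bytes / (1024 ** 2)
print(f"Free VRAM: {free_mb:.0f} MB / {total_mb:.0f} MB")

# 1536 MB is the "safe value" quoted in Forge's warning, used here only for illustration.
SAFE_MARGIN_MB = 1536
if free_mb < SAFE_MARGIN_MB:
    print("Below the margin the warning mentions; lowering 'GPU Weights' leaves more VRAM for inference.")
```

Lowering 'GPU Weights' (the model-weight budget) is the fix the warning itself recommends; the '--disable-gpu-warning' flag it mentions only hides the message.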

ZeroCool22 commented 2 weeks ago

Screenshot_4

linkpharm commented 1 week ago

I answered this on Reddit. I would close this, but I can't.

ZeroCool22 commented 1 week ago

> I answered this on Reddit. I would close this, but I can't.

But is that true? I mean, could it really cause degradation? Is the GPU actually at risk?

HMRMike commented 1 week ago

> I answered this on Reddit. I would close this, but I can't.
>
> But is that true? I mean, could it really cause degradation? Is the GPU actually at risk?

I think this might be something lost in translation, or some meaning that's lost on us plebs. The memory will be filled completely, but the GPU will not do any different work. If anything, it will do less work, because it can't be "fed" as fast while it waits on slower memory. Honestly, given how much I abuse my GPU with just regular generations, this issue shouldn't cause any worry; I don't see how it could be worse or cause any more "degradation" than usual work.

linkpharm commented 1 week ago

> I answered this on Reddit. I would close this, but I can't.
>
> But is that true? I mean, could it really cause degradation? Is the GPU actually at risk?

There's not much you can do with normal use to break a GPU. Even excessive overclocking is probably fine.

MichaelData commented 1 week ago

> I answered this on Reddit. I would close this, but I can't.

Can you link your answer?

linkpharm commented 1 week ago

> I answered this on Reddit. I would close this, but I can't.
>
> Can you link your answer?

I can't find the comment, but it's the same as what I just said again.

Edit: same as what I said here.

MichaelData commented 1 week ago

> I answered this on Reddit. I would close this, but I can't.
>
> Can you link your answer?
>
> I can't find the comment, but it's the same as what I just said again.
>
> Edit: same as what I said here.

Where did you say it? I can't find it.