vladmandic / automatic

SD.Next: Advanced Implementation of Stable Diffusion and other Diffusion-based generative image models
https://github.com/vladmandic/automatic
GNU Affero General Public License v3.0

[Issue]: OOM at 2nd gen with second pass(hires) #1794

Closed Yoinky3000 closed 1 year ago

Yoinky3000 commented 1 year ago

Issue Description

It always throws an OOM error on the 2nd generation that uses the second pass (hires). It works fine for the 1st generation with the second pass and for any generation without the second pass.

I have 8 GB of dedicated VRAM and 16 GB of shared VRAM. For the 1st generation with the second pass, it used all of the dedicated VRAM and 3 GB of shared VRAM when switching to the second pass, and all of the dedicated VRAM and 10 GB of shared VRAM while outputting the second-pass image. For the 2nd generation with the second pass, it used all of the dedicated VRAM and 10 GB of shared VRAM when switching, and all of the dedicated VRAM and 14.3 GB of shared VRAM before it threw OOM during output.

The installed extensions are all the default ones, and I didn't use any of them to generate the image.
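For reference, here is a minimal sketch of how the dedicated-VRAM numbers above could be logged from Python between generations, assuming PyTorch with CUDA. The shared-VRAM figures come from Windows Task Manager and are not visible through this API, and the generation calls in the comments are placeholders:

```python
import torch

def log_vram(tag: str) -> None:
    # Dedicated VRAM only; Windows "shared GPU memory" is not exposed by these calls.
    free, total = torch.cuda.mem_get_info()    # bytes free / total on the current device
    allocated = torch.cuda.memory_allocated()  # bytes held by live tensors
    reserved = torch.cuda.memory_reserved()    # bytes held by PyTorch's caching allocator
    print(f"{tag}: free={free / 2**20:.0f} MiB, allocated={allocated / 2**20:.0f} MiB, "
          f"reserved={reserved / 2**20:.0f} MiB of {total / 2**20:.0f} MiB")

# Hypothetical usage around two consecutive runs:
# log_vram("before gen 1"); ...generate with second pass...; log_vram("after gen 1")
# log_vram("before gen 2"); ...generate with second pass...; log_vram("after gen 2")
```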

Version Platform Description

13:08:28-714292 INFO Starting SD.Next
13:08:28-721854 INFO Python 3.10.11 on Windows
13:08:28-754906 INFO Version: 511a8cbb Fri Jul 21 15:04:27 2023 -0400
13:08:29-281182 INFO nVidia CUDA toolkit detected
13:08:29-406125 INFO Verifying requirements
13:08:29-430207 INFO Verifying packages
13:08:29-432755 INFO Verifying repositories
13:08:33-853746 INFO Verifying submodules
13:09:21-188882 INFO Extensions enabled: ['a1111-sd-webui-lycoris', 'clip-interrogator-ext', 'LDSR', 'Lora', 'multidiffusion-upscaler-for-automatic1111', 'ScuNET', 'sd-dynamic-thresholding', 'sd-extension-system-info', 'sd-webui-agent-scheduler', 'sd-webui-controlnet', 'stable-diffusion-webui-images-browser', 'stable-diffusion-webui-rembg', 'SwinIR']
13:09:21-189780 INFO Verifying packages
13:09:21-193352 INFO Updating Wiki
13:09:22-143088 INFO Extension preload: 0.0s C:\Vlad-SD\extensions-builtin
13:09:22-145088 INFO Extension preload: 0.0s C:\Vlad-SD\extensions
13:09:22-176777 INFO Server arguments: ['--insecure', '--medvram', '--upgrade', '--disable-console-progressbars', '--listen']
No module 'xformers'. Proceeding without it.
13:09:30-854304 INFO Pipeline: Backend.ORIGINAL
13:09:32-567832 INFO Libraries loaded

OS: Windows 11 Home x86_64 Browser: MS Edge

Relevant log output

No response

Acknowledgements

Yoinky3000 commented 1 year ago

the error log:

14:42:40-799578 ERROR    gradio call: RuntimeError
╭──────────────────────────── Traceback (most recent call last) ─────────────────────────────╮
│ C:\Vlad-SD\modules\call_queue.py:34 in f                                                   │
│                                                                                            │
│    33 │   │   │   try:                                                                     │
│ ❱  34 │   │   │   │   res = func(*args, **kwargs)                                          │
│    35 │   │   │   │   progress.record_results(id_task, res)                                │
│                                                                                            │
│ C:\Vlad-SD\modules\txt2img.py:65 in txt2img                                                │
│                                                                                            │
│   64 │   if processed is None:                                                             │
│ ❱ 65 │   │   processed = processing.process_images(p)                                      │
│   66 │   p.close()                                                                         │
│                                                                                            │
│                                  ... 13 frames hidden ...                                  │
│                                                                                            │
│ C:\Vlad-SD\venv\lib\site-packages\torch\nn\modules\module.py:1501 in _call_impl            │
│                                                                                            │
│   1500 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):             │
│ ❱ 1501 │   │   │   return forward_call(*args, **kwargs)                                    │
│   1502 │   │   # Do not call functions when jit is used                                    │
│                                                                                            │
│ C:\Vlad-SD\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.p │
│ y:149 in forward                                                                           │
│                                                                                            │
│   148 │   │                                                                                │
│ ❱ 149 │   │   return x+h                                                                   │
│   150                                                                                      │
╰────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace 
below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Traceback (most recent call last):
  File "C:\Vlad-SD\venv\lib\site-packages\gradio\routes.py", line 422, in run_predict
    output = await app.get_blocks().process_api(
  File "C:\Vlad-SD\venv\lib\site-packages\gradio\blocks.py", line 1323, in process_api        
    result = await self.call_function(
  File "C:\Vlad-SD\venv\lib\site-packages\gradio\blocks.py", line 1051, in call_function      
    prediction = await anyio.to_thread.run_sync(
  File "C:\Vlad-SD\venv\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\Vlad-SD\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "C:\Vlad-SD\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run      
    result = context.run(func, *args)
  File "C:\Vlad-SD\modules\call_queue.py", line 90, in f
    mem_stats = {k: -(v//-(1024*1024)) for k, v in shared.mem_mon.stop().items()}
  File "C:\Vlad-SD\modules\memmon.py", line 73, in stop
    return self.read()
  File "C:\Vlad-SD\modules\memmon.py", line 57, in read
    free, total = self.cuda_mem_get_info()
  File "C:\Vlad-SD\modules\memmon.py", line 34, in cuda_mem_get_info
    return torch.cuda.mem_get_info(index)
  File "C:\Vlad-SD\venv\lib\site-packages\torch\cuda\memory.py", line 618, in mem_get_info    
    return torch.cuda.cudart().cudaMemGetInfo(device)
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
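A note on the tail of this traceback: the second RuntimeError appears to be raised while SD.Next's memory monitor reads device statistics after the original OOM in the model forward pass, since torch.cuda.mem_get_info can itself fail once the device is exhausted. The expression on call_queue.py line 90 is a ceiling division from bytes to MiB; a standalone sketch of that idiom:

```python
# Ceiling division from bytes to MiB, as in the mem_stats line of call_queue.py above.
def bytes_to_mib(v: int) -> int:
    return -(v // -(1024 * 1024))   # -(a // -b) equals ceil(a / b) for positive b

assert bytes_to_mib(1) == 1                 # any non-zero amount rounds up to 1 MiB
assert bytes_to_mib(1024 * 1024) == 1
assert bytes_to_mib(1024 * 1024 + 1) == 2
```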
vladmandic commented 1 year ago

which gpu? what is your original resolution? what is target resolution set in hires fix?

Yoinky3000 commented 1 year ago

3070ti laptop, 576x1024 for base res, and hires 2.5x to 1440x2560
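For scale, a quick back-of-the-envelope check on those numbers (pixel counts only, not an exact VRAM estimate):

```python
base_px = 576 * 1024       # 589,824 pixels
hires_px = 1440 * 2560     # 3,686,400 pixels
print(hires_px / base_px)  # 6.25 -- a 2.5x upscale per axis means 6.25x the pixels
```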

vladmandic commented 1 year ago

well, that is a very high resolution and you're running out of vram. i'm really not sure what the issue is here?

Yoinky3000 commented 1 year ago

it works in auto1111 tho

Yoinky3000 commented 1 year ago

it passes in auto1111 with full dedicated VRAM usage and 7 GB of shared VRAM usage

vladmandic commented 1 year ago

and your settings are identical? i doubt it since a1111 and sdnext are quite different. in either case, if something doesn't fit in vram, behavior is "whatever happens, happens". expectation that something will work using shared ram is bad since that changes frequently and depends on many factors.

Yoinky3000 commented 1 year ago

I'm not sure if it's totally identical, but most of the main settings are identical.

And I don't think it's about whether something fits in VRAM or not. The problem I have is that it works for the first generation with the second pass, which means everything should be able to work properly. But when I try to do another generation with the second pass, the VRAM usage is insanely high and it OOMs for some reason.

vladmandic commented 1 year ago

what is "insanely high"

Yoinky3000 commented 1 year ago

As I mentioned in the first message, it uses all of the shared VRAM for no reason, while normally 2560x1440 can be handled using only around 10 GB of shared VRAM.
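One possible, unconfirmed contributor to memory growing between the first and second run is PyTorch's caching allocator, which keeps freed blocks reserved for reuse rather than returning them to the driver. A minimal sketch of how that cache could be inspected and released between generations, assuming direct access to the Python process (not something the web UI exposes as-is):

```python
import gc
import torch

def release_cached_vram() -> None:
    # memory_reserved() minus memory_allocated() is memory PyTorch holds but no tensor uses.
    print(f"allocated={torch.cuda.memory_allocated() / 2**20:.0f} MiB, "
          f"reserved={torch.cuda.memory_reserved() / 2**20:.0f} MiB")
    gc.collect()                # drop stale Python references so tensors can actually be freed
    torch.cuda.empty_cache()    # return cached, unused blocks to the driver
    print(f"after empty_cache: reserved={torch.cuda.memory_reserved() / 2**20:.0f} MiB")
```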