Comfy crashes on stage load checkpoint no error msg

kijai / ComfyUI-SUPIR

SUPIR upscaling wrapper for ComfyUI

Comfy crashes on stage load checkpoint no error msg #96

Closed zappazack closed 2 months ago

zappazack commented 3 months ago

ComfyUI crashes/stops when loading the checkpoint. Any idea? The GPU is a 4070 12GB. A clean installation leads to:

got prompt
[rgthree] Using rgthree's optimized recursive execution.
[rgthree] First run patching recursive_output_delete_if_changed and recursive_will_execute.
[rgthree] Note: If execution seems broken due to forward ComfyUI changes, you can disable the optimization from rgthree settings in ComfyUI.
model_type EPS
Using pytorch attention in VAE
Using pytorch attention in VAE
clip missing: ['clip_l.logit_scale', 'clip_l.transformer.text_projection.weight', 'clip_g.logit_scale']
Diffusion using fp16
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels

Then ComfyUI crashes at the SUPIR checkpoint stage. I guess this upscaler only runs well on 4090 or A100+ GPUs; sadly I can't afford one right now.

kijai commented 3 months ago

I've run this with 10GB 3080 so it's not that. Crash at that stage without errors usually is about system RAM, how much do you have available? If you monitor it during loading, does it get full?
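
One way to answer the "does it get full?" question is to record peak system RAM while the checkpoint loads. The following is a minimal sketch, not part of ComfyUI or the SUPIR node pack: `PeakSampler`, `psutil_used_percent` and `main` are hypothetical names, and it assumes the third-party psutil package is installed.

```python
import threading
import time


def psutil_used_percent():
    """Return system-wide RAM usage in percent (assumes psutil is installed)."""
    import psutil  # third-party; imported lazily so the sampler itself has no dependency
    return psutil.virtual_memory().percent


class PeakSampler:
    """Poll a sampling function in a background thread and remember the maximum."""

    def __init__(self, sample_fn, interval=0.5):
        self.sample_fn = sample_fn
        self.interval = interval
        self.peak = None
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while True:
            value = self.sample_fn()
            if self.peak is None or value > self.peak:
                self.peak = value
            # Event.wait returns True as soon as a stop has been requested.
            if self._stop.wait(self.interval):
                break

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc_info):
        self._stop.set()
        self._thread.join()
        return False


def main(duration=60.0):
    """Watch system RAM for `duration` seconds, then print the peak."""
    psutil_used_percent()  # fail early if psutil is missing
    with PeakSampler(psutil_used_percent) as sampler:
        time.sleep(duration)
    print(f"peak system RAM usage: {sampler.peak:.1f}%")
```

Because a hard crash takes the crashing process down with it, `main()` would be run from a separate terminal while the workflow loads, so that the peak is still reported afterwards.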

zappazack commented 3 months ago

Thanks for answering. I monitored GPU, CPU, and RAM during the process: GPU 4%, CPU 7%, RAM at most 50% of 32GB. The workflow stops at 17%, at the SUPIR model loader v2. Could this be the cause?

[ComfyUI-Manager] skip black listed pip installation: 'transformers'
Install: pip packages for 'C:\Users\Intersurf\Documents\ComfyUI\custom_nodes\wlsh_nodes'
Install: pip packages for 'C:\Users\Intersurf\Documents\ComfyUI\custom_nodes\rgthree-comfy'
Install: pip packages for 'C:\Users\Intersurf\Documents\ComfyUI\custom_nodes\ComfyUI-KJNodes'
Install: pip packages for 'C:\Users\Intersurf\Documents\ComfyUI\custom_nodes\ComfyUI-SUPIR'

Update: I tried another approach: I started the workflow first and opened Task Manager while it was running. Result: this runs into a RAM error. Machine: 5900X, 32GB RAM, GPU 4070 12GB. Any advice on how to fix this?

kijai commented 3 months ago

Transformers install shouldn't be an issue. Are you using the v2 model loader? And this node pack is for sure in the latest version? I'm asking as this was an issue I recently fixed, so I don't understand why you are running into it with 32GB RAM now...

zappazack commented 2 months ago

Sadly, same issue after reinstalling SUPIR through git clone. Could the Python version be an issue? I'm on 3.11.7.

zappazack commented 2 months ago

Is there any way to get the SUPIR checkpoints in a smaller size, like 2-3GB? I have also tested the old legacy node, so it's definitely a problem with loading these huge checkpoints: ComfyUI breaks at the stage where the SUPIR checkpoints are loaded. Stable Cascade had a similar problem, but the checkpoints made specifically for ComfyUI work fine.

kijai commented 2 months ago

I don't know if it will help, but it's a good idea anyway, if only to save disk space, so I did just that now: https://huggingface.co/Kijai/SUPIR_pruned/tree/main

I've seen people with lower specs than yours use this, myself included, so I still don't understand the issue though. But try this and see if it helps at all.

zappazack commented 2 months ago

Thanks, I'm puzzled too. I ran the workflow with the --highvram parameter to get more information about the checkpoint loading procedure and received the following error:

Error occurred when executing CheckpointLoaderSimple:

Allocation on device 0 would exceed allowed memory. (out of memory)
Currently allocated : 6.90 GiB
Requested : 3.12 MiB
Device limit : 11.99 GiB
Free (according to CUDA): 3.89 GiB
PyTorch limit (set by user-supplied memory fraction) : 17179869184.00 GiB
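
The figures in this message are easier to compare once they share a unit. The sketch below is only an illustration: `parse_oom` and `MESSAGE` are hypothetical names, the message text is copied from the comment above, and the reading of the result is an assumption rather than something confirmed in the thread.

```python
import re

UNITS = {"KiB": 2**10, "MiB": 2**20, "GiB": 2**30}

# Error text as posted in the thread.
MESSAGE = (
    "Allocation on device 0 would exceed allowed memory. (out of memory) "
    "Currently allocated : 6.90 GiB Requested : 3.12 MiB "
    "Device limit : 11.99 GiB Free (according to CUDA): 3.89 GiB "
    "PyTorch limit (set by user-supplied memory fraction) : 17179869184.00 GiB"
)

FIELD = re.compile(
    r"(Currently allocated|Requested|Device limit|Free|PyTorch limit)"
    r"[^:]*:\s*([\d.]+)\s*(KiB|MiB|GiB)"
)


def parse_oom(message):
    """Return {label: size in bytes} for each labelled field in the message."""
    return {
        label: float(number) * UNITS[unit]
        for label, number, unit in FIELD.findall(message)
    }


fields = parse_oom(MESSAGE)
gib = 2**30

# Device memory that is neither allocated by this process nor reported free.
unaccounted = (
    fields["Device limit"] - fields["Currently allocated"] - fields["Free"]
)
print(f"unaccounted for: {unaccounted / gib:.2f} GiB")

# 17179869184 is 2**34, so the printed "PyTorch limit" equals 2**64 bytes.
print("PyTorch limit is 2**64 bytes:", fields["PyTorch limit"] == 2.0**64)
```

Two things stand out. The "PyTorch limit" works out to 2**64 bytes, which reads like an unset limit and not a real cap. About 1.2 GiB of the card is neither allocated here nor free, so it is presumably held by something else. Neither observation explains why a 3.12 MiB request failed.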

zappazack commented 2 months ago

Running the workflow as admin, using the SUPIR fp16 checkpoint, and the --lowvram parameter all didn't help. I figured out that the error seems to be "python.exe....cant read memory"; it always stops at the SUPIR v2 node.

zappazack commented 2 months ago

What are the detailed specs of your setup: Python version, virtual RAM, etc.?

kijai commented 2 months ago

Windows, Python 3.10, pytorch 2.2.1 + cu121. Tried various configurations ranging from 10-24GB VRAM and 32GB-64GB RAM.

zappazack commented 2 months ago

The python.exe error was the right lead to look at more closely. I had Python 3.11.7 installed, and that was the issue. Under Python 3.10.11 everything works fine. Upscaling 512x512 to 1536x1536 takes 75.1 seconds with the fp16 SUPIR checkpoint and 84.90 seconds with the fp32 one. First impression: the results are amazing. I will compare it to tile and multidiffusion upscaling on auto1111. Thanks again for your help!