cdb-boop closed this issue 9 months ago
I'm not 100% sure, but I think I read somewhere that it only works with proper SDXL checkpoints: no Turbo, no Lightning. Can someone confirm that? And is RealvisXL 3 of that kind?
Have you disabled system memory fallback in the NVIDIA drivers? I'm able to do 512 -> 1024 with my 10GB 3080; it's slow (2 mins) and uses system memory when it peaks, but it works. You can also try reducing the tiled VAE size.
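Side note for anyone debugging this: PyTorch can report free/total device memory directly, which helps tell whether the driver's fallback is what's keeping a run alive. A minimal sketch:

```python
import torch

# Free/total device memory as CUDA reports it. If "free" sits near zero
# while the workflow keeps running, the NVIDIA driver is likely spilling
# into system memory (the fallback this thread is about).
free, total = torch.cuda.mem_get_info()
print(f"free: {free / 2**30:.2f} GiB / total: {total / 2**30:.2f} GiB")

# What this PyTorch process itself has allocated and reserved.
print(f"allocated: {torch.cuda.memory_allocated() / 2**30:.2f} GiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 2**30:.2f} GiB")
```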
> I'm not 100% sure, but I think I read somewhere that it only works with proper SDXL checkpoints: no Turbo, no Lightning. Can someone confirm that? And is RealvisXL 3 of that kind?
Pretty sure the scheduler used only works with normal SDXL models.
> I'm not 100% sure, but I think I read somewhere that it only works with proper SDXL checkpoints: no Turbo, no Lightning. Can someone confirm that? And is RealvisXL 3 of that kind?
I just tried the base model and got the same problem.
> Have you disabled system memory fallback in the NVIDIA drivers? I'm able to do 512 -> 1024 with my 10GB 3080; it's slow (2 mins) and uses system memory when it peaks, but it works. You can also try reducing the tiled VAE size.
Yeah, I had disabled fallback. I'll need to investigate more.
I had the same problem on my 3060 with 12 GB of VRAM:

```
  File "...", line 1150, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: Allocation on device 0 would exceed allowed memory. (out of memory)
Currently allocated : 11.31 GiB
Requested : 6.25 MiB
Device limit : 12.00 GiB
Free (according to CUDA): 0 bytes
PyTorch limit (set by user-supplied memory fraction): 17179869184.00 GiB
```

I disabled system memory fallback and it works.
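For reference, the "PyTorch limit (set by user-supplied memory fraction)" line in that error corresponds to `torch.cuda.set_per_process_memory_fraction`. Capping it explicitly makes allocations fail fast instead of spilling; a minimal sketch with an illustrative 0.9 cap:

```python
import torch

# Cap this process at ~90% of the card's VRAM so allocations fail fast
# with a clear OOM instead of being silently served from system memory.
# 0.9 is an illustrative value, not a recommendation from this thread.
torch.cuda.set_per_process_memory_fraction(0.9, device=0)

# After an OOM, this summary shows where the memory actually went.
print(torch.cuda.memory_summary(device=0))
```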
> I'm not 100% sure, but I think I read somewhere that it only works with proper SDXL checkpoints: no Turbo, no Lightning. Can someone confirm that? And is RealvisXL 3 of that kind?
>
> I just tried the base model and got the same problem.
>
> Have you disabled system memory fallback in the NVIDIA drivers? I'm able to do 512 -> 1024 with my 10GB 3080; it's slow (2 mins) and uses system memory when it peaks, but it works. You can also try reducing the tiled VAE size.
>
> Yeah, I had disabled fallback. I'll need to investigate more.
I found one bug that caused a memory spike after model load, which could explain this too; it's fixed now. With system memory fallback enabled it wasn't an issue, so I didn't notice it at first.
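For readers curious what a load-time spike can look like: a common mitigation (not necessarily the actual fix committed here) is loading the checkpoint to CPU, dropping the state dict, and only then moving the model to the GPU. A sketch with a stand-in model:

```python
import torch
import torch.nn as nn

model = nn.Linear(4096, 4096)  # stand-in for the actual SUPIR model

# Load the checkpoint into system RAM first so the state dict and the GPU
# weights never coexist in VRAM; in real code this would be
# torch.load(path, map_location="cpu"). The dict below just mimics that.
state_dict = {k: v.clone() for k, v in model.state_dict().items()}
model.load_state_dict(state_dict)
del state_dict            # drop the CPU copy before moving to the GPU
model = model.to("cuda")
torch.cuda.empty_cache()  # return cached allocator blocks to the driver
```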
With commit c74b8248a73352dc5bdc99496006e96321738f38, it seems to be working now. I'm seeing peak VRAM usage of 10427 MiB and idle of 9339 MiB with a 512x512 to 512x512 pass and the default settings.
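For anyone wanting to reproduce numbers like these, PyTorch tracks its own per-process peak; note that nvidia-smi reads higher because it also counts the CUDA context. A minimal sketch:

```python
import torch

torch.cuda.reset_peak_memory_stats()

# ... run the SUPIR workflow here ...

peak = torch.cuda.max_memory_allocated()
print(f"peak allocated by this process: {peak / 2**20:.0f} MiB")
```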
Hey guys, somehow I am unable to get this working on my 4060 Ti with 16 GB of VRAM. I keep getting:
```
ERROR:root:!!! Exception during processing !!!
ERROR:root:Traceback (most recent call last):
  File "/config/05-comfy-ui/ComfyUI/execution.py", line 152, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "/config/05-comfy-ui/ComfyUI/execution.py", line 82, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "/config/05-comfy-ui/ComfyUI/execution.py", line 75, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "/config/05-comfy-ui/ComfyUI/custom_nodes/ComfyUI-SUPIR/nodes.py", line 212, in process
    self.model.init_tile_vae(encoder_tile_size=encoder_tile_size_pixels, decoder_tile_size=decoder_tile_size_latent)
AttributeError: 'SUPIR_Upscale' object has no attribute 'model'
```
I thought I'd ask here first before opening a new issue. But it seems like people got it to work with 12 GB of VRAM; I have 16, and I tried even with super low resolutions like 265x265.
@Joly0 Always fine to ask before opening a new issue. :)
Anyways, I don't see a memory issue in the output you've shown. It looks like the model isn't getting initialized. I'd suggest double-checking that you downloaded and added the model weights correctly, and afterwards opening a new issue including all relevant debug outputs.
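For context, that AttributeError pattern is what an earlier silent load failure looks like: `self.model` is only assigned on a successful load, so the next access blows up. A runnable toy sketch (illustrative names, not the wrapper's actual code):

```python
import os

class SUPIRUpscaleSketch:
    """Illustrative only; not the actual SUPIR_Upscale node code."""

    def load(self, checkpoint_path: str) -> None:
        # If loading fails and we return without assigning self.model,
        # every later access raises:
        #   AttributeError: ... object has no attribute 'model'
        if not os.path.isfile(checkpoint_path):
            return
        self.model = object()  # stand-in for building the real model

    def process(self) -> None:
        if not hasattr(self, "model"):
            raise RuntimeError("Model was never loaded; check the weight paths.")
        print("processing with", self.model)

node = SUPIRUpscaleSketch()
node.load("missing_checkpoint.safetensors")
node.process()  # raises the clearer RuntimeError instead of AttributeError
```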
It was said in the original repo, and you also thought it was the case, that it's possible to get this running within 12 GB of VRAM, but I just can't get it to work with this wrapper.
I downloaded all the models, resolved issues with xformers and PyTorch+CUDA, used an integrated GPU for my display, fiddled with the settings (`use_tiled_vae`, `diffusion_dtype` and `encoder_dtype`), and fed in a test 512x512 image. Am I missing something?
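As background on `use_tiled_vae`: the idea is to decode the latent in tiles so only one tile's activations occupy VRAM at a time. A simplified concept sketch (hypothetical `decode_fn`; real tiled-VAE implementations also overlap tiles and blend the seams):

```python
import torch

def tiled_decode(latent: torch.Tensor, decode_fn, tile: int = 64) -> torch.Tensor:
    """Decode a (B, C, H, W) latent tile by tile to cap peak VRAM.

    decode_fn stands in for the VAE decoder; seam blending is omitted
    here for brevity.
    """
    _, _, h, w = latent.shape
    rows = []
    for y in range(0, h, tile):
        cols = [decode_fn(latent[:, :, y:y + tile, x:x + tile])
                for x in range(0, w, tile)]
        rows.append(torch.cat(cols, dim=-1))  # stitch one row of tiles
    return torch.cat(rows, dim=-2)            # stack the rows vertically
```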