PaulFidika opened this issue 10 months ago
Ah, can you lmk the model checkpoint you're testing, and how many frames you're putting as input?
try to decrease decoder frames
(1) The error is happening in the sampler, rather than the decoder. Reducing the decoder's decoding_t param to 1 fixed the decoder running out of memory, but the sampler is still running out of memory.
(2) I'm running 25 frames with 36 steps for the svd_xt model.
If I reduce the number of frames, I'm able to get past it and generate a successful video. But if I want 25 frames, I still get the error:
Allocation on device 0 would exceed allowed memory. (out of memory)
Currently allocated : 18.31 GiB
Requested : 1.92 GiB
Device limit : 23.65 GiB
Free (according to CUDA): 33.19 MiB
PyTorch limit (set by user-supplied memory fraction) : 17179869184.00 GiB
What confuses me is that it says I have 24 GB, it wants 2 GB on top of the 18 GB it has already allocated, and yet it's out of memory? Why? I should have enough VRAM for this.
Is it possible that VRAM is being left over between generations and not freed after the job completes? At lower frame counts I do get the generation to run, but then it won't work a second time.
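If VRAM really is being held between runs, one cheap thing to try is forcing a collection pass between generations. This is just a sketch, not part of ComfyUI itself; the torch import is guarded so it's a no-op in an environment without PyTorch:

```python
import gc

def free_vram():
    """Best-effort release of cached VRAM between generations.

    Note: empty_cache() only returns *cached* allocator blocks to the
    driver; tensors still referenced somewhere (e.g. by a custom node)
    are not freed, so a genuine leak will survive this call.
    """
    gc.collect()  # drop unreachable Python objects that hold tensors
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # release cached allocator blocks
            torch.cuda.ipc_collect()  # clean up inter-process handles
    except ImportError:
        pass  # no PyTorch here; nothing to free
```

If the second run succeeds after calling this, the leftover memory was cached rather than leaked.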
I have exactly the same issue; if you find a fix, please let me know!
I'm facing the same problem! No matter which parameters I set, even 1 frame per second and a single step, I get an error:
'Allocated 16.67 GB, need an additional 1.92 GB, 23.69 GB available - not enough memory.'
I'm using Ubuntu 23.10, with 64GB RAM and an RTX3090 with 24GB. It's so frustrating...
Update! The issue with insufficient memory was resolved by disabling the FreeU Advanced node.
However, this led to a new problem, an xFormers error:
Error occurred when executing SVDSampler:
No operator found for memory_efficient_attention_forward
with inputs:
query : shape=(1, 64512, 1, 512) (torch.float32)
key : shape=(1, 64512, 1, 512) (torch.float32)
value : shape=(1, 64512, 1, 512) (torch.float32)
attn_bias :
p : 0.0
decoderF
is not supported because:
max(query.shape[-1] != value.shape[-1]) > 128
xFormers wasn't build with CUDA support
attn_bias type is
operator wasn't built - see python -m xformers.info
for more info
flshattF@0.0.0
is not supported because:
max(query.shape[-1] != value.shape[-1]) > 256
xFormers wasn't build with CUDA support
dtype=torch.float32 (supported: {torch.float16, torch.bfloat16})
operator wasn't built - see python -m xformers.info
for more info
tritonflashattF
is not supported because:
max(query.shape[-1] != value.shape[-1]) > 128
xFormers wasn't build with CUDA support
dtype=torch.float32 (supported: {torch.float16, torch.bfloat16})
operator wasn't built - see python -m xformers.info
for more info
triton is not available
cutlassF
is not supported because:
xFormers wasn't build with CUDA support
operator wasn't built - see python -m xformers.info
for more info
smallkF
is not supported because:
max(query.shape[-1] != value.shape[-1]) > 32
xFormers wasn't build with CUDA support
operator wasn't built - see python -m xformers.info
for more info
unsupported embed per head: 512
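For what it's worth, the rejections in that log boil down to three checks repeated per kernel: float32 inputs (only fp16/bf16 supported), a 512-wide head dimension (most kernels cap at 128), and an xFormers build without CUDA support. A hypothetical selector (the function name and flags are made up for illustration) that falls back to PyTorch's built-in attention under the same conditions would look like:

```python
def pick_attention_backend(xformers_cuda_ok: bool, dtype: str, head_dim: int) -> str:
    """Choose an attention implementation, mirroring the checks in the log above.

    Hypothetical helper for illustration: use xFormers only with a CUDA build,
    fp16/bf16 inputs, and head_dim <= 128; otherwise fall back to PyTorch's
    scaled_dot_product_attention, which has no such restrictions.
    """
    if xformers_cuda_ok and dtype in ("float16", "bfloat16") and head_dim <= 128:
        return "xformers"
    return "pytorch_sdpa"

# The failing call in the log: float32 inputs, head_dim 512, no CUDA build.
backend = pick_attention_backend(False, "float32", 512)
```

Under those inputs every branch fails, which is why the log lists a reason for each kernel before giving up.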
By the way, I tried using the simplified svd-fp16 models, but I got an error saying the configuration lacks a YAML file for this model. It would be good to add one, as it would halve the model size.
I have the same issue on a 4090 with 24 GB. It only consumes 15 GB but still reports out of memory. I'm using the default SVD settings: 14 frames, with a video fps of 8 or even just 6.
I've attached more screenshots for this issue.
Thought I would drop in here to say that I have everything up and running. I'm actually using the Automatic1111 extension for ComfyUI (I run CUI inside my A1111). Everything works well except that I too am running into CUDA out-of-memory issues. I'm using an NVIDIA L40 with 48 GB VRAM, so I should have plenty for the job, or at least I thought so.
I'm also seeing this issue, on a 3090 with 24 GB VRAM.
torch.cuda.OutOfMemoryError: Allocation on device 0 would exceed allowed memory. (out of memory)
Currently allocated : 18.23 GiB
Requested : 1.09 GiB
Device limit : 23.48 GiB
Free (according to CUDA): 33.00 MiB
PyTorch limit (set by user-supplied memory fraction) : 17179869184.00 GiB
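To see whether the 24 GB is genuinely exhausted or just reserved/fragmented, it can help to print the allocator's view right before the failing call. A sketch, guarded so it returns None without a CUDA-enabled PyTorch:

```python
def report_cuda_memory():
    """Return allocator stats in GiB, or None if CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    free, total = torch.cuda.mem_get_info()  # raw numbers from the driver
    return {
        "free_gib": free / 2**30,
        "total_gib": total / 2**30,
        "allocated_gib": torch.cuda.memory_allocated() / 2**30,  # live tensors
        "reserved_gib": torch.cuda.memory_reserved() / 2**30,    # incl. cache
    }
```

A large gap between reserved_gib and allocated_gib points at fragmentation, which PyTorch's `PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:...` setting can mitigate.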
I note that there was a similar bug in ComfyUI reported, but apparently the bug has been fixed. Perhaps might be a clue here: https://github.com/comfyanonymous/ComfyUI/issues/1918#issuecomment-1806871089
@thecooltechguy @PaulFidika @ALL
(1) The error is happening in the sampler, rather than the decoder. Reducing the decoder's decoding_t param to 1 fixed the decoder running out of memory, but the sampler is still running out of memory.
I scanned through this repo's code real fast and found many places that just assume a CUDA device (e.g. torch.autocast('cuda')), so right now this repo won't respect accelerate offloading, low-VRAM, no-VRAM, or CPU-only modes. It's CUDA-only at the moment!
(1) Sampler out of memory: only the model size and image size matter, so either wait for smarter people to create a better quantized model, or scale your image down with the "UpScaleImageBy" node with scale < 1.0. Adjust the scale until you no longer OOM!
(2) Decoder out of memory: the author exposed an obscurely named parameter called decoding_t; under the hood, in the Python code, it is called en_and_decode_n_samples_a_time. It divides your decoding into frames / decoding_t phases, each phase running a small batch, and finally concatenates these batches together.
So set decoding_t somewhere in [1, frames] and find a value at which you no longer OOM.
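The en_and_decode_n_samples_a_time behaviour described above amounts to decoding in chunks of decoding_t frames so that only one small batch sits in VRAM at a time. A stripped-down sketch of that loop (decode_fn stands in for the real VAE decode; the helper name is mine, not the repo's):

```python
def decode_in_chunks(latents, decoding_t, decode_fn):
    """Decode `latents` in batches of `decoding_t`, then concatenate.

    Peak memory scales with `decoding_t` rather than the full frame count,
    which is why decoding_t=1 fixes decoder OOM at the cost of speed.
    """
    frames = []
    for i in range(0, len(latents), decoding_t):
        frames.extend(decode_fn(latents[i:i + decoding_t]))  # one small batch
    return frames
```

For 25 frames with decoding_t=2 this runs 13 small decode passes instead of one huge one, producing the same output.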
You're talking about video, but I can't zoom the picture in ComfyUI, and I also get this error:
torch.cuda.OutOfMemoryError: Allocation on device 0 would exceed allowed memory. (out of memory)
Currently allocated : 6.98 GiB
Requested : 50.00 MiB
Device limit : 11.00 GiB
try to decrease decoder frames
This doesn't do anything to the VRAM requirement.
Lowering decoding_t to 2 fixed it for me, per @yhyu13's comment.
The SVD sampler is giving the error:
I'm confused; I have 24 GB of VRAM, and yet it's saying it's running out of VRAM beyond 16 GB. Am I doing something wrong? This is the only process running on my GPU.