hotpot-killer opened 1 month ago
enable_vae_tiling in CogVideoDecode node
When I turn on enable_vae_tiling in the CogVideoDecode node, it still throws an OOM error.
16 GB here, and I use all of it just fine with vae_tiling. It uses even less VRAM with GGUF, but takes a little more time.
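For reference, the node is essentially toggling tiled decode on the underlying VAE; at the diffusers level that is just a couple of calls. A minimal sketch, assuming the standard CogVideoXPipeline API (the checkpoint name and prompt are placeholders):

```python
import torch
from diffusers import CogVideoXPipeline

# Minimal sketch, assuming the diffusers CogVideoXPipeline
# (the ComfyUI CogVideoDecode node wraps the same CogVideoX VAE).
pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-2b",       # placeholder checkpoint
    torch_dtype=torch.float16,
)
pipe.to("cuda")

# Tiled/sliced decode trades a bit of speed for a much lower peak VRAM
# during the final VAE decode step, which is where most of these OOMs happen.
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()

video = pipe(prompt="a cat walking on grass", num_frames=49).frames[0]
```

With tiling the decoder processes the latent in overlapping patches instead of the whole frame at once, which is why peak VRAM drops while wall-clock time goes up slightly.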
It sits at 28GB of VRAM for me when working on a proper 1024x1024 image. But as an FYI, system RAM and VRAM spike into the 42-61GB range before settling down while loading/swapping, depending on what you are doing when running (high quality, for example).
I'm on an L40 GPU with 48GB VRAM + 78GB system RAM, with a swap file configured to enable RAM sharing (giving me about 68GB of "shared" VRAM with the swap). Watching it in the task manager, I can see it jumping and moving things around between all the memory spaces, and it luckily avoids an OOM when running.
On my A6000 GPU server with 48GB system RAM, the exact same prompt and settings will OOM when trying to finalize at the end, probably while loading the VAE.
This all depends on what image size you are running. Hope that helps a bit when you let this thing fully off the leash (none of the four settings like tiling, VAE, etc. enabled) so it can run free and use whatever it wants to just work, if you get what I mean. Certain AI models just flow better without any restrictions. But if you don't have a large swap or plenty of VRAM, you'll almost always have to enable the one VAE tiling setting.
Note: on the newest NVIDIA drivers with experimental diffusers and torch 2.4.1, plus a GPU that lets you enable "fastmode", you can shave 2-3 minutes off the entire generation time!
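I'm not sure exactly what "fastmode" maps to; assuming it means the usual PyTorch speed toggles rather than an official flag, this is the standard way to flip them on:

```python
import torch

# Assumption: "fastmode" here refers to the common PyTorch speed knobs,
# not an official flag name. These enable TF32 matmuls and cuDNN
# autotuning on GPUs that support them.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
torch.backends.cudnn.benchmark = True
torch.set_float32_matmul_precision("high")
```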
2080 Ti 22GB here, using 720x480 image input. VRAM usage goes above 22340MB.
I have a V100 with 32GB of GPU memory, and when it decodes it shows an OOM.