Spreadsheet (WIP) of supported models and their supported features: https://docs.google.com/spreadsheets/d/16eA6mSL8XkTcu9fSWkPSHfRIqyAKJbR1O99xnuGdCKY/edit?usp=sharing
This is a big one, and unfortunately the necessary cleanup and refactoring will break every old workflow as it is. I apologize for the inconvenience; if I don't do this now I'll keep making it worse until maintaining it becomes too much of a chore, so from my POV there was no choice.
Please either use the new workflows or fix the nodes in your old ones before posting issue reports!
The old version will be kept in a legacy branch, but not maintained.
Support CogVideoX 1.5 models
Major code cleanup (it was bad, still isn't great, wip)
Merge Fun-model functionality into main pipeline:
Remove width/height from the sampler widgets and detect them from the input instead; this means text2vid now requires using empty latents
Separate VAE from the model, allow using fp32 VAE
Add ability to load some of the non-GGUF models as single files (only a few available for now: https://huggingface.co/Kijai/CogVideoX-comfy)
Add some torchao quantizations as options
Add interpolation as an option for the main encode node; the old interpolation-specific node is gone
torch.compile optimizations
Remove PAB in favor of FasterCache and cleaner code
other smaller things I forgot about at this point
For Fun-model based workflows it's a more drastic change; for others, migrating generally means re-setting many of the nodes.
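Since text2vid now takes its size from an empty latent, here is a rough sketch of how the latent shape relates to the pixel size and frame count. It assumes the commonly cited CogVideoX VAE factors (8x spatial, 4x temporal, 16 latent channels) and a batch/frames/channels/height/width layout; none of that is stated here, and `empty_latent_shape` is a made-up helper, not a function from this repo.

```python
def empty_latent_shape(width, height, num_frames, batch_size=1,
                       latent_channels=16, spatial_factor=8, temporal_factor=4):
    """Return an assumed (B, F, C, H, W) latent shape for a given video size.

    The default factors are assumptions about the CogVideoX VAE,
    not values read from the wrapper's code.
    """
    if width % spatial_factor or height % spatial_factor:
        raise ValueError("width and height must be divisible by the spatial factor")
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    # The first frame is kept as-is, the rest are compressed in groups.
    latent_frames = (num_frames - 1) // temporal_factor + 1
    return (batch_size, latent_frames, latent_channels,
            height // spatial_factor, width // spatial_factor)


if __name__ == "__main__":
    print(empty_latent_shape(720, 480, 49))
```

In practice the empty latent node handles this for you; the sketch is only meant to show why the sampler can detect width and height from its input.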
Initial support for Tora (https://github.com/alibaba/Tora)
Converted model (included in the autodownload node):
https://huggingface.co/Kijai/CogVideoX-5b-Tora/tree/main
https://github.com/user-attachments/assets/d5334237-03dc-48f5-8bec-3ae5998660c6
This week there have been some bigger updates that will most likely affect some old workflows; the sampler node especially will probably need to be refreshed (re-created) if it errors out!
New features:
https://github.com/user-attachments/assets/ddeb8f38-a647-42b3-a4b1-c6936f961deb
https://github.com/user-attachments/assets/c78b2832-9571-4941-8c97-fbcc1a4cc23d
https://github.com/user-attachments/assets/d9ed98b1-f917-432b-a16e-e01e87efb1f9
Initial support for the official I2V version of CogVideoX: https://huggingface.co/THUDM/CogVideoX-5b-I2V
Also needs diffusers 0.30.3
https://github.com/user-attachments/assets/c672d0af-a676-495d-a42c-7e3dd802b4b0
Added initial support for CogVideoX-Fun: https://github.com/aigc-apps/CogVideoX-Fun
Note that while this one can do image2vid, this is NOT the official I2V model yet, though it should also be released very soon.
https://github.com/user-attachments/assets/68f9ed16-ee53-4955-b931-1799461ac561
Added experimental support for onediff, this reduced sampling time by ~40% for me, reaching 4.23 s/it on a 4090 with 49 frames. This requires Linux, torch 2.4.0, and onediff and nexfort installed:
pip install --pre onediff onediffx
pip install nexfort
First run will take around 5 mins for the compilation.
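If you want to check the requirements above before trying onediff, something like this could work. It is only a sketch: `check_onediff_requirements` and `gather_environment` are hypothetical helpers, and it assumes the import names match the pip package names (`onediff`, `onediffx`, `nexfort`).

```python
import importlib.util
import platform
from importlib import metadata

# Assumption: import names are the same as the pip package names.
REQUIRED_MODULES = ("onediff", "onediffx", "nexfort")


def check_onediff_requirements(system, torch_version, installed_modules,
                               required_torch="2.4.0"):
    """Return a list of human-readable problems (empty list means OK)."""
    problems = []
    if system != "Linux":
        problems.append(f"Linux is required, found {system}")
    # Drop local build tags such as "+cu121" before comparing.
    base_version = (torch_version or "").split("+")[0]
    if base_version != required_torch:
        problems.append(f"torch {required_torch} is required, found {torch_version}")
    for name in REQUIRED_MODULES:
        if name not in installed_modules:
            problems.append(f"{name} is not installed")
    return problems


def gather_environment():
    """Collect the real values from the running interpreter."""
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        torch_version = None
    installed = {m for m in REQUIRED_MODULES
                 if importlib.util.find_spec(m) is not None}
    return platform.system(), torch_version, installed


if __name__ == "__main__":
    for problem in check_onediff_requirements(*gather_environment()):
        print("onediff not usable:", problem)
```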
5b model is now also supported for basic text2vid: https://huggingface.co/THUDM/CogVideoX-5b
It is also autodownloaded to ComfyUI/models/CogVideo/CogVideoX-5b. The text encoder is not needed, as we use the ComfyUI T5.
https://github.com/user-attachments/assets/991205cc-826e-4f93-831a-c10441f0f2ce
Requires diffusers 0.30.1 (this is specified in requirements.txt)
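A quick way to confirm the installed diffusers is new enough, as a minimal sketch using only the standard library. `parse_version` and `meets_minimum` are hypothetical helpers; the parsing is deliberately simple and treats pre-release tags like `rc1` the same as the release.

```python
import re
from importlib import metadata


def parse_version(text):
    """Turn '0.30.1' or '0.31.0.dev0' into a tuple of leading integers."""
    parts = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric piece such as 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, required):
    """True if the installed version is at least the required one."""
    return parse_version(installed) >= parse_version(required)


if __name__ == "__main__":
    try:
        installed = metadata.version("diffusers")
    except metadata.PackageNotFoundError:
        print("diffusers is not installed")
    else:
        print("diffusers", installed, "ok:", meets_minimum(installed, "0.30.1"))
```

The same check works with 0.30.3 as the minimum for the I2V model.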
Uses the same T5 model as SD3 and Flux, fp8 works fine too. Memory requirements depend mostly on the video length. VAE decoding seems to be the only stage that takes a lot of VRAM when everything is offloaded; it peaks at around 13-14GB momentarily at that stage. Sampling itself takes only maybe 5-6GB.
Hacked in img2img to attempt vid2vid workflow, works interestingly with some inputs, highly experimental.
https://github.com/user-attachments/assets/e6951ef4-ea7a-4752-94f6-cf24f2503d83
https://github.com/user-attachments/assets/9e41f37b-2bb3-411c-81fa-e91b80da2559
Also added temporal tiling as a means of generating endless videos:
https://github.com/kijai/ComfyUI-CogVideoXWrapper
https://github.com/user-attachments/assets/ecdac8b8-d434-48b6-abd6-90755b6b552d
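To give an idea of what temporal tiling means, here is a generic sketch that splits a long frame range into overlapping windows. This is an illustration of the general idea only, not the code used by the nodes; `temporal_tiles`, `tile_size` and `overlap` are names made up for the example.

```python
def temporal_tiles(total_frames, tile_size, overlap):
    """Split [0, total_frames) into overlapping (start, end) windows.

    Consecutive windows share `overlap` frames so they can be blended;
    the last window may be shorter than `tile_size`.
    """
    if total_frames <= 0 or tile_size <= 0:
        raise ValueError("total_frames and tile_size must be positive")
    if not 0 <= overlap < tile_size:
        raise ValueError("overlap must be in [0, tile_size)")
    stride = tile_size - overlap
    tiles = []
    start = 0
    while True:
        end = min(start + tile_size, total_frames)
        tiles.append((start, end))
        if end >= total_frames:
            break
        start += stride
    return tiles


if __name__ == "__main__":
    for start, end in temporal_tiles(100, 48, 8):
        print(f"frames {start}..{end - 1}")
```

Each window would be sampled on its own and the shared frames blended, which is what lets the total length grow past what a single pass can handle.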
Original repo: https://github.com/THUDM/CogVideo
CogVideoX-Fun: https://github.com/aigc-apps/CogVideoX-Fun
Controlnet: https://github.com/TheDenk/cogvideox-controlnet