kijai / ComfyUI-CogVideoXWrapper


Hello, could you please tell me how I can run the Fun Model with Tora? #198

Closed leetraman822 closed 3 weeks ago

leetraman822 commented 3 weeks ago

I noticed that the latest update supports the Tora model, but I can't figure out how to make it work.

4lt3r3go commented 3 weeks ago

I don't understand the Tora Xfun support either. Tora doesn't work with Xfun models for me, only 5B_I2V. Maybe I'm missing a Tora Xfun model released somewhere that I'm not aware of. So much confusion with all these Cogs 😂

kijai commented 3 weeks ago

It does work, it's just clumsy to use because the Fun sampler resizes the input to the closest bucket resolution (something I want to get rid of, but doing that properly is more work than I have time for currently). You can see what that resolution is in the log, and also with this node:

[screenshot of the node]

All the resolutions and frame counts need to match for it to work.

https://github.com/user-attachments/assets/0d7c5fda-7874-4353-9fc7-b04dbdf2e713

It can even be used with vid2vid as extra guidance:

https://github.com/user-attachments/assets/4d91fa41-0f33-45fd-a19a-da017b7215dd
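The bucket snapping described above can be sketched roughly as follows. The bucket list and the aspect-ratio matching rule here are illustrative assumptions, not the wrapper's actual table or logic:

```python
# Hypothetical sketch of "closest bucket" resolution snapping.
# The bucket list and matching rule are assumptions for illustration;
# the real buckets live in the Fun sampler's code.
BUCKETS = [(512, 512), (480, 720), (720, 480), (576, 1024), (1024, 576)]

def closest_bucket(height, width):
    """Return the (height, width) bucket whose aspect ratio is closest
    to the input's; inputs get resized to this before sampling."""
    aspect = width / height
    return min(BUCKETS, key=lambda b: abs(b[1] / b[0] - aspect))

print(closest_bucket(768, 768))  # -> (512, 512)
```

The practical consequence is the one kijai states: the trajectory (spline) resolution and frame count must match the bucket the image gets snapped to, not the original input size.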

4lt3r3go commented 3 weeks ago

I spent half a day and I'm unable to make it work, even with the "closest bucket" node.

got this:

```
Error while processing rearrange-reduction pattern "B (T H W) C -> (B T) C H W".
Input tensor shape: torch.Size([2, 3724, 3072]). Additional info: {'H': 52, 'W': 78}.
Shape mismatch, can't divide axis of length 3724 in chunks of 4056
```

I'm using alibaba-pai CogVideoX-Fun 1.1 5b InP, trying to make this work in image-to-video.

kijai commented 3 weeks ago

> I spent half a day and I'm unable to make it work, even with the "closest bucket" node.
>
> got this:
>
> Error while processing rearrange-reduction pattern "B (T H W) C -> (B T) C H W".
> Input tensor shape: torch.Size([2, 3724, 3072]). Additional info: {'H': 52, 'W': 78}.
> Shape mismatch, can't divide axis of length 3724 in chunks of 4056
>
> I'm using alibaba-pai CogVideoX-Fun 1.1 5b InP, trying to make this work in image-to-video.

Some dimensions do not match; try a simple 512x512 input, because that won't get resized. The spline editor has to be set to the same size, along with everything else.

4lt3r3go commented 3 weeks ago

Thanks for your time and your incredible work on all this. I finally got it to work. Any plans for 2B Tora? Also, is there a way to add more splines to move different objects in different directions? I saw some Tora examples around doing that, but can't find a way to do it.

kijai commented 3 weeks ago

> Thanks for your time and your incredible work on all this. I finally got it to work. Any plans for 2B Tora? Also, is there a way to add more splines to move different objects in different directions? I saw some Tora examples around doing that, but can't find a way to do it.

It doesn't seem to work with the 2B models because the embed sizes don't match. I don't know if it can work; probably not.

For multiple trajectories you can use multiple spline editors and join the coordinates with this node (it's in KJNodes):

[screenshot of the node]
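Conceptually, joining trajectories just means collecting each spline editor's per-frame point list into one list of trajectories, so each spline can drive a different object. A rough sketch; the JSON coordinate format (a list of `{"x", "y"}` points per spline) is an assumption for illustration, not necessarily what KJNodes emits:

```python
import json

# Hypothetical sketch of a "join coordinates" step: each input string is
# one spline editor's output, assumed to be a JSON list of {"x", "y"}
# points. The result is a JSON list of trajectories.
def join_coordinates(*coord_strings):
    """Combine several single-trajectory coordinate lists into one
    multi-trajectory list."""
    return json.dumps([json.loads(s) for s in coord_strings])

spline_a = json.dumps([{"x": 10, "y": 10}, {"x": 200, "y": 10}])    # object 1 moves right
spline_b = json.dumps([{"x": 10, "y": 400}, {"x": 200, "y": 400}])  # object 2 moves right, lower

joined = join_coordinates(spline_a, spline_b)
print(len(json.loads(joined)))  # -> 2 trajectories
```

As with a single spline, every trajectory still has to match the bucket resolution and frame count the sampler actually uses.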

Ratinod commented 3 weeks ago

> It doesn't seem to work with the 2B models because the embed sizes don't match. I don't know if it can work; probably not.

It's very sad... I really wanted it to work together with the "NimVideo/cogvideox-2b-img2vid" model, because in my opinion this is the best ratio of speed, quality and low VRAM consumption for local i2v generation right now. And "Tora" would make it even better.

4lt3r3go commented 3 weeks ago

> It doesn't seem to work with the 2B models because the embed sizes don't match. I don't know if it can work; probably not.

> It's very sad... I really wanted it to work together with the "NimVideo/cogvideox-2b-img2vid" model, because in my opinion this is the best ratio of speed, quality and low VRAM consumption for local i2v generation right now. And "Tora" would make it even better.

I'm honestly way more interested in making this work with the Xfun 2B model than with NimVideo, which is an incredible model, don't get me wrong... but the ability to pick a "start" and "end" image in Xfun provides way more possibilities. This workflow here, for example, uses Xfun i2v and I love it.

kijai commented 3 weeks ago

It's just not going to work with 2B; it would require training a new Tora model for 2B.

The GGUF version of 5B doesn't use that much memory, and it can work with really small resolutions, smaller than what 2B can do. At 512x512 the 5B model with Tora fits under 12GB.
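A rough sketch of why resolution and frame count drive the cost here: the token axis in the earlier error is T * H * W over the latent grid, and the shapes there imply an 8x spatial downscale (416x624 pixels -> 52x78 latent). The 4x temporal compression factor below is an assumption about the VAE:

```python
# Back-of-the-envelope latent grid size for a given video resolution.
# Spatial downscale of 8 per side is implied by the shapes in the error
# earlier in the thread; the 4x temporal compression is an assumption.
def latent_grid(height, width, frames, spatial_down=8, temporal_down=4):
    H, W = height // spatial_down, width // spatial_down
    T = 1 + (frames - 1) // temporal_down
    return T, H, W, T * H * W  # latent frames, grid, total token count

print(latent_grid(416, 624, 49))  # -> (13, 52, 78, 52728)
print(latent_grid(512, 512, 49))  # -> (13, 64, 64, 53248)
```

Since attention cost grows quadratically with the token count, dropping the resolution shrinks memory use quickly, which is why 512x512 keeps 5B + Tora under 12GB.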

4lt3r3go commented 2 weeks ago

> It's just not going to work with 2B; it would require training a new Tora model for 2B.
>
> The GGUF version of 5B doesn't use that much memory, and it can work with really small resolutions, smaller than what 2B can do. At 512x512 the 5B model with Tora fits under 12GB.

It's not a matter of memory 🙂 (I'm on a 3090). It's that the new 2B model has been released and claims to be as good as the 5B while also being much faster. It's not really like that in my opinion, but what 2B can do now is really impressive. Additionally, there are some workflows for creating animations that build on FunX 2B precisely for its speed. Despite being less accurate, 2B's high speed often makes it very useful for some tasks. Personally, if I had a Tora that works with 2B, I'd probably be the happiest person in the world. A bunch of Nvidia H100s would help the happiness too 😁

kijai commented 2 weeks ago

> It's just not going to work with 2B; it would require training a new Tora model for 2B. The GGUF version of 5B doesn't use that much memory, and it can work with really small resolutions, smaller than what 2B can do. At 512x512 the 5B model with Tora fits under 12GB.

> It's not a matter of memory 🙂 (I'm on a 3090). It's that the new 2B model has been released and claims to be as good as the 5B while also being much faster. It's not really like that in my opinion, but what 2B can do now is really impressive. Additionally, there are some workflows for creating animations that build on FunX 2B precisely for its speed. Despite being less accurate, 2B's high speed often makes it very useful for some tasks. Personally, if I had a Tora that works with 2B, I'd probably be the happiest person in the world. A bunch of Nvidia H100s would help the happiness too 😁

In my experience the 5B I2V is still a lot better quality. Also, the Fun models are more interesting because the resolution and frame count are not locked the way they are in the I2V models.

Anyway, it's just not going to work without re-training; there's nothing I can do about Tora + 2B.

4lt3r3go commented 2 weeks ago

Yes, agree 100%, 5B is superior, no doubt 🙂 but if God existed, he'd probably know how many tests I've done on all the Cog models. I've found plenty of useful applications for the 2B models specifically because of their speed, so let's hope they release Tora 2B at some point.

I know it's hard with all the amazing work you're doing, but whenever you have a couple of minutes, I'd love to hear your opinion on this workflow. Maybe that's asking too much 😁 https://civitai.com/models/866210?modelVersionId=1010923