pondloso opened this issue 1 month ago
Currently the img2vid and pose models are separate models and can't be used together. Possibly in the future there could be a pose model trained with image input; for now it doesn't seem possible with the models we have.
You can encode an input video in addition to the pose input, though; then it acts like vid2vid, but with additional pose conditioning.
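To make the "vid2vid plus pose conditioning" idea concrete, here is a rough PyTorch sketch of the shapes involved. The channel count, compression factors, and frame count below are assumptions based on CogVideoX's VAE, not code from this repo:

```python
import torch

# Assumed CogVideoX-style VAE factors (not taken from this repo's code):
# 8x spatial downsampling, ~4x temporal compression, 16 latent channels.
frames, height, width = 49, 480, 720
lat_c, lat_t = 16, (frames - 1) // 4 + 1        # -> 13 latent frames
lat_h, lat_w = height // 8, width // 8          # -> 60 x 90

# Encoded source video (the vid2vid init) and encoded pose video (the control signal).
src_latents  = torch.randn(1, lat_c, lat_t, lat_h, lat_w)
pose_latents = torch.randn(1, lat_c, lat_t, lat_h, lat_w)

# The control sampler conditions on both, so every dimension has to line up.
# If the two videos were resized differently, the widths (dimension 4 in a
# [batch, channels, frames, height, width] tensor) no longer match and you get
# exactly the "tensor a (34) vs tensor b (44)" error quoted below.
assert src_latents.shape == pose_latents.shape
```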
"You can encode input video in addition to the pose input though, then it acts like vid2vid, but with additional pose conditioning."
Can you make a sample workflow for this please? I tried putting a video into the sampler but I always get this error: "CogVideoXFunControlSampler The size of tensor a (34) must match the size of tensor b (44) at non-singleton dimension 4"
I was getting the same error - it happens because the video resolution has to match, and when the video gets resized it is often rounded to the wrong dimensions. I'm not exactly sure what the math behind this is, but I resized the source video to make sure it matched the width/height outputs from the CogVideo Control ImageEncode node.
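In case it helps, here is a small sketch of the kind of rounding I mean. The multiple of 16 is an assumption (the VAE itself downsamples by 8, and the wrapper may round further), so match it to whatever size the CogVideo Control ImageEncode node actually reports:

```python
# Hypothetical helper: snap a size to the same multiple the encoder uses,
# so the control-video latents and the source-video latents come out the same size.
def snap_to_multiple(width: int, height: int, multiple: int = 16) -> tuple[int, int]:
    """Round width/height down to the nearest multiple (16 assumed here)."""
    return (width // multiple) * multiple, (height // multiple) * multiple

# Example: a 720x405 source video would be resized to 720x400 before encoding,
# keeping its latent grid identical to a pose video resized the same way.
print(snap_to_multiple(720, 405))   # -> (720, 400)
```

Resizing both the pose video and the source video to the same snapped size (or simply reusing the width/height that the ImageEncode node outputs, as described above) should keep the two latent tensors the same shape.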
cogfun-pose is so powerful. If we could just feed in a start_img like i2v, it would be a game changer, because normal i2v is so random that I had to generate around 20 results to get 1 I could use.
But today I tried cogfun-pose with the new update, and the output video really sticks to the pose video with only small errors, much better than I expected.