s9roll7 / animatediff-cli-prompt-travel

animatediff prompt travel
Apache License 2.0

[Feature Request] animatediff v3 & controlnet sparsectrl #205

Open hylarucoder opened 6 months ago

hylarucoder commented 6 months ago

Loved your work! Animatediff just announced v3!

SparseCtrl allows to animate ONE keyframe, generate transition between TWO keyframes and interpolate MULTIPLE sparse keyframes. RGB images and scribbles are supported for now.

https://github.com/guoyww/AnimateDiff/tree/4825080465c14be14878ee03c27cf0268494137b?tab=readme-ov-file#202312-animatediff-v3-and-sparsectrl

Adding this feature to the CLI would be a significant enhancement.
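
Conceptually, the sparse conditioning is just condition images at a few frames plus a per-frame mask telling the model which frames are pinned. A rough torch sketch of the input it builds (not this repo's API; tensor shapes are illustrative):

```python
# Minimal sketch of the SparseCtrl conditioning idea: condition images exist
# only at sparse keyframes, a binary mask marks which frames are conditioned,
# and all other frames are left as zeros for the model to fill in.
import torch

frames, channels, height, width = 16, 3, 512, 512
keyframes = {0: torch.rand(channels, height, width),   # first keyframe (RGB)
             15: torch.rand(channels, height, width)}  # last keyframe (RGB)

cond = torch.zeros(frames, channels, height, width)    # unconditioned frames stay zero
mask = torch.zeros(frames, 1, height, width)           # 1 = "this frame is conditioned"
for idx, image in keyframes.items():
    cond[idx] = image
    mask[idx] = 1.0

# SparseCtrl's encoder consumes the condition and mask together:
sparse_input = torch.cat([cond, mask], dim=1)          # (frames, channels+1, H, W)
print(sparse_input.shape)                              # torch.Size([16, 4, 512, 512])
```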

Eye-01 commented 6 months ago

Hey. Can you please explain in what way it is an improvement? Prompt travel is already more powerful than simple keyframe interpolation. You can just do anything you want, transitioning with prompt descriptions, with or without ControlNet on keyframes (openpose and softedge give great results), for example.

hylarucoder commented 6 months ago

@Eye-01

Certainly, the CLI is already very powerful.

The sparse control mechanism can produce an effect similar to SVD; you can read more examples at the link above. More ControlNet methods mean less hacking.

Can you achieve the same effect without SparseCtrl?

hylarucoder commented 6 months ago

Let's imagine a use case.

If you create a 3-minute video and there are some small segments you are not satisfied with, but you are pleased with the rest, the "keyframe interpolation" feature lets you lock in keyframes to the left and right of the unsatisfactory segment, regenerate it, and stitch it back in. This gives your video much more flexibility for editing.
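
The stitching step itself is trivial once the regenerated frames exist. A minimal sketch, assuming frames are numbered PNGs and ffmpeg is on PATH (paths, frame range, and fps are all illustrative):

```python
# Hypothetical stitching step: splice regenerated frames over a bad segment
# and re-encode the full sequence.
import shutil
import subprocess
from pathlib import Path

original = Path("output/full_run")        # hypothetical: frames 0000.png ... 1439.png
regenerated = Path("output/fixed_chunk")  # hypothetical: regenerated frames 0480-0559
bad_range = range(480, 560)

for i in bad_range:
    shutil.copy(regenerated / f"{i:04d}.png", original / f"{i:04d}.png")

# re-encode the spliced frame sequence (8 fps chosen just as an example)
subprocess.run([
    "ffmpeg", "-y", "-framerate", "8",
    "-i", str(original / "%04d.png"),
    "-c:v", "libx264", "-pix_fmt", "yuv420p", "stitched.mp4",
], check=True)
```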

WyattAutomation commented 6 months ago

> Loved your work! Animatediff just announced v3!
>
> SparseCtrl allows to animate ONE keyframe, generate transition between TWO keyframes and interpolate MULTIPLE sparse keyframes. RGB images and scribbles are supported for now.
>
> https://github.com/guoyww/AnimateDiff/tree/4825080465c14be14878ee03c27cf0268494137b?tab=readme-ov-file#202312-animatediff-v3-and-sparsectrl
>
> Adding this feature to the CLI would be a significant enhancement.

Considering people are now getting SD 1.5 to generate interactive real-time video at around 12-16 FPS (or more) via recent LCM-LoRA + ControlNet + torch.compile() enhancements, and if I understand what SparseCtrl does:

..integrating it here (along with a torch.compile() inference option and/or other LCM/turbo optimizations) seems to imply the possibility of getting this repo generating interactively-controllable AnimateDiff output in real time, no?
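
For reference, this is the kind of setup I mean, sketched with diffusers on a plain SD 1.5 pipeline (model IDs are the commonly used ones; wiring this into AnimateDiff would be the actual work):

```python
# LCM-LoRA + torch.compile() on a plain SD 1.5 pipeline.
import torch
from diffusers import LCMScheduler, StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")  # LCM-LoRA

# compile the UNet once; subsequent calls reuse the compiled graph
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

image = pipe(
    "an astronaut riding a horse",
    num_inference_steps=4,   # LCM needs very few steps
    guidance_scale=1.0,      # CFG is effectively disabled for LCM
).images[0]
```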

With just the sample_lcm.json as-is, it took me ~24 seconds to generate 80 frames at 4 steps, which still looks great when using the "add_detail" LoRA. I can probably tweak it and get it running faster, but my (potentially wrong) understanding is that the current version of AnimateDiff here generates all frames of the whole video at once, or something like that, so sequentially rendering frame-by-frame has not been an option?

That description of SparseCtrl sounds like it can seamlessly chunk AnimateDiff generation so it happens keyframe-to-keyframe. If that's true, I could spin up one keyframe animation sequence as a single-step buffer at launch, then, instead of using static ControlNet images saved to directories, integrate an OpenPose skeleton viewport animator I created with PySide6/Panda3D. I have been dreaming of interactive/realtime control of AnimateDiff since the first time I saw it; this would be absolutely amazing, as it would unlock so much capability.

..explaining that "OpenPose skeleton viewport" I'm using as a ControlNet controller a bit further (my use case here):

I stripped toyxyz's "pose-bones that look like OpenPose skeletons" Blender project (available on Gumroad) down to just the Open_Pose_Body model, exported the skeleton as FBX, imported it into Mixamo as a character, and pulled the entire Mixamo free-tier set of FBX animations of it to local storage. I then reimported all of them back into the Blender project, removed everything except one OpenPose model + armature but kept all the animations, restored the emission modifier on the OpenPose skeleton material back to how toyxyz had it prior to the Mixamo FBX scrape, and converted the project to .bam via Blend2Bam. The result can be loaded into a simple mouse-navigable viewport launched from PySide6 using Panda3D (portable, no Blender/Blender API needed), with the Mixamo animations baked in and selectable for transitions in the parent PySide6 app.

The PySide6 main window just pulls a PNMImage from the user-controlled Panda3D viewport into PIL whenever it needs a ControlNet input, while it renders SD+ControlNet output in its own viewer in parallel.
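
For the curious, that handoff is roughly this sketch (assumes a running Panda3D ShowBase; the PNG round-trip through StringStream is just one way to do it):

```python
# Viewport -> PIL handoff for ControlNet input.
import io
from panda3d.core import PNMImage, StringStream
from PIL import Image

def grab_controlnet_input(base):
    """Copy the current framebuffer into a PIL image."""
    shot = PNMImage()
    if not base.win.getScreenshot(shot):   # GraphicsOutput.getScreenshot(PNMImage)
        raise RuntimeError("framebuffer not ready yet")
    stream = StringStream()
    shot.write(stream, "shot.png")         # the filename only hints the encoder
    return Image.open(io.BytesIO(bytes(stream.getData())))
```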

I originally was just launching said viewport from a PySide6 refactor/modification I made of "AiFartist's" 150 FPS sd-turbo example: https://github.com/aifartist/ArtSpew, and getting roughly 17 FPS realtime control of the SD output via that viewport window. However, if we had some way of at least chunking AnimateDiff to keyframes, I could just modify the prompt_map to loop indefinitely while live-editing prompts and performing the ControlNet inputs, and have a viable solution for live interaction with AnimateDiff.

..Even with a single keyframe-step of delay between a UI event and its result showing up in the SD render viewport, having this capability would still be absolutely nuts. There are so many things I could do with this -- if the implication here is sequential and seamless chunking of AnimateDiff generations, I cannot stress enough how useful this would be.
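
If chunked generation landed, the driving loop could be as dumb as this sketch: grab a pose frame, drop it where the config's ControlNet input directory points, re-run a short chunk. Paths, config keys, and CLI flags here are illustrative, not a confirmed schema:

```python
# Hypothetical live loop: one short AnimateDiff chunk per fresh pose frame.
import json
import subprocess
from pathlib import Path

CONFIG = Path("config/prompts/live.json")      # hypothetical config file
POSE_DIR = Path("data/controlnet_image/live")  # hypothetical input directory

def run_chunk(chunk_index, pose_image):
    POSE_DIR.mkdir(parents=True, exist_ok=True)
    pose_image.save(POSE_DIR / "0000.png")     # pose for this chunk's keyframe
    cfg = json.loads(CONFIG.read_text())
    cfg["seed"] = [chunk_index]                # vary per chunk (illustrative)
    CONFIG.write_text(json.dumps(cfg, indent=2))
    subprocess.run(
        ["animatediff", "generate", "-c", str(CONFIG), "-L", "16", "-C", "16"],
        check=True,
    )
```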

Eye-01 commented 6 months ago

> @Eye-01
>
> Certainly, the CLI is already very powerful.
>
> The sparse control mechanism can produce an effect similar to SVD; you can read more examples at the link above. More ControlNet methods mean less hacking.
>
> Can you achieve the same effect without SparseCtrl?

Thank you for your answer. Well, honestly, all the examples referred to in the QuickDemo section of your link seem possible to make with animatediff-cli-prompt-travel. Did you play enough with it? I'm not sure. Also, such an add-on still seems experimental, with glitches in their demo, and probably takes more RAM anyway. Unless the examples are poorly chosen, the animation moves seem too simple compared to the complexity of movement we can reach with prompt travel, as I showed you for example in my video (or even in this video), which were generated without any ControlNet or masks at the first pass.

Concerning the editing possibilities during video creation: if a portion of an animation is not good (it has happened to me many times), you just have to change the prompt_map within the unsatisfying section and regenerate the video.
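
Since prompt_map in this repo's configs maps frame numbers (as string keys) to prompts, that retouch is a small JSON edit; a sketch, with the config path and prompts purely illustrative:

```python
# Rewrite only the prompts covering the unsatisfying section, then regenerate.
import json
from pathlib import Path

config_path = Path("config/prompts/prompt_travel.json")  # illustrative path
cfg = json.loads(config_path.read_text())

# retouch the section around frames 64-95, for example
cfg["prompt_map"]["64"] = "1girl, walking on the beach, sunset, wind in hair"
cfg["prompt_map"]["80"] = "1girl, turning toward the camera, smiling"

config_path.write_text(json.dumps(cfg, indent=2))
# then regenerate: animatediff generate -c config/prompts/prompt_travel.json
```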

Well, anyway, it's up to s9roll7 if he wants to support this, but I won't in my fork for now.

Eye-01 commented 6 months ago

Interesting, but I suppose a realtime frame-to-frame application, like a webcam filter, etc., should be a fork, as it's a totally different purpose in terms of interface, code, and RAM usage. It could be much lighter to load, and lighter in code as well, since you wouldn't need any video, prompt_map, related problems, etc. It's really another project. If you fork and make it, it could be useful.

deeplearn-art commented 6 months ago

Here's a demo of the scribble sparse control in v3. Pretty sure this kind of thing will totally eclipse v2:

https://github.com/s9roll7/animatediff-cli-prompt-travel/assets/79874185/9f3dc795-4cd9-4e49-8a2c-961f0dedda07

amponce commented 6 months ago

What directory do v3_sd15_sparsectrl_rgb.ckpt and v3_sd15_sparsectrl_scribble.ckpt go into? Is it a Motion Module?

edit: found the answer - SparseCtrl

WyattAutomation commented 6 months ago

> What directory do v3_sd15_sparsectrl_rgb.ckpt and v3_sd15_sparsectrl_scribble.ckpt go into? Is it a Motion Module?
>
> edit: found the answer - SparseCtrl

Sounds like it's now integrated?

I finally have time to jump back into this. If I can get anything useful working, I'll fork, update/add my existing UI, and build some minimal "on the fly" keyframing and UI tools.

Probably not going to have it where you can plug a PS4 controller into an OpenPose skeleton/ControlNet backend and turn AnimateDiff into a game quite yet, but even a baby step closer to that would be exciting. I'll update if I make any progress.

lamuertedeunperrito commented 3 months ago

> What directory do v3_sd15_sparsectrl_rgb.ckpt and v3_sd15_sparsectrl_scribble.ckpt go into? Is it a Motion Module?
>
> edit: found the answer - SparseCtrl

Did you integrate v3? How did you integrate SparseCtrl in the JSON config file? :o