volotat / SD-CN-Animation

This script automates the video stylization task using StableDiffusion and ControlNet.
MIT License
806 stars, 61 forks

twisting #135

Open thezveroboy opened 1 year ago

thezveroboy commented 1 year ago

I have a question about generation.

Previous versions worked fine for me, but since the recent updates, text2video generations have started to twist clockwise a lot, as if the camera operator is falling on their side.

Is it just me? What is this: a change to the generation algorithm, or an attempt to make the picture more dynamic? How can I turn it off?

thezveroboy commented 1 year ago

Still looking for a solution for text2video generation.

As I understand it, a new frame is created by mixing in noise obtained by warping the previous frame. In the first version the degree of warping was lower, and now it has been increased.

Because of this, generating people in text2video mode is completely useless: even in a 1-second video, any person warps and falls sideways, or their facial and body features are warped and deformed from frame to frame.

Is it possible to add a setting for the degree of warping, or different modes of warping (rotation, linear, zoom, etc.)?
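To make the request above concrete, here is a minimal sketch of what a "degree of warping" and "modes of warping" could look like if the deformation were a hand-written displacement field. This is an illustration only and not the extension's code; the names `make_flow`, `warp`, `mode`, and `strength` are hypothetical, and the sampling is a simple nearest-neighbour lookup in plain NumPy.

```python
import numpy as np

def make_flow(h, w, mode="rotation", strength=1.0):
    """Build a hypothetical (h, w, 2) displacement field (dx, dy) in pixels.

    strength is degrees for "rotation", percent for "zoom",
    and pixels per frame for "linear".
    """
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    rx, ry = xs - cx, ys - cy  # coordinates relative to the image centre
    if mode == "rotation":
        a = np.deg2rad(strength)
        dx = rx * np.cos(a) - ry * np.sin(a) - rx
        dy = rx * np.sin(a) + ry * np.cos(a) - ry
    elif mode == "zoom":
        dx, dy = rx * strength * 0.01, ry * strength * 0.01
    elif mode == "linear":
        dx = np.full((h, w), strength, dtype=np.float32)
        dy = np.zeros((h, w), dtype=np.float32)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return np.stack([dx, dy], axis=-1)

def warp(frame, flow):
    """Backward warp: output pixel (x, y) samples frame at (x + dx, y + dy).

    Source coordinates are rounded to the nearest pixel and clamped to the image.
    """
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    sx = np.clip(np.rint(xs + flow[..., 0]), 0, w - 1).astype(int)
    sy = np.clip(np.rint(ys + flow[..., 1]), 0, h - 1).astype(int)
    return frame[sy, sx]

# Example: a small frame warped with three different modes.
frame = np.arange(5 * 7, dtype=np.uint8).reshape(5, 7)
rotated = warp(frame, make_flow(5, 7, "rotation", strength=10.0))
zoomed = warp(frame, make_flow(5, 7, "zoom", strength=20.0))
shifted = warp(frame, make_flow(5, 7, "linear", strength=1.0))
```

With `strength=0` every mode leaves the frame unchanged, so a single number would act as the "degree of warping" knob being asked for.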

peteTater commented 1 year ago

If you don't mind reading through my PhD-thesis-length comments here: https://github.com/volotat/SD-CN-Animation/issues/94, it's something I'm hoping will be 'featurised' in time.

There is definitely a commit (around 83 from May 13th or just below) where the coherence of characters was much more consistent in a higher number of frames.

I tried rolling back using the method described by volotat in the abovementioned thread (handy if you've not had to rollback extensions before), but got errors with the version I had thought was previously working well for me and providing the most character feature consistency per sequence.

I'll see if I can dig out my note of the specific hash/commits for you later today. Perhaps you'll have better luck getting it working (and please let me know if you do)!

thezveroboy commented 1 year ago

thanks for your reply!

I'll try your advice. If there is any additional information I would be extremely grateful!

volotat commented 1 year ago

Regarding the original post, there were no changes that could cause such twisting, so the most likely reason is just a perception of the movement. But you should expect that text2video mode will keep changing, as there are a lot of experiments going on. Most likely I will not be able to support legacy models in a single codebase, and I doubt anybody really needs it.

peteTater commented 1 year ago

Sorry, I won't have access to my notes until later today. If you're trying to get this working right now: I didn't get the desired results when rolling back to eddf1a4, which was the version volotat recommended I try in the linked thread, so I tested out the one before ca5fdb1, but IIRC I couldn't get that one to load.

I'd start with rolling back to eddf1a4 as volotat describes in the thread to see it working as it should, then try to get ca5fdb1 working. Can't verify presently, but I think that may be the earliest version that includes the toggle for generating frames as well as the video.

It'll throw a bunch of errors in the webui-user cmd window and fail to load the tab if it isn't behaving as expected on your machine. That's what happened when I tried rolling back and forward between some of the commits. This could have been caused by other extensions I had installed at the time, so it might be an idea to disable any extensions that aren't strictly necessary for your SD-CN-Animation workflow while you're testing.

Hope that helps, and just let me know if you need any more info.

Edit: oops I meant a35f446 was definitely working when I tried, but as the dev has chimed in while I was typing my reply, I'll defer to volotat's response on the matter!
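For anyone wanting to try the rollback discussed above, a minimal sketch follows. It assumes the extension was installed as a git clone under `extensions/SD-CN-Animation` inside the web UI folder (an assumption; adjust the path to your install), and the helper name `checkout_commit` is hypothetical. The commit hashes are the ones mentioned in this thread. By default it only returns the git command so you can inspect it before running anything.

```python
import subprocess

def checkout_commit(ext_dir, commit, dry_run=True):
    """Build (and optionally run) a git checkout of `commit` inside `ext_dir`."""
    cmd = ["git", "-C", str(ext_dir), "checkout", commit]
    if not dry_run:
        # Raises CalledProcessError if git reports a failure.
        subprocess.run(cmd, check=True)
    return cmd

# Dry run: show the commands for the commits mentioned in the thread.
for commit in ("eddf1a4", "ca5fdb1", "a35f446"):
    print(" ".join(checkout_commit("extensions/SD-CN-Animation", commit)))
```

Pass `dry_run=False` to actually switch versions, then restart the web UI so the extension is reloaded.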

thezveroboy commented 1 year ago

Regarding the original post, there were no changes that could cause such twisting, so the most likely reason is just a perception of the movement. But you should expect that text2video mode will keep changing, as there are a lot of experiments going on. Most likely I will not be able to support legacy models in a single codebase, and I doubt anybody really needs it.

I think you are right: the rotation has not changed from version to version, and this is just my perception.

Would it be possible to let users switch the deformation algorithm from rotation to other algorithms (linear, pulse, zoom-in, zoom-out, etc.) and/or add a setting for the degree of warping?

volotat commented 1 year ago

There is no deformation algorithm; instead, there is a motion prediction model that generates the deformation. In principle, this method should allow it to generate arbitrary video, but the motion model was built as a little experiment, without much thought put into its architecture and with a very small training dataset. The research is still ongoing, so I hope one day it will behave much better, but there is no guarantee of anything.

thezveroboy commented 1 year ago

There is no deformation algorithm; instead, there is a motion prediction model that generates the deformation. In principle, this method should allow it to generate arbitrary video, but the motion model was built as a little experiment, without much thought put into its architecture and with a very small training dataset. The research is still ongoing, so I hope one day it will behave much better, but there is no guarantee of anything.

Thanks for the reply. Your plugin is very cool. It is convenient, and its functionality is very well integrated into AUTOMATIC1111. I will be looking forward to changes in the generation algorithm.

thezveroboy commented 1 year ago

May I return to this question? I don't know what happened with the latest update, but it now ruins the picture in text2video right from the second frame. Earlier the video was twisting like a screw; now it just collapses.

I checked several times with different prompts and checkpoints. Could you take a look at this generation mode specifically? Unfortunately, it is completely unusable now.

Perhaps this is caused not by a plugin update but by an update of AUTOMATIC1111. Could you check simple text2video generations with default settings?

vapidvim commented 1 year ago

@thezveroboy I notice the default settings are not optimal. I am having good luck with 768x512 at 30 steps using DPM++ SDE. I noticed that some samplers just fall apart and that some models work better than others. I also have much better luck keeping ControlNet off and lowering processing strength to around 0.33.

thezveroboy commented 1 year ago

I notice the default settings are not optimal.

Thanks for the reply! I have used many models, and I can say that version 0.7 of SD-CN worked better than 0.9 with any model.

rbfussell commented 1 year ago

About a month late, but looking at the occlusion image, it appears to be an issue with the processing of the occlusion image. As you can see in this video of the occlusion images, the edges of the image move in the direction of the shearing of the actual image contents. Those white spaces on the edges are not present in the actual image that is displayed.

https://www.youtube.com/shorts/_c1mjOcyq1s

(Sorry it is a YouTube Short; YouTube doesn't give an option to make it a normal video due to size and length.)

It appears to be occurring in the flow computations, within RAFT I assume. I am not sure how RAFT works, but I feel there needs to be a bounding box that the image cannot simply be "peeled" away from, leaving the white "blank" areas. This MAY not actually be the issue, however; it may instead simply be the flow calculations themselves.
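The "peeling" effect described above can be reproduced with a small sketch. It assumes a plain backward warp in NumPy and is not RAFT or the extension's actual code; `warp_with_border`, `mode`, and `fill` are hypothetical names. When the flow points outside the frame, the warp either leaves those pixels blank (the white edge areas) or clamps the lookup to the nearest border pixel, which is one way to realise the "bounding box" idea.

```python
import numpy as np

def warp_with_border(frame, flow, mode="blank", fill=255):
    """Backward-warp `frame` by `flow` and report which pixels had no source.

    mode="blank" paints out-of-frame lookups with `fill` (white by default);
    mode="clamp" reuses the nearest border pixel instead.
    """
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    sx = np.rint(xs + flow[..., 0]).astype(int)
    sy = np.rint(ys + flow[..., 1]).astype(int)
    # True where the source coordinate falls outside the frame.
    outside = (sx < 0) | (sx >= w) | (sy < 0) | (sy >= h)
    out = frame[np.clip(sy, 0, h - 1), np.clip(sx, 0, w - 1)]
    if mode == "blank":
        out[outside] = fill
    return out, outside

# A 4x6 frame whose pixel value equals its column index, shifted by 2 pixels.
frame = np.tile(np.arange(6, dtype=np.uint8), (4, 1))
flow = np.zeros((4, 6, 2), dtype=np.float32)
flow[..., 0] = 2.0

blank, mask = warp_with_border(frame, flow, mode="blank")
clamped, _ = warp_with_border(frame, flow, mode="clamp")
print(blank[0])    # [  2   3   4   5 255 255]
print(clamped[0])  # [2 3 4 5 5 5]
```

The `outside` mask marks exactly the pixels that would otherwise show up as blank edges, so in a sketch like this it could be fed into the occlusion mask or used to choose a different fill strategy.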