Open danielpatrickhug opened 2 years ago
Credit to the Deforum community and others. Usage is shown here: https://colab.research.google.com/github/deforum/stable-diffusion/blob/main/Deforum_Stable_Diffusion.ipynb
video explanation: https://www.youtube.com/watch?v=F1bk9OXOmow
Spaces example: https://huggingface.co/spaces/akhaliq/DPT-Large (would be cool with textual inversion).
This may be a mountain of a request.
I am trying out deforum. Will take a look at this.
Seems like an epic. But maybe we can do this piece by piece.
@0x1355 Hi! Potentially we could start with a 2D transformation, like perpetual outpainting in the y direction.
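As a rough illustration of what one step of "outpainting in the y direction" might involve, here is a minimal sketch. It assumes a frame is just a list of pixel rows, and `shift_rows` is a hypothetical helper, not anything from deforum or sd-videos. It only shifts the frame and returns a mask of the newly exposed rows; the actual filling of those rows by an inpainting/outpainting model is left out.

```python
def shift_rows(frame, dy, fill=None):
    """Shift frame content down by dy rows (up if dy is negative).

    Returns (shifted, mask), where mask[y][x] is True for pixels that
    have no source content and would need to be outpainted.
    """
    height = len(frame)
    width = len(frame[0]) if frame else 0
    shifted = [[fill] * width for _ in range(height)]
    mask = [[True] * width for _ in range(height)]
    for y in range(height):
        src = y - dy  # row of the original frame that lands on row y
        if 0 <= src < height:
            shifted[y] = list(frame[src])
            mask[y] = [False] * width
    return shifted, mask


# Toy 3x2 "frame": shifting down by one row exposes the top row.
frame = [[1, 2], [3, 4], [5, 6]]
shifted, mask = shift_rows(frame, 1)
```

Repeating this per frame (shift, then fill the masked strip) would give a continuous scroll, under the assumption that the fill step keeps the new strip coherent with the rest of the image.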
I wonder how much these transformations will affect the quality of the interpolation videos, though? I don't really know what the difference is between what deforum is doing for frame interpolation and what we are doing here. I am more of a fan of the interpolations happening here, as they look "cleaner + smoother" to me (but I'm probably biased😅)
If we were to add something like this, would it be doing something different for the community than what deforum has already done?
You're right, they may not be as clean as they are now at first. From what I gathered, that part of the deforum code for the 3D transformation came from the Disco Diffusion library and https://twitter.com/gandamu_ml, as cited here.
I thought it would be good to rewrite some of the ideas in a more explicit fashion, such as a pipeline, since it took me a long time to understand what was going on in that notebook. But a lot of the ideas are cool and could be expanded on. Also, a Python package format would be good for the sake of modularity and redundancy, and maybe we get lucky and improve something or learn something new :).
Back from deforum land. Will put my thoughts in words tomorrow.
TL;DR: Low impact, high effort. Personally, I would prioritize something else for now.
Low impact
deforum has 2D, 3D, video_input, and interpolation animation modes. In comparison, sd-videos only does interpolation at the moment, but it is smoother due to a different implementation. See this example:
https://user-images.githubusercontent.com/4979897/193022967-14d7b407-6495-44bd-ab2e-d4cc6f91e980.mp4
Adding 3D animation, at least if we do it the same way as deforum, will result in a 3D mode that is, at best, as good as deforum. This doesn't add much value to sd-videos users. They can just use deforum for that.
High effort
What if we do it differently? Possible, but not easy.
deforum 2D/3D animation mode does the following:
The repeated initial image and blending help with coherence. The flip side is that longer, slower-moving videos tend to degrade into artifacts such as lines and patterns, as at the end of this video:
https://user-images.githubusercontent.com/4979897/193010083-9cc8af48-a950-4906-8c2d-ff2f1c5dc91f.mp4
This issue has been on their radar for a while, but they haven't been able to find a better solution yet. This suggests high effort.
Prioritize something else
I use sd-videos to make longer and slower videos. The two most common issues I see are:
Personally I prefer to work on things like the above, where we can add more value.
It doesn't mean I don't like or want to work on a 3D mode. Just a lower priority for me now.
Thank you so much yet again @0x1355 for your comprehensive deep dive into another issue here. I'm going to mark this as low priority/not going to solve for now.
Add functionality for calculating depth tensors, as seen here: https://github.com/deforum/stable-diffusion/blob/main/helpers/depth.py
Affine geometric transformation functions (translations, rotations, scaling) would also be a cool addition and project.
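To make the affine part of the request concrete, here is a minimal sketch of translation, rotation, and scaling as 3x3 homogeneous matrices in plain Python. All function names (`translation`, `rotation`, `scaling`, `compose`, `apply`) are hypothetical and chosen for illustration; this is not how deforum's helpers are written, and a real implementation would most likely operate on image tensors rather than single points.

```python
import math


def translation(tx, ty):
    """Matrix that moves a point by (tx, ty)."""
    return [[1.0, 0.0, tx],
            [0.0, 1.0, ty],
            [0.0, 0.0, 1.0]]


def rotation(degrees):
    """Matrix that rotates counter-clockwise about the origin."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def scaling(sx, sy):
    """Matrix that scales x by sx and y by sy about the origin."""
    return [[sx, 0.0, 0.0],
            [0.0, sy, 0.0],
            [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Plain 3x3 matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def compose(*matrices):
    """Combine transforms so that the first argument is applied first."""
    result = translation(0.0, 0.0)  # identity matrix
    for m in matrices:
        result = matmul(m, result)
    return result


def apply(matrix, point):
    """Apply an affine matrix to a 2D point (x, y)."""
    x, y = point
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])


# Rotate 90 degrees, then move 2 units along x.
transform = compose(rotation(90), translation(2, 0))
moved = apply(transform, (1, 0))  # approximately (2.0, 1.0)
```

Keeping each transform as a matrix means a whole per-frame camera move (for example a small rotation plus a small translation) collapses into one matrix, which could then be handed to whatever image-warping routine the project settles on.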