We can actually save approximate intermediate frames with a little elbow grease.
AnimateDiff was trained with the epsilon diffusion objective, so the UNet predicts the noise that was added to the data. That means the `noise_pred` variable here can be subtracted from `latents` (just before the scheduler step) to get what the UNet currently predicts to be the correct latent.
If we decode these latents with the VAE we have a preview of the video!
Doing this often will slow overall sampling down quite significantly, but it can be well worth it when drafting long videos.
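A minimal sketch of one way to decode such a preview, assuming a diffusers-style pipeline where `latents`, `noise_pred`, `scheduler`, `vae`, and the current timestep `t` are in scope inside the denoising loop. It uses the standard epsilon-to-x0 conversion (rescaling by the noise schedule rather than a raw subtraction); the function name and the (batch, channels, frames, height, width) latent layout are assumptions, not this repo's exact code:

```python
import torch

def decode_preview(latents, noise_pred, scheduler, vae, t):
    # Estimate the clean latents from the current noisy latents and the UNet's
    # epsilon prediction: x0 ~= (x_t - sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_bar_t)
    alpha_prod_t = scheduler.alphas_cumprod[int(t)].item()
    pred_x0 = (latents - (1 - alpha_prod_t) ** 0.5 * noise_pred) / alpha_prod_t ** 0.5

    # Assumed layout: (batch, channels, frames, height, width). Fold the frame
    # axis into the batch so the VAE decodes each frame as a separate image.
    b, c, f, h, w = pred_x0.shape
    frames = pred_x0.permute(0, 2, 1, 3, 4).reshape(b * f, c, h, w)
    with torch.no_grad():
        images = vae.decode(frames / vae.config.scaling_factor).sample
    return (images / 2 + 0.5).clamp(0, 1)  # (b*f, 3, H, W) in [0, 1], ready to save
```

Calling this only every N sampling steps (or decoding just a handful of frames) keeps the overhead manageable.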
It would be great if possible. The main request I get from users trying it in Visions of Chaos is for some way to see progress. Even if it were much slower, the user could turn it on, let it get past the first 20 frames or so, and check the results; only if those looked good would they turn off preview frames and run again faster.
I can probably take a look at it this weekend
This would be a highly useful feature. ControlNet and txt2img configs for single-image generation don't seem to carry over much at all to a multi-ControlNet workflow in this repo, which makes failure slow and painful: I have no way of knowing whether a config that generates excellent single images in any other single-image workflow is going to be totally broken in this tool.
I have most of a PySide6 UI finished and working that uses this tool under the hood. It provides a drag-and-drop area to duplicate and reorder traveled prompts, automatically setting each prompt's frame-number key from a configurable "keyframe interval" and its position in the list of prompts. It adds "tempo" features that detect BPM from live audio and populate an editable form field when toggled on, plus tap tempo and a couple of other common sync tools. It also provides an intuitive set of tabs, sliders, drop-down menus, and more that covers the entire prompt config, all in an easy-to-use GUI that doesn't look half bad.
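As a rough illustration of that frame-key and tempo mapping (hypothetical names and formulas of my own, not the actual UI code):

```python
def keyframe_for_prompt(index, keyframe_interval):
    # Hypothetical: each traveled prompt gets its frame-number key from its
    # position in the prompt list times a configurable keyframe interval.
    return index * keyframe_interval

def interval_from_bpm(bpm, fps, beats_per_keyframe=1):
    # Hypothetical tempo sync: convert a detected BPM to a frame interval so
    # prompt changes land on the beat at the video's frame rate.
    return round(fps * 60.0 / bpm * beats_per_keyframe)

# Example: at 120 BPM and 8 fps, one beat is 4 frames, so four prompts get
# frame keys [0, 4, 8, 12] and change on the beat.
interval = interval_from_bpm(bpm=120, fps=8)
keys = [keyframe_for_prompt(i, interval) for i in range(4)]
```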
I almost have something useful that lets a user dig through Stable Diffusion models and whip up perfectly synced video loops on the fly for VJing; the only thing I'm roadblocked by is not having a quick or reliable way to check whether a config is broken.
So many configs that work perfectly fine in other single-image workflows fail completely with AnimateDiff. The trial and error required is simply too slow at the moment for the little app I made to be viable for on-stage use. You'd have to have a known set of working, manipulable configs ahead of time, which sort of defeats the purpose of being able to whip one up on the fly using an off-the-shelf, untested SD model and drop its output onto a layer in Resolume Arena.
Anyway, thanks for all your hard work on this; it does actually work. I just need to figure out how to implement a single-image "staging" feature that generates 1 to 3 sample frames so the user can tweak settings in a realistic amount of time for on-set use.
How it works now is to generate all frames into memory, then save them all to disk, then combine them into the output. Could it be changed to save frames as it goes? This gives two advantages: