Using SDXL - Githubissues

jsecretan commented 1 year ago

Some of the image generation seems genuinely better on SDXL, is there the possibility of using this as a base model for animation here?

powerspowers commented 1 year ago

From what I understand, the motion model would need to be retrained with SDXL. Is anyone working on this? SDXL is a huge step up in prompt-to-image quality, and I bet it would be amazing with AnimateDiff as well.

jFkd1 commented 1 year ago

If the original SD 1.5 model took 5 days to train on 8 A100s, the training time for SDXL would be... quite long. I don't know if any individual is up to the task, but I'm following the thread in case someone is.
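To put the figure above in perspective, here is a rough back-of-the-envelope sketch. It only converts "5 days on 8 A100s" into GPU-hours. The `scale` argument is a purely hypothetical multiplier for an SDXL run, not a measured or published number; the value 4.0 is simply the pixel-count ratio between 1024x1024 and 512x512, and real training cost need not scale that way.

```python
def gpu_hours(days: float, num_gpus: int) -> float:
    """Total GPU-hours for a run of `days` days on `num_gpus` GPUs."""
    return days * 24 * num_gpus


def scaled_days(baseline_gpu_hours: float, scale: float, num_gpus: int) -> float:
    """Wall-clock days if the run costs `scale` times the baseline.

    `scale` is an illustrative assumption, not a known SDXL figure.
    """
    return baseline_gpu_hours * scale / (24 * num_gpus)


baseline = gpu_hours(days=5, num_gpus=8)
print(f"SD 1.5 motion module (figure quoted in the thread): {baseline:.0f} GPU-hours")

# Hypothetical: assume cost grows with pixel count (1024*1024 / 512*512 = 4).
print(f"Same 8 GPUs at an assumed 4x cost: {scaled_days(baseline, 4.0, 8):.0f} days")
```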

powerspowers commented 1 year ago

The original AnimateDiff motion training took 5 days on 8 A100s? I probably have access to the compute power to retrain the motion data for SDXL as long as the training data is the correct resolution and the training package is updated by its owner to match SDXL requirements.

jFkd1 commented 1 year ago

@powerspowers Here's the author's reply on training time: https://github.com/guoyww/AnimateDiff/issues/4#issuecomment-1631806566. I feel like data processing will take up a bigger chunk of your time, though.

MaxTran96 commented 1 year ago

Do you have an ETA on when we will be able to use AnimateDiff with SDXL?

limbo0000 commented 1 year ago

Surely working on that. Please understand this is a research project which we will try our best to push forward :)

MaxTran96 commented 1 year ago

Thank you! I'm looking forward to it

MaxTran96 commented 1 year ago

I have a question: can I pass an image generated by SDXL to AnimateDiff so that it generates an animated GIF file?

powerspowers commented 1 year ago

> I have a question: can I pass an image generated by SDXL to AnimateDiff so that it generates an animated GIF file?

There are a few forks / PRs that add code for a starter image. Since AnimateDiff (AD) and Stable Diffusion 1.5 generally favor 512x512, you would need to downscale your SDXL image from the usual 1024x1024 and then run it through AD.
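A minimal sketch of the downscaling step, assuming Pillow is installed. The function names and the file names in the usage comment are hypothetical, and this only prepares the starter image; how it is fed into AD depends on the fork you use. Non-square SDXL outputs are center-cropped to a square first so the resize does not distort them.

```python
def center_crop_box(width: int, height: int) -> tuple:
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def downscale_for_sd15(src_path: str, dst_path: str, target: int = 512) -> None:
    """Center-crop an image to a square and resize it to target x target."""
    from PIL import Image  # Pillow, imported lazily so the helper above has no dependency

    with Image.open(src_path) as img:
        square = img.convert("RGB").crop(center_crop_box(*img.size))
        # LANCZOS is a high-quality filter well suited to downscaling.
        square.resize((target, target), Image.LANCZOS).save(dst_path)


# Hypothetical usage:
# downscale_for_sd15("sdxl_output.png", "starter_512.png")
```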

MaxTran96 commented 1 year ago

> There are a few forks / PRs that add code for a starter image.

Which fork is this?

kodxana commented 1 year ago

I could train an SDXL module if I knew the correct training steps :)

ykk648 commented 1 year ago

> There are a few forks / PRs that add code for a starter image. -> which fork is this?

https://github.com/ykk648/AnimateDiff-I2V

jFkd1 commented 1 year ago

@limbo0000 hello, don't want to rush you or anything, but is there an ETA we can expect the SDXL version? I know it's a lot more complicated than just retraining the same thing, so please take your time!

powerspowers commented 1 year ago

Also, if you need some A100 time, reach out to me at powers @ twisty dot ai and we will try to help. AnimateDiff on SDXL would be 🔥

On Oct 2, 2023, at 2:12 PM, jFkd1 wrote:

> @limbo0000 hello, don't want to rush you or anything, but is there an ETA we can expect the SDXL version? I know it's a lot more complicated than just retraining the same thing, so please take your time!


subpanic commented 1 year ago

@jFkd1 @powerspowers Just to insert a little more perspective into this thread: the original dataset was 10.7M video/caption pairs. A great deal, if not most, of those videos are in the 0.2 - 0.5 megapixel resolution range. On top of that, the dataset is somewhat ephemeral, since it depends on the various unaffiliated hosts continuing to host the videos, so it has degraded and will continue to degrade in quantity over time.

So there are a number of problems to solve for an optimal SDXL model, the easiest being modifications to the training script and compute for training. The larger unsolved problems are:

Is the existing dataset to be used as-is? Trying to push it to work with SDXL at closer to 1 megapixel is likely to start causing problems. It's unclear whether the motion inference will translate reliably to a larger space. Is it even worth spending compute to try this?
Is the solution upscaling? That can also be expensive and time-consuming, with uncertainty about potential confounding issues from upscale artifacts.
If the videos, as-is or upscaled, aren't sufficient, then there's the larger problem of targeting a new dataset or supplementing the existing one, and large video/caption datasets are neither cheap nor plentiful.
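The resolution gap behind these questions can be made concrete with a small calculation. This is only an illustrative sketch: it takes the 0.2 - 0.5 megapixel range quoted above and SDXL's usual 1024x1024 frame, and computes how much each side of a source video would have to be scaled to reach that pixel count. The function names are hypothetical.

```python
import math


def megapixels(width: int, height: int) -> float:
    """Pixel count of a frame, in megapixels."""
    return width * height / 1_000_000


def linear_upscale_factor(src_mp: float, target_mp: float) -> float:
    """Per-side scale factor needed to go from src_mp to target_mp megapixels."""
    return math.sqrt(target_mp / src_mp)


sdxl_mp = megapixels(1024, 1024)
sd15_mp = megapixels(512, 512)
print(f"SD 1.5 frame: {sd15_mp:.2f} MP, SDXL frame: {sdxl_mp:.2f} MP")

# Range of source video resolutions quoted in the thread.
for src_mp in (0.2, 0.5):
    factor = linear_upscale_factor(src_mp, sdxl_mp)
    print(f"{src_mp} MP source -> about x{factor:.2f} per side to reach SDXL size")
```

Under these numbers a 512x512 frame (about 0.26 MP) already sits inside the dataset's range, while an SDXL-sized frame needs roughly 1.4x to 2.3x upscaling per side.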

So while, as @limbo0000 indicated, they'll try their best, I don't think it's a certain thing in any near time frame.

jsecretan commented 1 year ago

Not sure if it helps anybody, but I have been playing with this in the meantime: https://github.com/hotshotco/hotshot-xl

MaxTran96 commented 1 year ago

I just tried it; I think AnimateDiff is still better.

GamingDaveUk commented 1 year ago

Just watched a video on this and wondered why they were using 1.5 models. I will cross my fingers and hope this gets updated to SDXL in time. Reading this thread, it seems that may not happen, though. It's a shame, as there is no way I can go back to 1.5; SDXL blows it out of the water on pretty much every front.

guoyww / AnimateDiff

Using SDXL #116