guoyww / AnimateDiff

Official implementation of AnimateDiff.
https://animatediff.github.io
Apache License 2.0

The right way of using SD XL motion model to get good quality output #382

Open iyinchao opened 2 months ago

iyinchao commented 2 months ago

Hi, first I'm very grateful for this wonderful work, animatediff is really awesome 👍

I've been stuck on a quality issue for several days when using the SDXL motion model. Although the motion is very nice, the video quality seems quite low, as if the output were pixelated or downscaled. Here is a comparison of the SDXL image and the AnimateDiff frame:

[images: original image by Animagine XL vs. AnimateDiff SDXL frame]

These two images use the same size configuration. I'm using the ComfyUI workflow adapted from here: https://civitai.com/articles/2950, with the Animagine XL V3.1 model & VAE (you can save the image below and import it into ComfyUI):

[workflow image]

I tried different numbers of steps, width & height settings, samplers, and guidance values, but had no luck.

I know the SDXL motion model is still in beta, but I can't get results as good as the example in the README. Is there anything I'm doing wrong here? 😢 Could anyone show the right way to use the SDXL model? Thank you in advance.
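For reference, here is roughly the same setup sketched with the diffusers AnimateDiffSDXLPipeline instead of ComfyUI, in case the issue reproduces there too. The checkpoint names and sampler settings below are my assumptions, not a confirmed recipe:

```python
# Rough diffusers equivalent of the ComfyUI setup above (a sketch, not a confirmed recipe).
# The Hugging Face repo ids and scheduler settings below are assumptions and may need adjusting.
import torch
from diffusers import AnimateDiffSDXLPipeline, MotionAdapter, DDIMScheduler
from diffusers.utils import export_to_gif

# SDXL beta motion module released with AnimateDiff (assumed repo id)
adapter = MotionAdapter.from_pretrained(
    "guoyww/animatediff-motion-adapter-sdxl-beta", torch_dtype=torch.float16
)

pipe = AnimateDiffSDXLPipeline.from_pretrained(
    "cagliostrolab/animagine-xl-3.1",  # Animagine XL V3.1 base model (assumed repo id)
    motion_adapter=adapter,
    torch_dtype=torch.float16,
).to("cuda")
pipe.scheduler = DDIMScheduler.from_config(
    pipe.scheduler.config, clip_sample=False, timestep_spacing="linspace"
)

frames = pipe(
    prompt="1girl, upper body, looking at viewer, cherry blossoms",
    negative_prompt="lowres, bad anatomy, worst quality",
    width=1024,
    height=1024,
    num_frames=16,
    num_inference_steps=25,
    guidance_scale=8.0,
).frames[0]
export_to_gif(frames, "animatediff_sdxl.gif")
```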

F0xbite commented 2 months ago

You're not doing anything wrong. The SDXL beta motion model is just pure garbage. We're all in the same boat with this kind of XL result. I tried experimenting with video upscaling, but even then the quality of the results was just not as good as what we get from the 1.5 v3 motion model. If I had any understanding of how, I would train my own.

I worked around this by making a hybrid XL/SD1.5 workflow that generates an image with XL and feeds it into the SD1.5 IP adapter. The detail isn't the same as XL, but the quality of the animation itself is far better. I'm attaching a comparison of two animations using the same parameters, along with the source image used:

[animations: XL result vs. hybrid XL/1.5 result, plus source image]

iyinchao commented 2 months ago

@F0xbite Thank you for the information! I also tried the AnimateDiff 1.5/v2 motion models, which are way better. Your solution is very enlightening 👍 I'm not going to waste time on the SDXL model. BTW, is there any other motion model that works better with SDXL?

F0xbite commented 2 months ago

Glad to help. The only other one that I know of is HotshotXL. Hotshot does have visibly better quality, but it's limited to 8 rendered frames max, and I don't think it's possible to loop context, both of which are huge caveats for me. Also, the quality of the motion seems rather poor and distorted in my testing, but that's just my opinion.

There's also SVD, but it's strictly an image→video model with no prompting and basically no control over motion.

So unfortunately, I don't know of a better solution than the hybrid system I'm using now, until a better motion model is trained for XL or the Flux team releases some kind of text2video model. But I'm sure that's bound to change at some point.

biswaroop1547 commented 2 months ago

Thanks a lot for sharing this @F0xbite, I'd love to use your hybrid workflow above if you could share it 🔥

felixniemeyer commented 1 month ago

I have built on the same workflow and have exactly the same issue with seemingly low-res output (even though it's 1024x1024). I'd like to try that hybrid workflow. When I naively select the SD1.5 v2 AnimateDiff model, ComfyUI's AnimateDiff loader complains: "Motion module 'mm_sd_v15_v2.ckpt' is intended for SD1.5 models, but the provided model is type SDXL." What's your approach for the hybrid SDXL/SD1.5 workflow?

F0xbite commented 4 weeks ago

@biswaroop1547 @felixniemeyer Hey guys, sorry I'm just now seeing this. Here is my workflow: foxbite_hybrid_animatediff.json. I cleaned it up a bit. Basically, it's a basic SDXL txt2img workflow, with the output image being fed into the IP adapter for SD1.5. I use a LoRA tag loader for the positive SDXL prompt just as a personal preference; you can change it to a standard text encoder if you prefer. Enjoy!
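If anyone would rather script the same idea than import the JSON, here is a rough diffusers sketch of the hybrid approach. The base model, IP-Adapter, and motion-adapter checkpoints named below are assumptions on my part, not necessarily what the workflow file uses:

```python
# Sketch of the hybrid idea: SDXL generates a detailed still image,
# which then drives an SD1.5 AnimateDiff pass through IP-Adapter.
# Repo ids and adapter weights below are assumptions and may need adjusting.
import torch
from diffusers import (
    AnimateDiffPipeline,
    MotionAdapter,
    StableDiffusionXLPipeline,
    DDIMScheduler,
)
from diffusers.utils import export_to_gif

prompt = "a fox in a forest, cinematic lighting"

# 1) SDXL txt2img for the detailed source image
sdxl = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
source_image = sdxl(prompt=prompt, width=1024, height=1024).images[0]

# 2) SD1.5 AnimateDiff conditioned on that image via IP-Adapter
adapter = MotionAdapter.from_pretrained(
    "guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16
)
anim = AnimateDiffPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", motion_adapter=adapter, torch_dtype=torch.float16
).to("cuda")
anim.scheduler = DDIMScheduler.from_config(
    anim.scheduler.config, clip_sample=False, timestep_spacing="linspace"
)
anim.load_ip_adapter(
    "h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin"
)
anim.set_ip_adapter_scale(0.8)  # how strongly the SDXL image steers the animation

frames = anim(
    prompt=prompt,
    ip_adapter_image=source_image,
    num_frames=16,
    num_inference_steps=25,
    guidance_scale=7.5,
).frames[0]
export_to_gif(frames, "hybrid_xl_sd15.gif")
```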

iyinchao commented 3 weeks ago

@F0xbite Thank you for sharing! 👍