kabachuha / sd-webui-text2video

Auto1111 extension implementing text2video diffusion models (like ModelScope or VideoCrafter) using only Auto1111 webui dependencies

Add Multiple LDM Compatible Samplers #187

Closed ExponentialML closed 1 year ago

ExponentialML commented 1 year ago

Here's a feature that allows multiple samplers to be used. A dropdown is added to the extension with two supported samplers, DDIM and UniPC. DPM Solver may also be supported, as it's a part of the LDM code base. This will also sunset the current DDIM sampler, or make it a legacy one.
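As a rough illustration of how a sampler dropdown like this could be backed, here is a minimal sketch of a name-to-factory registry. Every name in it (`register_sampler`, `dropdown_choices`, `get_sampler`, and the placeholder factories) is hypothetical and is not taken from the extension's code. The real samplers are replaced by placeholder strings.

```python
from typing import Callable, Dict, List

# Hypothetical registry: maps the label shown in the dropdown to a factory.
# The factories return placeholder strings here, not real sampler objects.
SamplerFactory = Callable[[], str]
_SAMPLERS: Dict[str, SamplerFactory] = {}


def register_sampler(name: str) -> Callable[[SamplerFactory], SamplerFactory]:
    """Decorator that records a sampler factory under a dropdown label."""
    def decorator(factory: SamplerFactory) -> SamplerFactory:
        _SAMPLERS[name] = factory
        return factory
    return decorator


@register_sampler("DDIM")
def make_ddim() -> str:
    return "ddim-sampler-placeholder"


@register_sampler("UniPC")
def make_unipc() -> str:
    return "unipc-sampler-placeholder"


def dropdown_choices() -> List[str]:
    """Labels to show in the UI, in registration order."""
    return list(_SAMPLERS)


def get_sampler(name: str) -> str:
    """Build the sampler selected in the dropdown, or fail loudly."""
    if name not in _SAMPLERS:
        raise ValueError(f"Unknown sampler: {name!r}")
    return _SAMPLERS[name]()
```

With a layout like this, adding another sampler later would only need one more registered factory, and the dropdown would pick it up without further changes.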

The sampler code has been refactored so that we can use and modify existing ones. Originally the plan was to leverage the built-in samplers in the webui, but there would be too many hooks and workarounds to make it work properly.

Each sampler is modified in some way to allow for separable conditioning (conditioning, unconditioning) rather than chunks, and UniPC has some modifications that allow it to work properly. Currently, there is a bug that is preventing DDIM from working properly, but vid2vid should work just fine.
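To make the "separable conditioning" idea concrete, here is a minimal sketch of classifier-free guidance done with two separate forward passes, one for the conditioning and one for the unconditioning, instead of batching both into one call and chunking the output. This is one reading of the description above and not the PR's actual code. The `Denoiser` signature, `guided_noise`, and `toy_model` are assumptions made for illustration, and plain Python lists stand in for tensors.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical denoiser signature: (latent, timestep, conditioning) -> noise
Denoiser = Callable[[Sequence[float], int, Sequence[float]], Vector]


def guided_noise(
    model: Denoiser,
    x: Sequence[float],
    t: int,
    cond: Sequence[float],
    uncond: Sequence[float],
    scale: float,
) -> Vector:
    """Classifier-free guidance using two separate model calls."""
    eps_cond = model(x, t, cond)      # pass with the prompt conditioning
    eps_uncond = model(x, t, uncond)  # pass with the unconditional input
    # Standard guidance mix: uncond + scale * (cond - uncond)
    return [u + scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


def toy_model(x: Sequence[float], t: int, c: Sequence[float]) -> Vector:
    # Stand-in for a real network: adds the conditioning to the latent.
    return [xi + ci for xi, ci in zip(x, c)]


result = guided_noise(toy_model, [1.0, 2.0], 0, [1.0, 1.0], [0.0, 0.0], 2.0)
```

Keeping the two passes separate costs a second model call per step, but it means the conditioning and unconditioning do not need to have matching shapes to be concatenated into one batch.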

Before merging, the following must be resolved.

After that, we should be good to go :+1: .

ExponentialML commented 1 year ago

This is now ready for testing and review.


B34STW4RS commented 1 year ago

Works pretty well; however, I noticed it seems to massively change vid2vid content relative to the original sampler. For instance, ddim_gaussian gave me a white space marine looking to the right and firing a rifle, while the original sampler gave me a blue space marine firing a laser pistol looking left.

https://github.com/kabachuha/sd-webui-text2video/assets/11381013/6458a912-5544-48e6-b39b-c7fe9a0c56d8

UniPC seems to give the expected result, however, e.g.:

https://github.com/kabachuha/sd-webui-text2video/assets/11381013/37c11328-7a88-47d3-8504-65959e2c6252

Overall excellent* work; will keep testing over the weekend.

Edit: felt rude to say good*, meant excellent.