leszekhanusz / diffusion-ui

Frontend for deep learning image generation
https://diffusionui.readthedocs.io
MIT License

What is it for? #60

Ainaemaet opened this issue 1 year ago

Ainaemaet commented 1 year ago

Hello, what is the main use case/point of this software? Not trying to be rude, but I kept seeing the creator talking about it on the webui repo, so I came over out of curiosity, and I can't really find a good description of what it does. At first glance it seems to do less than what webui provides, so I'm hoping for some clarification, as I'm likely just missing the point.

Does this frontend provide better inpainting features? I see that it can be used as a frontend for multiple backends... can it switch between backends on the fly with a simple checkbox or something?

Kudos for your hard work, and thank you in advance for taking the time to explain it to me. :)

leszekhanusz commented 1 year ago

Thanks for your interest!

A little bit of history, sorry this is a bit long, tldr at the end.

When the Stable Diffusion model was initially released on August 22nd, there was no interface available to generate images locally on your computer, only a command-line interface. At the time I was already incredibly impressed by this technology, having previously used Dall-e, so I planned to make a nice web interface inspired by Dall-e. It was clear from the beginning that the many issues and PRs on the official Stable Diffusion repo were going unanswered, so a lot of forks appeared, with the work of volunteers being diluted across those different places. I quickly discovered diffusers and thought it was the obvious solution: a GitHub repo where progress seemed to be made, and which could centralize the work of all the volunteers in this space.

Gradio was the de facto way to make a quick interface to Python code, and before Stable Diffusion was released I had already made a very quick gradio interface for inpainting with the precursor latent-diffusion code.

So I started using gradio to make that new interface, but it quickly became apparent that gradio's inpainting component was difficult to work with: it was too small, impossible to make bigger, buggy, and there was no way to undo a mistake. If you painted over the same place twice it would just add another stroke on top of the previous one, making a mess. It was clear to me that if we wanted a beautiful interface you could completely control, you had to build it yourself with something other than gradio.

Thankfully, gradio provides an API by default that allows integrating it with other tools, so I set out to make a new website using Vue that uses the gradio interface as a backend. From the start I wanted it to be able to connect to different gradio backends, so I made my interface modular, with the backends defined in a JSON file.
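To make that concrete, here is a rough sketch (the field names are hypothetical, not the project's actual JSON schema) of what a backend entry and the default gradio API call behind it can look like:

```typescript
// Hypothetical shape of one entry in the backends JSON file;
// the real schema is the project's own, this is only illustrative.
interface BackendDescriptor {
  name: string;    // e.g. "stable_diffusion"
  baseUrl: string; // where the gradio app is served
}

// Gradio apps expose a default HTTP API: POST the ordered input
// values to /api/predict and read the outputs back from "data".
async function callBackend(
  backend: BackendDescriptor,
  values: unknown[],
): Promise<unknown[]> {
  const res = await fetch(`${backend.baseUrl}/api/predict`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ data: values }), // gradio wraps inputs in "data"
  });
  const json = await res.json();
  return json.data as unknown[]; // outputs come back under "data" too
}
```

Because the frontend only needs a base URL and the list of inputs a backend expects, adding a new gradio backend is mostly a matter of adding an entry to that JSON file.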

I implemented my own inpainting component to fix those issues, with proper sizing and the ability to undo strokes.

I implemented a gallery of images in the right tab, so you can see your previous results and easily go back to them, either to edit a generated picture or to regenerate it (restoring the prompt, all the inputs and the seed). The "Generate again" button is really powerful: it even remembers the strokes you made while editing a picture, so you can go back, undo only the latest strokes or add new ones, and try again.
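As a rough illustration (the names here are mine, not the project's), the state that such a "Generate again" button needs to keep per generation might look like:

```typescript
// Hypothetical record kept for each generation so it can be
// fully restored later; field names are illustrative only.
interface Stroke {
  points: { x: number; y: number }[]; // path painted by the user
  radius: number;                     // brush size for this stroke
}

interface GenerationRecord {
  prompt: string;
  seed: number;
  inputs: Record<string, unknown>; // all other generation parameters
  initImage?: string;              // source image for img2img/inpainting
  strokes: Stroke[];               // kept so strokes can still be undone/redone
}
```

Keeping the strokes as a list rather than a flattened mask is what makes it possible to undo only the latest ones after the fact.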

From the start I also set out to make my interface responsive, meaning it should work well on both big screens and mobile. As of now it works really well on mobile, with swiping left/right to change images inside a generation and up/down to change and compare generations. On big screens you can leave the left and right panels open to see everything at once.

At the start I made my own backend using a unified diffusers pipeline. I thought diffusers would very quickly provide such a pipeline with updated features from the community, but unfortunately there was some resistance to providing a quick and easy way for users to get a full-featured pipeline. I now realize that they concentrate on being a base library, and this went nowhere. I had bet on the wrong horse; in the meantime, multiple Stable Diffusion forks competed, and by now it seems clear to me that the automatic1111 repo is ahead of the others.

So I recently implemented automatic1111 as a backend. This allows me to concentrate on the UI features of the website while relying on the advanced features of automatic1111, which will continue to be improved by the community.

This proved more difficult than expected, as every small update of automatic1111 (of which there are many) broke its API and made it stop working. So I had to implement a system that fetches the current config of the backend, so the frontend always knows the correct API to use for the version it is talking to.
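For illustration, gradio apps serve their UI layout as JSON at GET /config; a minimal sketch of probing it from the frontend (the top-level field names are gradio's, everything else is hypothetical) could be:

```typescript
// Sketch: fetch the gradio /config JSON to see what the backend
// currently looks like before deciding which API calls to make.
async function fetchBackendConfig(baseUrl: string) {
  const config = await (await fetch(`${baseUrl}/config`)).json();
  return {
    gradioVersion: config.version as string,   // gradio's own version string
    // the component tree shifts between automatic1111 releases,
    // so the frontend can map inputs against what is actually there
    componentCount: (config.components as unknown[]).length,
  };
}
```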

The automatic1111 fork is great and really powerful, but its UX (user experience) is not good. It's the kind of interface where everything anyone can think of gets added to it, which can be overwhelming for someone just starting out. DiffusionUI is a casual alternative for day-to-day use that lets you generate pictures in an easy-to-use interface, switching seamlessly between txt2img, img2img and inpainting. The goal is not to replace the automatic1111 interface; some of the more advanced features (like checkpoint merging, for an extreme example) will never be implemented here. But because it is not made with gradio, the interface is much easier to customize, which enables some really great features.

Also, when I heard about the Stable Horde, I quickly implemented their API to use it in my frontend (go to https://diffusionui.com/b/stable_horde to test it for free).
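For reference, a minimal sketch of the Stable Horde flow as a frontend sees it (the v2 endpoints below are the public ones, but treat the details as illustrative, not as this project's exact code):

```typescript
const HORDE = "https://stablehorde.net/api/v2";

// Submit a generation anonymously, poll until a worker finishes it,
// then fetch the resulting images.
async function generateOnHorde(prompt: string): Promise<string[]> {
  // 1. Submit the request; the horde answers immediately with a job id.
  const submit = await fetch(`${HORDE}/generate/async`, {
    method: "POST",
    headers: { "Content-Type": "application/json", apikey: "0000000000" }, // anonymous key
    body: JSON.stringify({ prompt, params: { n: 1, width: 512, height: 512 } }),
  });
  const { id } = (await submit.json()) as { id: string };

  // 2. Poll until the job is done.
  for (;;) {
    await new Promise((r) => setTimeout(r, 2000));
    const check = (await (
      await fetch(`${HORDE}/generate/check/${id}`)
    ).json()) as { done: boolean };
    if (check.done) break;
  }

  // 3. Retrieve the finished images.
  const status = (await (
    await fetch(`${HORDE}/generate/status/${id}`)
  ).json()) as { generations: { img: string }[] };
  return status.generations.map((g) => g.img);
}
```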

tldr: