songweige / sd-webui-rich-text


Integration with the builtin txt2img and img2img tabs #15

Open ljleb opened 12 months ago

ljleb commented 12 months ago

Hi! First of all, I want to thank you for contributing an A1111 extension on top of the paper and code. Not everyone does it!

Is there a technical reason the extension was not developed like a normal extension? For example, it creates its own tab and removes a lot of knobs, on top of disabling other extensions.

It's not even possible to change the size of the generated images as of right now, which is a pretty big deal breaker for a lot of users I think. Also, the model runs out of memory easily. Is this because the extension uses a different pipeline that isn't as optimized as the webui's pipeline?

Although I think this is a very interesting contribution and has a lot of potential, I want to believe better integration with the webui is possible. Are there any plans to completely delete the custom txt2img tab and use the normal pipeline of the webui instead, just like all other extensions do, for all the benefits and possibilities this would bring?

ljleb commented 12 months ago

I just realized that this is based on a template repo from the community: https://github.com/udon-universe

This template code is not ideal for an extension. See this issue for more info: https://github.com/udon-universe/stable-diffusion-webui-extension-templates/issues/3

Here are some useful tips to develop webui extensions (same link as in the issue referenced above): https://github.com/vladmandic/automatic/wiki/Extensions#common-mistakes

PladsElsker commented 12 months ago

I agree with this. I think this extension has a lot of potential, and I would love to use it.

Because it's implemented as its own pipeline, I don't see where it fits in my current workflow, which already relies heavily on controlnet and other such extensions. Is there a technical reason for this?

songweige commented 12 months ago

Thank you for the comments and URLs! The URLs will be a great help. I apologize for the inconvenience; I was not familiar with A1111 before, and this is my first time developing an extension. But I would love to learn and make it better!!

Regarding the size of the generated image, it should be easy to change, and I will add it in one or two days since I'm currently traveling. As for integrating it with the txt2img and img2img tabs, I would love to try it and think it should be possible. If you have worked with extensions that use similar methods, i.e., manipulating the self-/cross-attention of Stable Diffusion / ControlNet, would you mind sharing some references? That would be super helpful to me!! Thanks!

ljleb commented 12 months ago

I did not work with self- or cross-attention with the webui in the past, but I can try finding the appropriate code. IIUC this extension targets the code of the attention layers of the model. I think the function to patch depends on what the extension needs to do with the attention layer. I don't know if this is it, but maybe it is possible to patch this function here? https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/5ef669de080814067961f28357256e8fe27544f4/modules/sub_quadratic_attention.py#L117

Another approach could be to patch the functions on the model directly. See an example of how the webui patches the cross attention here: https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/5ef669de080814067961f28357256e8fe27544f4/modules/sd_hijack_optimizations.py#L60

To do the same, you could unconditionally patch these functions when the extension loads and fall back on the previously installed function whenever the extension is disabled. To make sure users can still change the underlying optimization, you could also patch the apply_optimizations function so that the fallback always points at the currently selected optimization: https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/5ef669de080814067961f28357256e8fe27544f4/modules/sd_hijack.py#L52
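A minimal sketch of this fallback idea, using stand-in objects rather than the real webui modules. The names attention, hijack, state and rich_text_forward are all hypothetical and only illustrate the pattern; the actual functions to patch are the ones linked above.

```python
import functools
import types

# Stand-ins for webui objects (hypothetical, not the real modules).
attention = types.SimpleNamespace(forward=lambda x: ("default", x))
hijack = types.SimpleNamespace()


def apply_optimizations():
    # Pretend the webui installs a different attention implementation.
    attention.forward = lambda x: ("optimized", x)


hijack.apply_optimizations = apply_optimizations

# Extension state: whether it is enabled, and which function to fall back on.
state = {"enabled": True, "fallback": attention.forward}


def rich_text_forward(x):
    # When disabled, behave exactly like the currently selected optimization.
    if not state["enabled"]:
        return state["fallback"](x)
    kind, value = state["fallback"](x)
    return ("rich-text+" + kind, value)


def patched_apply_optimizations(*args, original_function, **kwargs):
    result = original_function(*args, **kwargs)
    # The webui just replaced forward: remember it, then re-install our wrapper.
    state["fallback"] = attention.forward
    attention.forward = rich_text_forward
    return result


# Patch unconditionally when the extension loads.
attention.forward = rich_text_forward
hijack.apply_optimizations = functools.partial(
    patched_apply_optimizations,
    original_function=hijack.apply_optimizations,
)
```

With this layout, switching the optimization in the settings only changes the fallback, and disabling the extension makes the wrapper a pass-through.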

I would think controlnet should work if you successfully patch the webui code. If there are special cases to be considered with controlnet, it is a little bit tricky but it should still be possible to patch it. In short, you'll need to:

  1. import controlnet, if the extension is installed, and do nothing otherwise
  2. I presume you will need to patch this file: https://github.com/Mikubill/sd-webui-controlnet/blob/main/scripts/hook.py

See for example how I patched the controlnet extension here (in this case, th.cat is patched to override the concatenation logic of skip connections with the backbone): https://github.com/ljleb/sd-webui-freeu/blob/ebd3164570fefb1428b05ee28ea6a427f11e800a/lib_free_u/unet.py#L15-L30
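The two steps above could be sketched roughly as follows. This is only an illustration: the helper names try_import and patch_if_installed are made up, and the module path under which the webui exposes controlnet's hook.py is an assumption you would have to check.

```python
import importlib


def try_import(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def patch_if_installed(module_name, attr, make_patch):
    """Patch module.attr only when the module exists and has that attribute.

    make_patch receives the current function and returns its replacement,
    so the replacement can still call the original.
    """
    module = try_import(module_name)
    if module is None or not hasattr(module, attr):
        return False  # controlnet not installed (or layout changed): do nothing
    setattr(module, attr, make_patch(getattr(module, attr)))
    return True
```

For example, patch_if_installed("scripts.hook", "some_function", make_patch) would be a no-op on installs without controlnet; both the module path and the function name there are placeholders.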

songweige commented 12 months ago

I have incorporated the option to change the sizes of the generated images!

And these are very interesting and useful references, thanks!!! The example of patching th.cat is especially clear and helpful. I found that I just "patched" files in the diffusers library to achieve everything, and I added the comments # Rich-Text: ... to every chunk of code I modified from the original repo, as shown here. So if I understand it correctly, there are two main parts that I need to patch to make this truly work with A1111:

  1. Forward function in attention layers: I modify the return value of the function to collect attention maps, and modify the input of the function to inject attention maps. The collection and injection are performed using hooks. Also, since the return value of the attention is changed, I need to modify the other functions that call the attention. So my question is, if other extensions also patch the same function, will that cause a conflict?
  2. Sampling process: during the sampling process, for tokens with rich-text attributes, the UNet is called multiple times to obtain the scores. This is different from the typical samplers in A1111. Also, hooks are registered to modify the attention maps or features for injection. Are you aware of any code that patches the sampling process?

ljleb commented 12 months ago

After a quick look at the linked code, I believe your initial plan is looking good.

if other extensions also patch the same function, will that cause a conflict?

It is likely, yes. And even if no extension patches the same function yet, new extensions could be written that do. To handle this, it is a good idea to keep a reference to the original function before patching and, if possible, to eventually call the original function from the patch code.

I like to use mod.fn = functools.partial(patch_fn, original_function=mod.fn) to patch a function in a single line while satisfying these constraints. Then, define def patch_fn(arg list..., *args, original_function, **kwargs) and use original_function if possible so that if another extension has patched the code before your extension, it will keep working. If another extension patches but does not call your function, then it isn't your problem, it's a bug with that other extension instead. At least that's the way I do things on my side.
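Here is a small self-contained sketch of that pattern. The namespace mod and the two patch functions are stand-ins invented for the example, playing the role of a webui module and of two extensions patching the same function.

```python
import functools
import types

# Stand-in for a webui module exposing a function `fn` (hypothetical).
mod = types.SimpleNamespace(fn=lambda x: x + 1)


def patch_double(x, *args, original_function, **kwargs):
    # First extension: call whatever was installed before, then double it.
    return original_function(x, *args, **kwargs) * 2


def patch_negate(x, *args, original_function, **kwargs):
    # Second extension: also delegates, so the first patch keeps working.
    return -original_function(x, *args, **kwargs)


# Each patch is installed in one line and captures the previous function.
mod.fn = functools.partial(patch_double, original_function=mod.fn)
mod.fn = functools.partial(patch_negate, original_function=mod.fn)

# Call chain: patch_negate -> patch_double -> original lambda.
result = mod.fn(3)
```

Because each patch delegates to original_function, the order of installation only affects the nesting, not whether an earlier patch still runs.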

Are you aware of any code that patches the sampling process?

I'm not sure you can patch the code so that the webui does these calls for you. (I think you can, see the paragraph below) However, you can definitely call the webui model yourself as many times as you like besides the existing pipeline. IIUC, the extension needs to call it with its own kind of pipeline, which shouldn't disrupt the normal pipeline of the webui. If that is the case, then you could try to call the model similarly to how it is done here: https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/5ef669de080814067961f28357256e8fe27544f4/modules/sd_samplers_cfg_denoiser.py#L169

If instead you want to schedule additional calls to the webui pipeline (which I suspect is the case), you can do this by modifying the p object you receive in the different process_* callbacks of scripts. You could take inspiration from the adetailer extension, which schedules additional diffusion pipelines for the same image generation using the postprocess_image callback: https://github.com/Bing-su/adetailer/blob/d51f4b33cb53223e9ca03f6e2e7cb7dfd1ec8285/scripts/!adetailer.py#L666

continue-revolution commented 11 months ago

The art of A1111 is to do wild patch and have tremendous amount of conflicts. Do not be afraid of that.

After you are familiar with A1111, you will forget about diffusers.

This is good work, and I noticed it several months ago. Unfortunately I already have two extensions to develop, and I wish I had 48 hours per day.

ljleb commented 11 months ago

The art of A1111 is to do wild patch and have tremendous amount of conflicts. Do not be afraid of that.

I agree, it's really not great design. But that's how things are made as of right now in this environment.

As a side note, IMO someone needs to create a pip package that exposes a very stable API and handles all the typical webui monkey patching in a central place. Then, extensions can import this module and use a more stable API to write code and don't have to worry about extension compatibility, as that would be the job of the pip package. I don't have the time to do it right now but I think this is something we really need. This could eventually provide a way to write code once to create extensions for all mainstream stable diffusion hosts, for example comfyui, a1111 webui, sd.next, etc.

It would go something like this:

import sdapi
sdapi.use_a1111_host()

# use module `sdapi` instead of `modules` or submodules thereof

continue-revolution commented 11 months ago

As of now, I think it is extremely hard to do that, since basically everything could be hacked. For example, in my AnimateDiff extension, I hacked lora.networks.load_network, CFGDenoiser.forward, ControlNet BatchHijack.processing_process_images_hijack, ControlNet Scripts.controlnet_main_entry, and several other functions/methods in other modules. I did not expect that I would be hacking those, but things just moved so fast that I had to do that.

ljleb commented 11 months ago

Indeed, I don't think it would be easy to support every use case at the same time for V1. It would be a gradual process of adding support for more and more features using patches in different locations of the diffusion process.

I think once enough extensions use this API instead of going crazy with monkey patching, there's a way to make it work without patching at all, even if you want extensions to communicate. One way to do this would be to have extensions expose their resources by registering them, and then the API could allow other extensions to run code before or after certain events related to these resources.
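To make the idea a bit more concrete, here is a rough sketch of what such a registry could look like. Nothing like this exists yet as far as I know; ResourceRegistry and all of its methods are hypothetical.

```python
from collections import defaultdict


class ResourceRegistry:
    """Hypothetical central place where extensions register resources."""

    def __init__(self):
        self._resources = {}
        self._before = defaultdict(list)
        self._after = defaultdict(list)

    def register(self, name, fn):
        # An extension exposes a callable resource under a name.
        self._resources[name] = fn

    def before(self, name, callback):
        # Run callback with the call arguments before the resource runs.
        self._before[name].append(callback)

    def after(self, name, callback):
        # Run callback with the result after the resource runs.
        self._after[name].append(callback)

    def call(self, name, *args, **kwargs):
        for callback in self._before[name]:
            callback(*args, **kwargs)
        result = self._resources[name](*args, **kwargs)
        for callback in self._after[name]:
            callback(result)
        return result


registry = ResourceRegistry()
events = []

# One extension registers a resource; another observes it without patching.
registry.register("attention", lambda x: x * 2)
registry.before("attention", lambda x: events.append(("before", x)))
registry.after("attention", lambda result: events.append(("after", result)))

output = registry.call("attention", 21)
```

The point is that the second extension never touches the first one's code; it only subscribes to events through the shared API.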