rupeshs / fastsdcpu

Fast stable diffusion on CPU
MIT License

Add multiple LoRA support #145

Closed · monstruosoft closed 6 months ago

monstruosoft commented 7 months ago

This commit adds multiple LoRA support and updates interactive CLI mode to allow loading/editing LoRA weights dynamically.

As an example, the following images were generated with the same seed and the same prompt (which includes the trigger words for both LoRAs used) by launching interactive CLI mode with the --lora argument and then dynamically loading/editing LoRA weights. This first image was generated using only the Jane LoRA passed via the CLI argument:

[image: Jane LoRA only]

Next, an ancient-chinese-scroll LoRA is loaded and the following image is generated:

[image: Jane + ancient-chinese-scroll LoRAs]

Next, the weights are set to 0.0 for both LoRAs, effectively returning to using just the base model:

[image: base model only]

Please note that the current LoRA code in the FastSD CPU repository is hardcoded to support only one LoRA (the whole pipeline is rebuilt if the LoRA changes). I didn't want to risk breaking anything, so I had to work around that limitation; however, it should be trivial to write the required changes.
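For context, recent diffusers releases (with the PEFT integration) support this kind of multi-LoRA loading directly; here is a minimal sketch of the pattern, independent of FastSD's own LoRA code, with placeholder model and LoRA paths:

```python
import torch
from diffusers import DiffusionPipeline

# Placeholder checkpoint; any SD 1.5-style model works the same way.
pipe = DiffusionPipeline.from_pretrained(
    "Lykon/dreamshaper-8", torch_dtype=torch.float32
)

# Load each LoRA under its own adapter name (requires the peft package).
pipe.load_lora_weights("loras/jane.safetensors", adapter_name="jane")
pipe.load_lora_weights("loras/ancient-chinese-scroll.safetensors", adapter_name="scroll")

# Activate both adapters and edit their weights dynamically between generations.
pipe.set_adapters(["jane", "scroll"], adapter_weights=[1.0, 0.7])
image = pipe("prompt with both trigger words").images[0]

# Setting both weights to 0.0 effectively falls back to the base model.
pipe.set_adapters(["jane", "scroll"], adapter_weights=[0.0, 0.0])
```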

In the next post I'll write some extra considerations to take into account when using multiple LoRAs.

monstruosoft commented 7 months ago

Some extra details for multiple LoRAs:

monstruosoft commented 7 months ago

Is it possible not to rebuild the diffusion pipeline when going from txt2img to img2img and vice versa? In the current FastSD code, the pipeline is rebuilt on a number of conditions, including a change of diffusion task; however, since both diffusion tasks share the same UNet, VAE, etc., it may be possible to always create both pipelines (txt2img and img2img) so that you can switch between them without rebuilding unless it's really necessary, for example when changing the base model.

You can see what I'm talking about in this PR's CLI mode or in the WebUI: if you generate an image in the txt2img tab and then go to the img2img tab and generate a new image, the pipeline is rebuilt; load a LoRA and switch tasks, and the pipeline is rebuilt again. This causes LoRA models to get lost when switching between modes and, while you can load them back, I think it would be nice to reuse the pipeline.
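As a rough illustration of the idea with plain diffusers (not FastSD's actual helper functions), the img2img pipeline can be built from the txt2img pipeline's existing components, so both pipelines exist without loading any weights twice:

```python
import torch
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

# Load the heavy components (UNet, VAE, text encoder) once;
# the checkpoint name is a placeholder.
txt2img = StableDiffusionPipeline.from_pretrained(
    "Lykon/dreamshaper-8", torch_dtype=torch.float32
)

# Reuse the same components for img2img: no extra weights are loaded,
# and LoRAs applied to the shared UNet stay in effect for both tasks.
img2img = StableDiffusionImg2ImgPipeline(**txt2img.components)
```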

I've tried the following changes in src/backend/lcm_text_to_image.py to always build both the txt2img and img2img pipelines, and it seems to work; however, I'm not sure if it's correct:

diff --git a/src/backend/lcm_text_to_image.py b/src/backend/lcm_text_to_image.py
index b7659ae..e8caa39 100644
--- a/src/backend/lcm_text_to_image.py
+++ b/src/backend/lcm_text_to_image.py
@@ -116,8 +116,8 @@ class LCMTextToImage:
             or self.previous_ov_model_id != ov_model_id
             or self.previous_safety_checker != lcm_diffusion_setting.use_safety_checker
             or self.previous_use_openvino != lcm_diffusion_setting.use_openvino
-            or self.previous_task_type != lcm_diffusion_setting.diffusion_task
-            or self.previous_lora != lcm_diffusion_setting.lora
+            # or self.previous_task_type != lcm_diffusion_setting.diffusion_task
+            # or self.previous_lora != lcm_diffusion_setting.lora
         ):
             if self.use_openvino and is_openvino_device():
                 if self.pipeline:
@@ -168,10 +168,10 @@ class LCMTextToImage:
                         use_local_model,
                     )

-                if (
-                    lcm_diffusion_setting.diffusion_task
-                    == DiffusionTask.image_to_image.value
-                ):
+                #if (
+                #    lcm_diffusion_setting.diffusion_task
+                #    == DiffusionTask.image_to_image.value
+                #):
                     self.img_to_img_pipeline = get_image_to_image_pipeline(
                         self.pipeline
                     )
@@ -194,24 +194,24 @@ class LCMTextToImage:
                     )
                 else:
                     print("Using Tiny Auto Encoder")
-                    if (
-                        lcm_diffusion_setting.diffusion_task
-                        == DiffusionTask.text_to_image.value
-                    ):
-                        load_taesd(
-                            self.pipeline,
-                            use_local_model,
-                            self.torch_data_type,
-                        )
-                    elif (
-                        lcm_diffusion_setting.diffusion_task
-                        == DiffusionTask.image_to_image.value
-                    ):
-                        load_taesd(
-                            self.img_to_img_pipeline,
-                            use_local_model,
-                            self.torch_data_type,
-                        )
+                    #if (
+                    #    lcm_diffusion_setting.diffusion_task
+                    #    == DiffusionTask.text_to_image.value
+                    #):
+                    load_taesd(
+                        self.pipeline,
+                        use_local_model,
+                        self.torch_data_type,
+                    )
+                    #elif (
+                    #    lcm_diffusion_setting.diffusion_task
+                    #    == DiffusionTask.image_to_image.value
+                    #):
+                    load_taesd(
+                        self.img_to_img_pipeline,
+                        use_local_model,
+                        self.torch_data_type,
+                    )

             if (
                 lcm_diffusion_setting.diffusion_task

EDIT: I just realized that, in this diff, the actual img2img pipeline creation should be indented one level out; otherwise it's only created when using LCM models.

rupeshs commented 7 months ago

> I've tried the following changes in src/backend/lcm_text_to_image.py to always build both the txt2img and img2img pipelines, and it seems to work; however, I'm not sure if it's correct:

Does this work with the OpenVINO workflow? I have a similar implementation (reusing the same pipeline) in diffusionmagic: https://github.com/rupeshs/diffusionmagic

monstruosoft commented 7 months ago

OpenVINO's workflow is isolated in a separate section by the corresponding if-else block, so these changes won't affect it; the OpenVINO workflow will continue to work as before. By the way, I just realized that, in the diff from the previous post, the actual img2img pipeline creation should be indented one level out; otherwise it's only created when using LCM models.

Unfortunately, I can't test OpenVINO on my machine since, as you mention in a recent change to Readme.md, it uses a lot more RAM than I have, forcing it to use swap space on the hard disk and making image generation with OpenVINO extremely slow.

And actually, that's another issue I had planned to bring up for discussion and look into. I remember that in earlier versions of FastSD CPU, OpenVINO didn't use that much RAM; it might be possible to track down the commit where that changed and see what's causing it. My guess is that OpenVINO is compiling the model in RAM and then loading that compiled model for inference, effectively using twice as much RAM, but that's just a guess.
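If someone wants to chase this down, watching the process's resident memory around the model load should confirm or refute the guess; a tiny sketch using psutil (the allocation below is just a stand-in for the actual OpenVINO pipeline load):

```python
import os
import psutil

def rss_gib() -> float:
    # Resident set size of the current process, in GiB.
    return psutil.Process(os.getpid()).memory_info().rss / 1024**3

print(f"RSS before load: {rss_gib():.2f} GiB")
data = bytearray(512 * 1024 * 1024)  # stand-in for the pipeline load being measured
print(f"RSS after load:  {rss_gib():.2f} GiB")
```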

monstruosoft commented 7 months ago

The most recent commit adds the code to reuse the txt2img and img2img pipelines in LCM and LCM-LoRA modes. This makes it possible to retain loaded LoRAs when switching between modes in the interactive CLI, and it should also work in the WebUI.

rupeshs commented 7 months ago

> The most recent commit adds the code to reuse the txt2img and img2img pipelines in LCM and LCM-LoRA modes. This makes it possible to retain loaded LoRAs when switching between modes in the interactive CLI, and it should also work in the WebUI.

Hi @monstruosoft, I just tested this, but WebUI LoRA model switching and weight changes are completely broken by this change; could you please fix it? Other than that, this PR looks good.

  1. WebUI LoRA model switching is not working
  2. WebUI LoRA weight changing is not working

For FastSD CPU, the backend implementation should not depend on any interface (loosely coupled architecture). When we make a change or add a new feature, it should work in the GUI, WebUI, and CLI without additional changes to the backend code. Here I can see this implementation only considers the interactive CLI.
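To make the point concrete, here is a minimal sketch of that contract (all names hypothetical, not FastSD's actual API): every frontend only builds a settings object and calls one backend entry point, so a backend feature automatically reaches the CLI, GUI, and WebUI:

```python
from dataclasses import dataclass, field

@dataclass
class LoraSetting:
    path: str
    weight: float = 1.0

@dataclass
class GenerationSettings:
    prompt: str
    loras: list[LoraSetting] = field(default_factory=list)

def generate(settings: GenerationSettings) -> str:
    """Backend entry point; it knows nothing about the CLI, GUI, or WebUI."""
    applied = ", ".join(f"{l.path}:{l.weight}" for l in settings.loras) or "none"
    return f"generated '{settings.prompt}' with LoRAs: {applied}"

# Any frontend makes the same call:
print(generate(GenerationSettings("a cat", [LoraSetting("jane.safetensors", 0.8)])))
```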

monstruosoft commented 7 months ago

Sorry, I had zero experience writing Python code or contributing to a project prior to my first PR here, so I apologize for any issues with my code. I try to write my code so it doesn't break anything, but you're right, I didn't try the LoRA loading code in the WebUI; I'll take a look into it. Thanks.

monstruosoft commented 7 months ago

The last commit fixes LoRA loading in the WebUI; the old code loaded a single LoRA whenever the pipeline was rebuilt, and that no longer worked once the pipeline started being reused. The solution was simple, similar to the code already used in the interactive CLI; writing the GUI code was the real problem. I'm not familiar with Gradio, so I had a lot of trouble getting the GUI to work the way I wanted, but I guess the result is acceptable.
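For anyone else new to Gradio, the dynamic-weight controls boil down to a fixed set of sliders wired to one callback; a toy sketch, not the PR's actual WebUI code (the callback is a stand-in for the backend update):

```python
import gradio as gr

MAX_LORAS = 5  # mirrors the hardcoded WebUI limit noted below

def update_lora_weights(*weights):
    # In the real WebUI this would update the adapter weights on the pipeline;
    # here we just echo the slider values.
    return f"LoRA weights set to: {list(weights)}"

with gr.Blocks() as demo:
    sliders = [
        gr.Slider(0.0, 1.0, value=0.5, label=f"LoRA {i + 1} weight")
        for i in range(MAX_LORAS)
    ]
    status = gr.Textbox(label="Status")
    gr.Button("Update weights").click(
        update_lora_weights, inputs=sliders, outputs=status
    )

demo.launch()
```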

Some things to note:

  • You have to generate an image first, to create the pipeline, before you can load LoRAs; I do the same in the interactive CLI since pipelines are initialized in the image generation code. Creating the pipeline (calling lcm_text_to_image.init()) at program start might solve this.
  • The maximum number of LoRA models in the WebUI is hardcoded to 5; that should be enough in most cases since many LoRAs tend to conflict with each other. You can load more LoRAs, but they won't appear in the WebUI.
  • If the settings file contains a LoRA path, it will be loaded when generating the first image but it won't appear in the WebUI; that's not the way LoRAs are meant to be loaded in the WebUI.
  • Once loaded, you can set a LoRA weight to 0.0 to disable its effect, but you can't currently unload it.

rupeshs commented 6 months ago

> The last commit fixes LoRA loading in the WebUI; the old code loaded a single LoRA whenever the pipeline was rebuilt, and that no longer worked once the pipeline started being reused. [...]

@monstruosoft Nice work! It seems like there is a bug: the generated image is blurred the first time. To reproduce, follow these steps:

  1. Run start-webui
  2. Generate an image in LCM-LoRA mode
  3. Then load a LoRA and generate an image (I'm getting a blurred result)


monstruosoft commented 6 months ago

That bug was caused by using a dummy object to pass the LoRA settings; it should be fixed now. The same bug was present in the interactive CLI; it should be fixed there too.

rupeshs commented 6 months ago

> That bug was caused by using a dummy object to pass the LoRA settings; it should be fixed now. The same bug was present in the interactive CLI; it should be fixed there too.

Thanks @monstruosoft

monstruosoft commented 6 months ago

I see you had to add some fixes after merging this PR; I apologize for any issues. The --lora and --lora_weight CLI arguments were added for testing LoRA support, but now that LoRAs can be added interactively both in the CLI and the WebUI, those arguments are no longer needed and could be removed.

rupeshs commented 6 months ago

@monstruosoft It was nice work! Currently I'm doing some cleanups.

monstruosoft commented 6 months ago

Thanks.