huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

failed to use the A1111 LoRA support feature #3725

Closed icech closed 1 year ago

icech commented 1 year ago

I am glad to see that diffusers added support for A1111 LoRA. However, I failed to use this feature after updating diffusers. It no longer raises an error as before, but the LoRA has no effect on the generated images. I use it as follows:

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16,
                                               revision="fp16", safety_checker=None).to("cuda")
pipe.load_lora_weights("./Lora", weight_name="xxx.safetensors")
generator = torch.Generator(device="cuda")
prompt = "a photograph of a man running with dog"
image = pipe(prompt, width=768, height=1280,
             generator=generator, num_inference_steps=20,
             cross_attention_kwargs={"scale": 1},
             ).images[0]

and this is how I used to add LoRA in the past, which is from #3064 by @pdoane:

from collections import defaultdict

import torch
from safetensors.torch import load_file


def load_lora_weights(pipeline, checkpoint_path, multiplier, device, dtype):
    LORA_PREFIX_UNET = "lora_unet"
    LORA_PREFIX_TEXT_ENCODER = "lora_te"
    # load the LoRA weights from the .safetensors file
    state_dict = load_file(checkpoint_path, device=device)

    updates = defaultdict(dict)
    for key, value in state_dict.items():
        # it is suggested to print out the key, it usually will be something like below
        # "lora_te_text_model_encoder_layers_0_self_attn_k_proj.lora_down.weight"

        layer, elem = key.split('.', 1)
        updates[layer][elem] = value

    # directly update weight in diffusers model
    for layer, elems in updates.items():

        if "text" in layer:
            layer_infos = layer.split(LORA_PREFIX_TEXT_ENCODER + "_")[-1].split("_")
            curr_layer = pipeline.text_encoder
        else:
            layer_infos = layer.split(LORA_PREFIX_UNET + "_")[-1].split("_")
            curr_layer = pipeline.unet

        # find the target layer
        temp_name = layer_infos.pop(0)
        while len(layer_infos) > -1:
            try:
                curr_layer = curr_layer.__getattr__(temp_name)
                if len(layer_infos) > 0:
                    temp_name = layer_infos.pop(0)
                elif len(layer_infos) == 0:
                    break
            except Exception:
                if len(temp_name) > 0:
                    temp_name += "_" + layer_infos.pop(0)
                else:
                    temp_name = layer_infos.pop(0)

        # get elements for this layer
        weight_up = elems['lora_up.weight'].to(dtype)
        weight_down = elems['lora_down.weight'].to(dtype)
        alpha = elems['alpha']
        if alpha:
            alpha = alpha.item() / weight_up.shape[1]
        else:
            alpha = 1.0

        # update weight
        if len(weight_up.shape) == 4:
            curr_layer.weight.data += multiplier * alpha * torch.mm(weight_up.squeeze(3).squeeze(2), weight_down.squeeze(3).squeeze(2)).unsqueeze(2).unsqueeze(3)
        else:
            curr_layer.weight.data += multiplier * alpha * torch.mm(weight_up, weight_down)

    return pipeline
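
For reference, a minimal sketch of calling this helper (the path is a placeholder; the dtype is chosen to match the fp16 pipeline above):

pipe = load_lora_weights(pipe, "./Lora/xxx.safetensors", 1.0, "cuda", torch.float16)  # placeholder path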

Is my usage incorrect, or is there a difference between the current code and what @pdoane provided?

patrickvonplaten commented 1 year ago

cc @sayakpaul

sayakpaul commented 1 year ago

Hi @icech.

Could you share your LoRA file so that we can debug it on our end?

Cc: @takuma104

icech commented 1 year ago

> Hi @icech.
>
> Could you share your LoRA file so that we can debug it on our end?
>
> Cc: @takuma104

Here is the link to my LoRA file: https://civitai.com/models/7501/vivid-watercolors-lora-extraction

sayakpaul commented 1 year ago

@icech I am able to use the LoRA without any problems. See my Colab: https://colab.research.google.com/gist/sayakpaul/1fff0ff9c5a059364f80ac0b64920592/scratchpad.ipynb

Of course, I don't know about the base pipeline associated with that. So, you will have to work that one out.

icech commented 1 year ago

> @icech I am able to use the LoRA without any problems. See my Colab: https://colab.research.google.com/gist/sayakpaul/1fff0ff9c5a059364f80ac0b64920592/scratchpad.ipynb
>
> Of course, I don't know about the base pipeline associated with that. So, you will have to work that one out.

I'm sorry for the delayed response. I have reviewed your Colab notebook and made some modifications. I compared generation with and without the LoRA, producing four images each; only one of the images comes out different. The code is below:

import torch

from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, safety_checker=None
).to("cuda")

pipeline.scheduler = DPMSolverMultistepScheduler.from_config(
    pipeline.scheduler.config, use_karras_sigmas=True
)

images = pipeline(prompt="masterpiece, best quality, mountain landscape",
    negative_prompt="bad quality",
    width=512,
    height=512,
    num_inference_steps=15,
    num_images_per_prompt=4,
    generator=torch.manual_seed(0)
).images

for i, image in enumerate(images):
    image.save(f'images/nonelora_{i}.png')

pipeline.load_lora_weights(".", weight_name="vividWatercolors_10.safetensors")
images = pipeline(prompt="masterpiece, best quality, mountain landscape",
    negative_prompt="bad quality",
    width=512,
    height=512,
    num_images_per_prompt=4,
    num_inference_steps=15,
    generator=torch.manual_seed(0)
).images

for i, image in enumerate(images):
    image.save(f'images/lora_{i}.png')

sayakpaul commented 1 year ago

I see what you're saying. Reproduced here: https://colab.research.google.com/gist/sayakpaul/b645715d9144a3a6dc40c93bdceee929/scratchpad.ipynb.

Some questions:

  • Could you also have some images (LoRA) for us so that we can have some one-on-one comparisons?
  • Are the results being affected because we're not likely using the right base model? For example, here, we're using the right base model to load the parameters into. We need to ensure the base model being used here is indeed the correct one.

takuma104 commented 1 year ago

That's quite an interesting result. It seems that the effect varies in magnitude, but it's not just the first one that differs; there appears to be some change in all of them. I've posted an image created by merging the two results using the difference mode in Photoshop.

Without LoRA: [image]

With LoRA: [image]

Diff: [difference image]

icech commented 1 year ago

> I see what you're saying. Reproduced here: https://colab.research.google.com/gist/sayakpaul/b645715d9144a3a6dc40c93bdceee929/scratchpad.ipynb.
>
> Some questions:
>
>   • Could you also have some images (LoRA) for us so that we can have some one-on-one comparisons?
>   • Are the results being affected because we're not likely using the right base model? For example, here, we're using the right base model to load the parameters into. We need to ensure the base model being used here is indeed the correct one.

I've seen the images in your Colab; they are the same as what I got before. I'm sorry that I can't provide any pictures these days since I'm on vacation and don't have access to my machine. However, you can test the load_lora_weights(pipeline, checkpoint_path, multiplier, device, dtype) helper I originally posted in place of pipeline.load_lora_weights for loading the LoRA. The usage is pipe = load_lora_weights(pipe, lora_path, 1.0, 'cuda', torch.float32), so you can make a comparison. Based on my previous experience, the two are not consistent, and @pdoane's version gives the expected result, consistent with A1111.
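
A sketch of that comparison, assuming the merging helper from the first post is defined in scope (the base model and file name are taken from the Colab code above, and fp16 is used so the merged weights match the pipeline dtype):

import torch
from diffusers import StableDiffusionPipeline

def make_pipe():
    return StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, safety_checker=None
    ).to("cuda")

# Variant A: the built-in diffusers loader.
pipe_a = make_pipe()
pipe_a.load_lora_weights(".", weight_name="vividWatercolors_10.safetensors")

# Variant B: the manual merging helper from the first post (defined at module level,
# so it does not clash with the pipeline method of the same name).
pipe_b = make_pipe()
pipe_b = load_lora_weights(pipe_b, "./vividWatercolors_10.safetensors", 1.0, "cuda", torch.float16)

# Generate from both pipelines with the same seed and compare the images.
prompt = "masterpiece, best quality, mountain landscape"
image_a = pipe_a(prompt, num_inference_steps=15, generator=torch.manual_seed(0)).images[0]
image_b = pipe_b(prompt, num_inference_steps=15, generator=torch.manual_seed(0)).images[0]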

icech commented 1 year ago

> That's quite an interesting result. It seems that the effect varies in magnitude, but it's not just the first one that differs; there appears to be some change in all of them. I've posted an image created by merging the two results using the difference mode in Photoshop.
>
> Without LoRA: [image]
>
> With LoRA: [image]
>
> Diff: [difference image]

Your analysis is very rigorous. Indeed, there are differences in the subsequent images, but these differences are not as expected.

sayakpaul commented 1 year ago

I will dive deeper to find out what we're missing :) But expect some delay as I am on the move and away for some time.

alexblattner commented 1 year ago

@icech use this: loraLoader.txt

thank me later.

icech commented 1 year ago

> @icech use this: loraLoader.txt
>
> thank me later.

I've tried this and I know it's feasible, but I mainly want to use the official API for easier maintenance in the future. Thank you anyway.

patrickvonplaten commented 1 year ago

Let's try to fix this this week so it's in the next release. cc @sayakpaul, could this maybe be fixed with https://github.com/huggingface/diffusers/pull/3778?

sayakpaul commented 1 year ago

> However, you can test the load_lora_weights(pipeline, checkpoint_path, multiplier, device, dtype) helper I originally posted in place of pipeline.load_lora_weights for loading the LoRA. The usage is pipe = load_lora_weights(pipe, lora_path, 1.0, 'cuda', torch.float32), so you can make a comparison. Based on my previous experience, the two are not consistent, and @pdoane's version gives the expected result, consistent with A1111.

@icech I am trying to understand this better and would appreciate your inputs here. If I do pipeline.load_lora_weights(".", weight_name="vividWatercolors_10.safetensors") (as done in my Colab), it actually uses "cuda" along with a LoRA scale of 1. Is that not what you used in your experiments too?

Regardless, I will dive deeper into the loaded parameters and see what we're missing out on :)

icech commented 1 year ago

> > However, you can test the load_lora_weights(pipeline, checkpoint_path, multiplier, device, dtype) helper I originally posted in place of pipeline.load_lora_weights for loading the LoRA. The usage is pipe = load_lora_weights(pipe, lora_path, 1.0, 'cuda', torch.float32), so you can make a comparison. Based on my previous experience, the two are not consistent, and @pdoane's version gives the expected result, consistent with A1111.
>
> @icech I am trying to understand this better and would appreciate your inputs here. If I do pipeline.load_lora_weights(".", weight_name="vividWatercolors_10.safetensors") (as done in my Colab), it actually uses "cuda" along with a LoRA scale of 1. Is that not what you used in your experiments too?
>
> Regardless, I will dive deeper into the loaded parameters and see what we're missing out on :)

That is the same as in my experiments. I will provide some images of my results tomorrow (about 10 hours later) for you to compare.

sayakpaul commented 1 year ago

Went deep into this issue.

TL;DR: With the current support for loading A1111 LoRAs in Diffusers, we are unable to load certain keys, especially the ones containing 'mlp' and 'ff'. This is what is causing the differences in the quality of the generated outputs. I believe we'll be able to resolve this with https://github.com/huggingface/diffusers/pull/3756, which we're working on with @takuma104.

I was able to use @pdoane's script and generate the expected outputs. Check out this Colab. You'd notice that their method merges all the weights, whereas the current diffusers support doesn't allow that. We cannot take the merging route in diffusers, as it doesn't allow for easily switching attention processors later. This is the primary reason. With https://github.com/huggingface/diffusers/pull/3756, this should be addressed and hopefully resolved.

To make this finding even more concrete, I prepared this script: https://gist.github.com/sayakpaul/c269da54270f6d866ef5acafd4bf8319. It shows that we're indeed not loading all the keys, and that this is actually a known phenomenon.
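
As a rough illustration of the kind of check that script performs (this is a sketch, not the gist itself; the file name is the LoRA from this thread and may need adjusting):

# List the A1111/Kohya LoRA modules whose keys contain 'mlp' or 'ff', i.e. the
# text-encoder MLP and UNet feed-forward blocks reported as skipped above.
from safetensors.torch import load_file

state_dict = load_file("vividWatercolors_10.safetensors")
modules = sorted({key.split(".")[0] for key in state_dict if "mlp" in key or "_ff_" in key})

print(f"{len(modules)} modules with 'mlp'/'ff' in their keys:")
for name in modules:
    print(" ", name)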

Thanks for bringing this to our attention. And hopefully, we should be able to fix this soon.

Cc: @patrickvonplaten

CapsAdmin commented 1 year ago

I was interested in trying to add LoRAs to diffusers and stumbled upon this code snippet. It works for some LoRAs but not all.

So I took some new code from https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/11821, which implements more LoRA support, and converted it to work like the snippet above. That is, it merges the layers into the pipeline as opposed to hooking onto torch forward functions.

The code can be found here: https://github.com/CapsAdmin/diffusers-a1111/blob/main/src/merge_lora_to_pipeline.py

I tested all the models mentioned in the A1111 PR, so it supports hada, ia3, lokr, "full", and lora. The script is self-contained apart from importing "shared", which is just some dtype and device variables. So perhaps this is of interest to you @sayakpaul for testing on Colab or something.
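
For readers new to the merge-vs-hook distinction, a toy sketch for a single Linear layer (made-up LoRA factors, not code from the linked script) looks something like this:

# Toy sketch of the two ways to apply a plain LoRA to one Linear layer.
# In practice you would use one approach or the other, not both at once.
import torch
import torch.nn as nn

features, rank, scale = 320, 4, 1.0
layer = nn.Linear(features, features, bias=False)
lora_up = torch.randn(features, rank) * 0.01    # made-up LoRA factors
lora_down = torch.randn(rank, features) * 0.01

# (a) Merge into the weight once, like the snippet above and the linked script.
with torch.no_grad():
    layer.weight += scale * lora_up @ lora_down

# (b) Hook the forward pass instead, leaving the base weight untouched.
def lora_hook(module, inputs, output):
    (x,) = inputs
    return output + scale * (x @ lora_down.T) @ lora_up.T

handle = layer.register_forward_hook(lora_hook)
# handle.remove() later disables the LoRA without modifying the weight.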

sayakpaul commented 1 year ago

@CapsAdmin thanks so much! Does that script work for SDXL LoRA checkpoints too? Maybe it would make sense if you created a converter space with your script to let people easily use it (like this one: https://huggingface.co/spaces/diffusers/sd-to-diffusers)?

> So it merges the layers into the pipeline as opposed to hooking onto torch forward functions.

Unfortunately, by our design, we're a bit hesitant to directly merge the weights into the concerned modules. So, we will have to think about it a bit.

CapsAdmin commented 1 year ago

> @CapsAdmin thanks so much! Does that script work for SDXL LoRA checkpoints too?

I can try to get SDXL working; there was a very small amount of additional code that supposedly enabled it, but I left it out to focus on getting it working with 1.5.

> Maybe it would make sense if you created a converter space with your script to let people easily use it (like this one: https://huggingface.co/spaces/diffusers/sd-to-diffusers)?

It was just intended as something you'd plug into someone's diffusers backend; e.g. SD.Next (the A1111 fork) is currently moving to diffusers, but it does not yet support loading LoRAs the way the original backend does.

Since I'm merging this into the pipeline, I guess this is not far from a "merge LoRAs into a diffusers checkpoint" utility, but I don't really see the necessity for something like that. The other use case I intended for this is that it could serve as an example/debug implementation for diffusers to do it properly.

> > So it merges the layers into the pipeline as opposed to hooking onto torch forward functions.
>
> Unfortunately, by our design, we're a bit hesitant to directly merge the weights into the concerned modules. So, we will have to think about it a bit.

When it comes to merging into the pipeline, I see pros and cons and I'm honestly not sure which is better. Keep in mind I'm not very versed in this space.

pros:

cons:

If you wanted to support loading and unloading on the fly, there are ways to merge internally by keeping track of the changes a LoRA makes to a pipeline, but this is very messy. Maybe you could even unload a LoRA by reversing the calculation; however, with this method I would worry about losing precision.
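
A toy sketch of that "reverse the calculation" idea (made-up values; with fp16 weights the add/subtract round-trip is where precision could be lost):

# Unload a merged LoRA by subtracting the same delta that was added.
import torch
import torch.nn as nn

layer = nn.Linear(320, 320, bias=False)
lora_up, lora_down = torch.randn(320, 4) * 0.01, torch.randn(4, 320) * 0.01
scale = 1.0

delta = scale * lora_up @ lora_down
with torch.no_grad():
    layer.weight += delta   # merge ("load" the LoRA)
    # ... run inference with the LoRA applied ...
    layer.weight -= delta   # unmerge ("unload"); in float16 this round-trip can lose precision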

sayakpaul commented 1 year ago

Hey @icech could you give https://github.com/huggingface/diffusers/pull/4147 a try?

Just install diffusers using pip install git+https://github.com/isidentical/diffusers@kohya-lora-aux-features.

Here I have hosted a couple of samples for you: https://huggingface.co/datasets/sayakpaul/3725_test/.

Here's a side-by-side comparison:

Non-LoRA | LoRA
[image] | [image]

Let us know your findings!

Cc: @isidentical.

sayakpaul commented 1 year ago

@CapsAdmin also thanks for explaining this. We're trying to improve the support in https://github.com/huggingface/diffusers/pull/4147 thanks to @isidentical. Watch out :)