huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

Loading .safetensors Lora #3064

Closed adhikjoshi closed 1 year ago

adhikjoshi commented 1 year ago

Describe the bug

I have downloaded a LoRA from CivitAI which is in .safetensors format.

When I load it using the code below,

    pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
    pipe.unet.load_attn_procs("lora.safetensors")

It throws the error: KeyError: 'to_k_lora.down.weight'

File "/workspace/server/tasks.py", line 346, in txt2img self.pipe.unet.load_attn_procs(embd, use_safetensors=True) File "/opt/conda/envs/ldm/lib/python3.8/site-packages/diffusers/loaders.py", line 224, in load_attn_procs rank = value_dict["to_k_lora.down.weight"].shape[0] KeyError: 'to_k_lora.down.weight'

Reproduction

    pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
    pipe.unet.load_attn_procs("lora.safetensors")

Logs

No response

System Info

Diffusers Version: 0.15.0.dev0

patrickvonplaten commented 1 year ago

Hey @adhikjoshi,

Thanks for the issue! We should indeed try to also support A1111-style loading of LoRA tensors soon. cc @sayakpaul here

alejobrainz commented 1 year ago

kohya-ss/sd-scripts has a nice mechanism for it, but it broke with 0.15. You can, however, reliably load A1111 LoRA tensors with the function below on 0.14.0:

def apply_lora(pipe, lora_path, weight:float = 1.0):
    from safetensors.torch import load_file
    from networks.lora import create_network_from_weights  # from the kohya-ss/sd-scripts repo; a hyphenated name like "sd-scripts" cannot be imported directly
    import torch

    vae = pipe.vae
    text_encoder = pipe.text_encoder
    unet = pipe.unet

    sd = load_file(lora_path)
    lora_network, sd = create_network_from_weights(weight, None, vae, text_encoder, unet, sd)
    lora_network.apply_to(text_encoder, unet)
    lora_network.load_state_dict(sd)
    lora_network.to("cuda", dtype=torch.float16)

but as of 0.15 it fails:

assert lora.lora_name not in names, f"duplicated lora name: {lora.lora_name}"
AssertionError: duplicated lora name: lora_unet_down_blocks_0_attentions_0_transformer_blocks_0_attn1_to_q
adhikjoshi commented 1 year ago

kohya-ss/sd-scripts has a nice mechanism for it, but it broke with 0.15. You can, however, reliably load A1111 LoRA tensors with the function below on 0.14.0: [...]

CC @haofanwang @sayakpaul

sayakpaul commented 1 year ago

Can someone provide a LoRA file in the A1111 format? Providing as many relevant details associated with the file as possible would be great too.

adhikjoshi commented 1 year ago

Can someone provide a LoRA file in the A1111 format? Providing as many relevant details associated with the file as possible would be great too.

I have downloaded an offset-noise-trained LoRA and uploaded its .safetensors file to Hugging Face:

https://huggingface.co/adhikjoshi/epi_noiseoffset

alejobrainz commented 1 year ago

@sayakpaul here you go. This LoRA was trained using kohya-ss's scripts and works fine in A1111. I can load it on diffusers 0.14.0 with the snippet above using the lora.py from sd-scripts:

caAos-000001.zip

Thanks,

Alejandro.

sayakpaul commented 1 year ago

Cc: @patrickvonplaten ^

adhikjoshi commented 1 year ago

Here is a function I made from convert_lora_safetensor_to_diffusers.py to load a LoRA at inference time.

import torch
from safetensors.torch import load_file

def load_lora_weights(pipeline, checkpoint_path):
    # load base model
    pipeline.to("cuda")
    LORA_PREFIX_UNET = "lora_unet"
    LORA_PREFIX_TEXT_ENCODER = "lora_te"
    alpha = 0.75  # note: hard-coded; any ".alpha" keys in the file are skipped below
    # load LoRA weight from .safetensors
    state_dict = load_file(checkpoint_path, device="cuda")
    visited = []

    # directly update weight in diffusers model
    for key in state_dict:
        # it is suggested to print out the key, it usually will be something like below
        # "lora_te_text_model_encoder_layers_0_self_attn_k_proj.lora_down.weight"

        # alpha was set above (hard-coded), so just skip these keys
        if ".alpha" in key or key in visited:
            continue

        if "text" in key:
            layer_infos = key.split(".")[0].split(LORA_PREFIX_TEXT_ENCODER + "_")[-1].split("_")
            curr_layer = pipeline.text_encoder
        else:
            layer_infos = key.split(".")[0].split(LORA_PREFIX_UNET + "_")[-1].split("_")
            curr_layer = pipeline.unet

        # find the target layer
        temp_name = layer_infos.pop(0)
        while len(layer_infos) > -1:  # always true; the loop exits via break once the layer is found
            try:
                curr_layer = curr_layer.__getattr__(temp_name)
                if len(layer_infos) > 0:
                    temp_name = layer_infos.pop(0)
                elif len(layer_infos) == 0:
                    break
            except Exception:
                if len(temp_name) > 0:
                    temp_name += "_" + layer_infos.pop(0)
                else:
                    temp_name = layer_infos.pop(0)

        pair_keys = []
        if "lora_down" in key:
            pair_keys.append(key.replace("lora_down", "lora_up"))
            pair_keys.append(key)
        else:
            pair_keys.append(key)
            pair_keys.append(key.replace("lora_up", "lora_down"))

        # update weight
        if len(state_dict[pair_keys[0]].shape) == 4:
            weight_up = state_dict[pair_keys[0]].squeeze(3).squeeze(2).to(torch.float32)
            weight_down = state_dict[pair_keys[1]].squeeze(3).squeeze(2).to(torch.float32)
            curr_layer.weight.data += alpha * torch.mm(weight_up, weight_down).unsqueeze(2).unsqueeze(3)
        else:
            weight_up = state_dict[pair_keys[0]].to(torch.float32)
            weight_down = state_dict[pair_keys[1]].to(torch.float32)
            curr_layer.weight.data += alpha * torch.mm(weight_up, weight_down)

        # update visited list
        for item in pair_keys:
            visited.append(item)

    return pipeline

You can use it like:

lora_model = lora_models + "/" + opt.lora + ".safetensors"
self.pipe = load_lora_weights(self.pipe, lora_model)

@sayakpaul @patrickvonplaten

sayakpaul commented 1 year ago

Thanks! Do you have the checkpoints with which we could test this?

adhikjoshi commented 1 year ago

Can someone provide a LoRA file in the A1111 format? Providing as many relevant details associated with the file as possible would be great too.

I have downloaded an offset-noise-trained LoRA and uploaded its .safetensors file to Hugging Face:

https://huggingface.co/adhikjoshi/epi_noiseoffset

This uploaded safetensors LoRA and others work well.

pdoane commented 1 year ago

Thanks @adhikjoshi! Getting a lot further with your function, but the output is not matching what I would expect. As a first guess, I would think this is the alpha handling, as that is hard-coded to 0.75 while the LoRAs I'm using have .alpha keys in them.
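
For reference, the scaling that the .alpha key controls in the kohya/A1111 convention is roughly the following (a sketch; tensor shapes here are made up for illustration):

    import torch

    rank = 4                              # inner LoRA dimension
    weight_up = torch.randn(320, rank)    # "lora_up.weight"
    weight_down = torch.randn(rank, 320)  # "lora_down.weight"
    alpha = 4.0                           # value stored in the ".alpha" key
    multiplier = 1.0                      # overall LoRA strength

    # each layer's update is scaled by alpha / rank, where rank == weight_up.shape[1]
    delta = multiplier * (alpha / rank) * (weight_up @ weight_down)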

pdoane commented 1 year ago

I updated the function from @adhikjoshi to use the .alpha elements and also added a multiplier that can be used to weight the LoRA overall. Tested this on 4 random LoRAs I downloaded from CivitAI, and it matches the output from Automatic1111:

import torch
from collections import defaultdict
from safetensors.torch import load_file

def load_lora_weights(pipeline, checkpoint_path, multiplier, device, dtype):
    LORA_PREFIX_UNET = "lora_unet"
    LORA_PREFIX_TEXT_ENCODER = "lora_te"
    # load LoRA weight from .safetensors
    state_dict = load_file(checkpoint_path, device=device)

    updates = defaultdict(dict)
    for key, value in state_dict.items():
        # it is suggested to print out the key, it usually will be something like below
        # "lora_te_text_model_encoder_layers_0_self_attn_k_proj.lora_down.weight"

        layer, elem = key.split('.', 1)
        updates[layer][elem] = value

    # directly update weight in diffusers model
    for layer, elems in updates.items():

        if "text" in layer:
            layer_infos = layer.split(LORA_PREFIX_TEXT_ENCODER + "_")[-1].split("_")
            curr_layer = pipeline.text_encoder
        else:
            layer_infos = layer.split(LORA_PREFIX_UNET + "_")[-1].split("_")
            curr_layer = pipeline.unet

        # find the target layer
        temp_name = layer_infos.pop(0)
        while len(layer_infos) > -1:  # always true; the loop exits via break once the layer is found
            try:
                curr_layer = curr_layer.__getattr__(temp_name)
                if len(layer_infos) > 0:
                    temp_name = layer_infos.pop(0)
                elif len(layer_infos) == 0:
                    break
            except Exception:
                if len(temp_name) > 0:
                    temp_name += "_" + layer_infos.pop(0)
                else:
                    temp_name = layer_infos.pop(0)

        # get elements for this layer
        weight_up = elems['lora_up.weight'].to(dtype)
        weight_down = elems['lora_down.weight'].to(dtype)
        alpha = elems.get('alpha')  # some files have no ".alpha" key
        if alpha:
            alpha = alpha.item() / weight_up.shape[1]
        else:
            alpha = 1.0

        # update weight
        if len(weight_up.shape) == 4:
            curr_layer.weight.data += multiplier * alpha * torch.mm(weight_up.squeeze(3).squeeze(2), weight_down.squeeze(3).squeeze(2)).unsqueeze(2).unsqueeze(3)
        else:
            curr_layer.weight.data += multiplier * alpha * torch.mm(weight_up, weight_down)

    return pipeline

Example usage:

pipe = load_lora_weights(pipe, lora_path, 1.0, 'cuda', torch.float32)
alejobrainz commented 1 year ago

I tested it on several custom-created LoRAs, and it works great! Excellent work, @pdoane, thanks for sharing.

Quick question for the group: is there a way to quickly unload a LoRA weight from a loaded pipeline? I want to keep the pipeline in memory and simply assign/remove LoRA embeddings on the fly after each inference. Any pointers are appreciated.

Thanks again!

Alejandro

pdoane commented 1 year ago

There are two options I can think of:

alejobrainz commented 1 year ago

I'll try approach #2

alejobrainz commented 1 year ago

Ugly, but it worked for me. Tested by making 600 inferences, switching between 12 LoRA safetensors 50 times, on diffusers 0.15.1:

from safetensors.torch import load_file
from collections import defaultdict
from diffusers.loaders import LoraLoaderMixin
import torch 

current_pipeline = None
original_weights = {}

def load_lora_weights(pipeline, checkpoint_path, multiplier, device, dtype):
    global current_pipeline, original_weights

    if (pipeline != current_pipeline):
        backup = True
        current_pipeline = pipeline
        original_weights = {}    
    else:
        backup = False

    # load base model
    pipeline.to(device)
    LORA_PREFIX_UNET = "lora_unet"
    LORA_PREFIX_TEXT_ENCODER = "lora_te"
    # load LoRA weight from .safetensors
    state_dict = load_file(checkpoint_path, device=device)

    updates = defaultdict(dict)
    for key, value in state_dict.items():
        # it is suggested to print out the key, it usually will be something like below
        # "lora_te_text_model_encoder_layers_0_self_attn_k_proj.lora_down.weight"

        layer, elem = key.split('.', 1)
        updates[layer][elem] = value

    index = 0
    # directly update weight in diffusers model
    for layer, elems in updates.items():
        index += 1

        if "text" in layer:
            layer_infos = layer.split(LORA_PREFIX_TEXT_ENCODER + "_")[-1].split("_")
            curr_layer = pipeline.text_encoder
        else:
            layer_infos = layer.split(LORA_PREFIX_UNET + "_")[-1].split("_")
            curr_layer = pipeline.unet

        # find the target layer
        temp_name = layer_infos.pop(0)
        while len(layer_infos) > -1:
            try:
                curr_layer = curr_layer.__getattr__(temp_name)
                if len(layer_infos) > 0:
                    temp_name = layer_infos.pop(0)
                elif len(layer_infos) == 0:
                    break
            except Exception:
                if len(temp_name) > 0:
                    temp_name += "_" + layer_infos.pop(0)
                else:
                    temp_name = layer_infos.pop(0)

        # get elements for this layer
        weight_up = elems['lora_up.weight'].to(dtype)
        weight_down = elems['lora_down.weight'].to(dtype)
        alpha = elems.get('alpha')  # some files have no ".alpha" key
        if alpha:
            alpha = alpha.item() / weight_up.shape[1]
        else:
            alpha = 1.0

        if (backup):
            original_weights[index] = curr_layer.weight.data.clone().detach()
        else:
            curr_layer.weight.data = original_weights[index].clone().detach()

        # update weight
        if len(weight_up.shape) == 4:
            curr_layer.weight.data += multiplier * alpha * torch.mm(weight_up.squeeze(3).squeeze(2), weight_down.squeeze(3).squeeze(2)).unsqueeze(2).unsqueeze(3)
        else:
            curr_layer.weight.data += multiplier * alpha * torch.mm(weight_up, weight_down)

    return pipeline

LoraLoaderMixin.load_lora_weights = load_lora_weights
sayakpaul commented 1 year ago

@pdoane thanks so much for your inputs and investigations!

Do you mind sharing the pipe and lora_path you tested https://github.com/huggingface/diffusers/issues/3064#issuecomment-1512429695 with?

pdoane commented 1 year ago

@sayakpaul - followed up in e-mail.

sayakpaul commented 1 year ago

Thanks. However, I think having an end-to-end open example here would help the community a great deal in understanding the nuances of the interoperability.

sayakpaul commented 1 year ago

@pdoane come to think of it, would you be interested in improving our LoRA functionality to operate with the A1111 format as well?

@patrickvonplaten recently incorporated similar support for our textual inversion scripts: https://github.com/huggingface/diffusers/tree/main/examples/textual_inversion

pdoane commented 1 year ago

My assumption is this is just the first step to getting something more official - would be glad to help!

I have some API questions about it:

  1. Do you want support in the main API or as an example/converter?
  2. Assuming it is in the main API, the existing method of unet.load_attn_procs() is not the right place as the text encoder needs modification as well.
  3. Weight restoration is an important use case too: probably an optional dictionary parameter to store weight information, and another method to re-apply (a sketch of this idea follows below).
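
For illustration, a minimal sketch of that restoration idea (apply_with_backup and restore_weights are hypothetical names, not existing diffusers API):

    def apply_with_backup(layer, delta, backup):
        # hypothetical helper: remember the original tensor before patching it
        if layer not in backup:
            backup[layer] = layer.weight.data.clone()
        layer.weight.data += delta

    def restore_weights(backup):
        # hypothetical helper: put every patched layer back to its original state
        for layer, original in backup.items():
            layer.weight.data = original.clone()
        backup.clear()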

In terms of format details:

  1. The existing LoRA support has a different assumption for key names. I'm not sure what format is being assumed currently and also not sure how it should be reconciled with this approach. The A1111 code suggests that the layer name convention being used in the above scripts is "diffusers" and not "compvis". Are there LoRA files that use compvis layer names?
  2. MultiheadAttention support is missing. Should be easy to add but I wanted to find an example first.
  3. There are a variety of other formats too (e.g. LyCORIS) and I don't know how common those are.

alexblattner commented 1 year ago

@alejobrainz how do you use your code so that it works with a prompt in the same way as A1111? I put this as the prompt:

prompt="art by <lora:mngstle:1>"
n_prompt="(nsfw), out of frame, multiple people, petite, loli, side view, profile, lowres, (bad anatomy, bad hands:1.1), text, (tattoo), error, missing fingers, extra digit, fewer digits, cropped, worst quality, (((many people))), low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name,weird colors, (cartoon, 3d, bad art, poorly drawn, close up, blurry:1.5), (disfigured, deformed, extra limbs:1.5)"

but it ignored the LoRA instructions completely.

alejobrainz commented 1 year ago

For prompt weighting you can use Compel. It's great and easy to use. Just be sure to check out the syntax at https://github.com/damian0815/compel/blob/main/Reference.md

alejobrainz commented 1 year ago

Also, be mindful that the LoRA is embedded using the script; you only need the keyword your LoRA uses within the prompt.

alexblattner commented 1 year ago

@alejobrainz

  1. what would be the equivalent of (dog), (((cat)))?
  2. is it possible to modify the word associated with the lora?
  3. what would be the equivalent of just doing this "art by "?

I am used to using Automatic1111 and am developing a new interface using diffusers.

pdoane commented 1 year ago

We're getting off topic quickly here, but Compel and A1111 are not equivalent when it comes to prompt weighting. Roughly though, (dog) == dog+ and (((cat))) == cat+++. With Compel, parentheses can be used for grouping but don't carry additional weighting information.
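
For example, minimal Compel usage along those lines (a sketch, assuming pipe is an already-loaded StableDiffusionPipeline):

    from compel import Compel

    compel_proc = Compel(tokenizer=pipe.tokenizer, text_encoder=pipe.text_encoder)
    # "(dog)" in A1111 is roughly "dog+" here; "(((cat)))" is roughly "cat+++"
    prompt_embeds = compel_proc("a dog+ playing with a cat+++")
    image = pipe(prompt_embeds=prompt_embeds, num_inference_steps=20).images[0]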

LoRAs modify the weights of the text encoder and unet, so they are not related to the prompt. It's a bit of a weird design decision that some tools expose LoRA usage through the prompt at all. Textual inversions do belong in the prompt, so that's likely where that comes from.

Depending on how far you want to go with your front-end, you are likely to run into some challenges. Advanced usages like LoRAs, textual inversion, and ControlNet, and the integration of all these features together, are rapidly progressing but may not be ready out of the box. What are the goals for your UI? I may be working in a similar space.

alexblattner commented 1 year ago

@pdoane I want to create a full-fledged AI comics maker with consistent characters, styles, etc. I knew there would be challenges, which is why I'm doing it.

In A1111, when you train a LoRA and have [name] as input text, it uses the name of the LoRA and recognizes it. I am trying to do the same.

Also, thanks for the first part, much appreciated!

pdoane commented 1 year ago

Compel can extract LoRAs referenced from prompts. It'll be up to you to take that information and modify the pipeline though.

alexblattner commented 1 year ago

@pdoane is there any tutorial that could point me in the right direction on that?

EDIT: here's the full code to achieve what I tried to do:

import torch
from safetensors.torch import load_file
from collections import defaultdict
def load_lora_weights(pipeline, checkpoint_path, multiplier, device, dtype):
    LORA_PREFIX_UNET = "lora_unet"
    LORA_PREFIX_TEXT_ENCODER = "lora_te"

    # accept a single .safetensors path or a list of paths
    if isinstance(checkpoint_path, str):
        checkpoint_path = [checkpoint_path]

    for ckptpath in checkpoint_path:
        # load LoRA weight from .safetensors
        state_dict = load_file(ckptpath, device=device)

        # group the flat keys by layer, e.g.
        # "lora_te_text_model_encoder_layers_0_self_attn_k_proj.lora_down.weight"
        updates = defaultdict(dict)
        for key, value in state_dict.items():
            layer, elem = key.split('.', 1)
            updates[layer][elem] = value

        # directly update the weights in the diffusers model
        for layer, elems in updates.items():

            if "text" in layer:
                layer_infos = layer.split(LORA_PREFIX_TEXT_ENCODER + "_")[-1].split("_")
                curr_layer = pipeline.text_encoder
            else:
                layer_infos = layer.split(LORA_PREFIX_UNET + "_")[-1].split("_")
                curr_layer = pipeline.unet

            # find the target layer; the loop exits via break once found
            temp_name = layer_infos.pop(0)
            while len(layer_infos) > -1:
                try:
                    curr_layer = curr_layer.__getattr__(temp_name)
                    if len(layer_infos) > 0:
                        temp_name = layer_infos.pop(0)
                    elif len(layer_infos) == 0:
                        break
                except Exception:
                    if len(temp_name) > 0:
                        temp_name += "_" + layer_infos.pop(0)
                    else:
                        temp_name = layer_infos.pop(0)

            # get the elements for this layer
            weight_up = elems['lora_up.weight'].to(dtype)
            weight_down = elems['lora_down.weight'].to(dtype)
            alpha = elems.get('alpha')  # some files have no ".alpha" key
            if alpha:
                alpha = alpha.item() / weight_up.shape[1]
            else:
                alpha = 1.0

            # update weight
            if len(weight_up.shape) == 4:
                curr_layer.weight.data += multiplier * alpha * torch.mm(weight_up.squeeze(3).squeeze(2), weight_down.squeeze(3).squeeze(2)).unsqueeze(2).unsqueeze(3)
            else:
                curr_layer.weight.data += multiplier * alpha * torch.mm(weight_up, weight_down)

    return pipeline

import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler,ModelMixin
from io import BytesIO
from PIL import Image
import torch.multiprocessing as mp
import Loras
controlnet = ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_openpose", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
  "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16,safety_checker=None, requires_safety_checker=False,
).to("cuda")
pipe=Loras.load_lora_weights(pipe, ['mngstle.safetensors','galgadot.safetensors'],1.0,'cuda',torch.float16)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

pipe.enable_xformers_memory_efficient_attention()
pipe.enable_model_cpu_offload()
buffer=open('gpose.png', 'rb')
buffer.seek(0)
image_bytes = buffer.read()
images = Image.open(BytesIO(image_bytes))
generator = torch.manual_seed(1)
prompt="withLora(galgadot,1), manga, intricate, sharp focus, illustration, highly detailed, digital painting, concept art, matte, masterpiece, 8k, art by withLora(mngstle,1), black and white, monochrome"
n_prompt="nsfw+, out of frame, multiple people, petite, loli, side view, profile, lowres, (bad anatomy, (bad hands)1.1)+, text, tattoo+, error, missing fingers, extra digit, fewer digits, cropped, worst quality, many people+++, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name,weird colors, (cartoon, 3d, bad art, poorly drawn, close up, (blurry)1.5)+, (disfigured, deformed, (extra limbs)1.5)+"
fimage=pipe(
    prompt,
    images,
    negative_prompt=n_prompt,
    num_inference_steps=20,
    generator=generator,
)
fimage = fimage.images[0]
fimage.save('result.png', format='PNG')

this is controlnet with 2 loras used

adhikjoshi commented 1 year ago

@pdoane is there any tutorial that could point me in the right direction on that? [...] this is controlnet with 2 loras used

How would this work with "(bad hands)1.1)+" or any other weighted prompts?

alexblattner commented 1 year ago

@adhikjoshi It's already weighted. A single + is the same as (word), and wordNUM is the same as (word:NUM). The example essentially increased the weight by 1.1 and then bumped it again with the default value of +, so it makes extra sure the bad hands are not visible.

pdoane commented 1 year ago

Diffusers does not apply prompt weighting to the prompt strings. You need to use prompt embeds as explained here:

https://huggingface.co/docs/diffusers/main/en/using-diffusers/weighted_prompts

Remember that LoRAs adjust weights on the model and do not relate to position or positive/negative aspects of the prompt, so this is the line that matters:

pipe=Loras.load_lora_weights(pipe, ['mngstle.safetensors','galgadot.safetensors'],1.0,'cuda',torch.float16)

Compel has a prompt parser that can provide the list of LoRAs as well as the prompt embeds for diffusers.

sayakpaul commented 1 year ago

@pdoane very sorry about the delay on our end in getting back to your initial comment: https://github.com/huggingface/diffusers/issues/3064#issuecomment-1514778052.

Assuming it is in the main API, the existing method of unet.load_attn_procs() is not the right place as the text encoder needs modification as well.

We have a load_lora_weights() utility which might be a better place.
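
For reference, its basic usage looks like this (a sketch; the checkpoint below is the diffusers-format test repo mentioned later in this thread):

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    # loads LoRA params into both the UNet and the text encoder
    pipe.load_lora_weights("sayakpaul/dreambooth-text-encoder-test")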

Weight restoration is an important use case too, probably an optional dictionary parameter to store weight information and another method to re-apply.

Do you mean passing in an actual state dict to the loading function? If so, it's something we actually do. See the docstrings of load_lora_weights() here. We also try to infer whether a certain state dict is from a format other than what's followed in diffusers. But this is currently limited to textual inversion. More details here:

https://github.com/huggingface/diffusers/blob/fa31da29e591ed2e64a7c6ba9153c0b2e5a0ddc2/src/diffusers/loaders.py#L604

The existing LoRA support has a different assumption for key names. I'm not sure what format is being assumed currently and also not sure how it should be reconciled with this approach. The A1111 code suggests that the layer name convention being used in the above scripts is "diffusers" and not "compvis". Are there LoRA files that use compvis layer names?

Does the above paragraph answer this question to some extent? For layer names other than what diffusers expects, we try to amend them so that they become compatible with diffusers afterward.

MultiheadAttention support is missing. Should be easy to add but I wanted to find an example first.

You mean LoRA being applied to multiple attention heads? I think in diffusers we already do that. See how we initialize the LoRA layers here:

https://github.com/huggingface/diffusers/blob/fa31da29e591ed2e64a7c6ba9153c0b2e5a0ddc2/examples/dreambooth/train_dreambooth_lora.py#L695

But @patrickvonplaten can provide more details.

There are a variety of other formats too (e.g. LyCORIS) and I don't know how common those are.

I think it's okay to focus just on the A1111 format for now, as it is the most used one.

patrickvonplaten commented 1 year ago

Would be super nice if we could add loading functionality to load_lora_weights for the A1111 format. Fully agree with @sayakpaul's statement above.

alexblattner commented 1 year ago

@patrickvonplaten how high would that be on your priority list? This would save so much time, especially considering Compel doesn't provide the same functionality and the A1111 format is already the dominant one. Thanks for putting it on your list btw, everyone really appreciates it.

RustyKettle commented 1 year ago

I want to make sure I am using this correctly.

I would like to use Lykon/Dreamshaper for inpainting, so I need to load the safetensor weights at that location.

model = "Lykon/DreamShaper"
pipeline = StableDiffusionInpaintPipeline.from_pretrained(
    model,
    torch_dtype=torch.float16,
)
pipeline = pipeline.load_lora_weights(model, use_safetensors=True, weight_name="DreamShaper_5_beta2_BakedVae-inpainting.inpainting.safetensors")
pipeline = pipeline.to("cuda")

This gives the following error: ValueError: None does not seem to be in the correct format expected by LoRA or Custom Diffusion training.

My guess is that I'm calling it wrong.

pdoane commented 1 year ago

We have a load_lora_weights() utility which might be a better place.

Yes definitely! I saw that after writing the message. Maybe worth updating documentation to use it over the unet API?

Do you mean passing in an actual state dict to the loading function? If so, it's something we actually do. See the docstrings of load_lora_weights() here.

I think that might be something different. I want to optimize changing the set of active LoRAs/weights. As the LoRAs typically only adjust a subset of tensors in the model, it would be reasonable to restore a model back to its original state and then apply a new set.
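
A simple, if memory-hungry, way to do that with plain PyTorch (a sketch assuming pipe is a loaded StableDiffusionPipeline, not an existing diffusers API):

    # snapshot the weights before applying any LoRA ...
    original_unet = {k: v.detach().clone() for k, v in pipe.unet.state_dict().items()}
    original_te = {k: v.detach().clone() for k, v in pipe.text_encoder.state_dict().items()}

    # ... apply LoRAs and run inference, then restore the originals before
    # applying a new set; snapshotting only the tensors a LoRA actually
    # touches would be much cheaper
    pipe.unet.load_state_dict(original_unet)
    pipe.text_encoder.load_state_dict(original_te)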

Let me know if someone is working in the problem space here. Otherwise I'll try to get some time for a PR in the next week or so.

RustyKettle commented 1 year ago

I want to make sure I am using this correctly. [...] My guess is that I'm calling it wrong.

I'm still trying to figure out how to use other inpainting weights, and am struggling. There don't seem to be any resources on this. This thread seems like the right place to discuss this, but I might be wrong. Should I start a new ticket?

sayakpaul commented 1 year ago

Yes definitely! I saw that after writing the message. Maybe worth updating documentation to use it over the unet API?

For sure. This PR will hopefully fix it: https://github.com/huggingface/diffusers/pull/3180.

I think that might be something different. I want to optimize changing the set of active LoRAs/weights. As the LoRAs typically only adjust a subset of tensors in the model, it would be reasonable to restore a model back to its original state and then apply a new set.

I think we need to think a bit more about this design. I guess we have the following use-cases here.

Let's consider we have two LoRAs (LoRA A and LoRA B), each being trained on concepts A and B, respectively.

Let me know If someone is working in the problem space here. Otherwise I'll try to get some time for a PR in the next week or so.

Feel free to start the PR, we're more than happy to help :)

Cc: @patrickvonplaten

sayakpaul commented 1 year ago

@RustyKettle, for https://github.com/huggingface/diffusers/issues/3064#issuecomment-1526885741, could you maybe open a new issue?

showxu commented 1 year ago

@sayakpaul Hi, I created a PR https://github.com/huggingface/diffusers/pull/3294 to fix this issue. Tested with 3 .safetensors LoRAs with a diffusers pipeline.

brrbrrry commented 1 year ago

Hi, has anyone created anything to load multiple safetensors and apply them to a prompt? Otherwise, is there any way to interact with the webui? The stable diffusion directory is broken on M1/M2, and I am only able to use a custom PyTorch standalone webui at the moment. I would love to generate images with txt2img.py, but I can't find it in the webui directory and wouldn't know how to load it. So is there any way to interact with the already loaded webui from a Python script, like img2txt.py, without having to interact with the webui itself?

patrickvonplaten commented 1 year ago

Related: https://github.com/huggingface/diffusers/pull/3294

takuma104 commented 1 year ago

I'm interested in this issue, and I've written my own code for it. I believe that over 90% of the LoRA files on CivitAI can be supported using this code. Since it uses hooks, it might not be possible to merge it into Diffusers as-is, but this implementation supports multiple LoRAs and dynamic attachment/detachment. I'll think about whether it's possible to create a version without hooks as well.

https://gist.github.com/takuma104/e38d683d72b1e448b8d9b3835f7cfa44
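
For reference, the core of the hook idea is small. A minimal sketch of the technique (not takuma104's actual implementation):

    class LoRAHook:
        """Adds a low-rank update to a Linear layer's output via a forward hook."""

        def __init__(self, module, lora_down, lora_up, scale):
            # lora_down: (rank, in_features), lora_up: (out_features, rank)
            self.lora_down, self.lora_up, self.scale = lora_down, lora_up, scale
            self.handle = module.register_forward_hook(self)

        def __call__(self, module, inputs, output):
            # the base layer's weights are never touched; the LoRA term is
            # simply added to the layer's output on the fly
            x = inputs[0]
            return output + self.scale * (x @ self.lora_down.T @ self.lora_up.T)

        def remove(self):
            # dynamic detachment: removing the hook restores the original behavior
            self.handle.remove()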

patrickvonplaten commented 1 year ago

Let's maybe try to prioritize this a bit. Being able to load LoRA layers from A1111 format is pretty important IMO. cc @sayakpaul do you want to tackle this? Otherwise happy to look into it

sayakpaul commented 1 year ago

Seems like there are some entanglements.

With #3294, I left my suggestions here and here.

Do we want to continue on top of #3294? But my concern is that it directly modifies the weights of the UNet which removes the flexibility part (see https://github.com/huggingface/diffusers/pull/3294#issuecomment-1537816327).

So, need to discuss the best course of action here.

@takuma104, if you want, we'd be happy to welcome a PR based on https://github.com/huggingface/diffusers/issues/3064#issuecomment-1538858776.

@patrickvonplaten what are your thoughts?

patrickvonplaten commented 1 year ago

I don't think we should continue on top of #3294 - there are too many fundamental changes. Instead it'd be nice to open a new PR that cleanly allows loading A1111 weights I believe

sayakpaul commented 1 year ago

@takuma104 do you have a sample A1111 LoRA weight file and a prompt to test with? I would like to prioritise working on the A1111 LoRA support in diffusers. A first PR may not have every possible case figured out, but we will see.

ghunkins commented 1 year ago

I think transforming the state_dict in the load_lora_weights function to the diffusers format when the auto1111 format is detected is the cleanest option here.

Looked into this, but I'm stuck because I couldn't find documentation on the diffusers serialization format. The auto1111 format seems pretty straightforward, thankfully, but I'm unsure how to convert it to diffusers.

For diffusers format, I used @sayakpaul's testing repo sayakpaul/dreambooth-text-encoder-test.

!wget 'https://huggingface.co/sayakpaul/dreambooth-text-encoder-test/resolve/main/pytorch_lora_weights.bin'
import torch
hf_lora = torch.load('pytorch_lora_weights.bin')

For auto1111 format, I used the most popular LoRA right now, MoXiN:

!wget 'https://civitai.com/api/download/models/14856' -O 'moxin.safetensors'
import safetensors.torch  # importing the bare "safetensors" package does not expose the torch submodule
sd_lora = safetensors.torch.load_file('moxin.safetensors')
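
Printing a few keys from each shows the difference between the two formats (the example names in the comments follow the patterns quoted elsewhere in this thread):

    print(list(hf_lora.keys())[:3])
    # diffusers-style keys end in e.g. "...to_k_lora.down.weight"
    print(list(sd_lora.keys())[:3])
    # kohya/A1111-style keys are flat, e.g.
    # "lora_te_text_model_encoder_layers_0_self_attn_k_proj.lora_down.weight"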

Some intro code adapted from #3294 below to play with the serialization and transformation. It is non-functional, as I don't fully understand the diffusers serialization format.

from typing import Dict

LORA_PREFIX_TEXT_ENCODER = 'lora_te'
LORA_PREFIX_UNET = 'lora_unet'

def convert_auto1111_to_diffusers_state_dict(state_dict: Dict):
    # NOTE: relies on a loaded `pipe` (StableDiffusionPipeline) being in scope
    diffusers_state_dict = dict()
    for key in state_dict:
        # it is suggested to print out the key, it usually will be something like below
        # "lora_te_text_model_encoder_layers_0_self_attn_k_proj.lora_down.weight"

        # ignore alpha
        if ".alpha" in key:
            continue

        if "text" in key:
            layer_infos = key.split(".")[0].split(LORA_PREFIX_TEXT_ENCODER + "_")[-1].split("_")
            curr_layer = pipe.text_encoder
        else:
            layer_infos = key.split(".")[0].split(LORA_PREFIX_UNET + "_")[-1].split("_")
            curr_layer = pipe.unet

        # find the target layer
        transformed_name = []
        temp_name = layer_infos.pop(0)
        while len(layer_infos) > -1:
            try:
                curr_layer = curr_layer.__getattr__(temp_name)
                if len(layer_infos) > 0:
                    transformed_name.append(temp_name)
                    temp_name = layer_infos.pop(0)
                elif len(layer_infos) == 0:
                    transformed_name.append(temp_name)
                    break
            except Exception:
                if len(temp_name) > 0:
                    temp_name += "_" + layer_infos.pop(0)
                else:
                    transformed_name.append(temp_name)
                    temp_name = layer_infos.pop(0)

        # TODO: use the transformed name and key to
        # create the modified key
        transformed_key = '.'.join(transformed_name)

        # Needs more transformation, but I don't understand
        # the diffusers serialization well enough.
        # Example print:
        # up_blocks.3.attentions.1.transformer_blocks.0.attn1.to_k lora_down.weight
        print(transformed_key, '.'.join(key.split('.')[1:]))

        # ... once key is transformed, add to diffusers_state_dict
        diffusers_state_dict[transformed_key] = state_dict[key]

    return diffusers_state_dict

Another stickier consideration: the current monkey-patching of the text_encoder for LoRA doesn't allow for easy removal of a LoRA to restore the previous model.

Happy to jump in more, but would need some documentation on the LoRA serialization format.

sayakpaul commented 1 year ago

Looked into this, but stuck due to not being able to find documentation on the diffusers serialization format. auto1111 seems pretty straightforward thankfully, but unsure how to convert to diffusers.

Sorry for this. Maybe we should add a section here on the serialization format (cc: @patrickvonplaten).

But trying to unblock you here. Previously, we only used to support LoRA fine-tuning of the UNet. You can find such a LoRA param file here: https://huggingface.co/patrickvonplaten/lora_dreambooth_dog_example/blob/main/pytorch_lora_weights.bin

There, we used to initialize the LoRA layers and just save them as regular torch/safetensors state dicts:

https://github.com/huggingface/diffusers/blob/ed616bd8a8740927770eebe017aedb6204c6105f/examples/dreambooth/train_dreambooth_lora.py#L647-L666

(Note that the above code snippet is from an earlier commit)

But now, the serialization format has changed with the introduction of save_lora_weights(). Now, to better distinguish between the UNet and the text encoder params, we first initialize the LoRA layers like before. But during serialization, we follow:

https://github.com/huggingface/diffusers/blob/01c056f09441a8670d0a88f24e2d4fb4a2956ae8/src/diffusers/loaders.py#L1157-L1180
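
Roughly, that serialization namespaces the two LoRA state dicts before saving them as one file. A sketch based on the linked code (the function and argument names here are hypothetical):

    def pack_lora_state_dicts(unet_lora_sd, text_encoder_lora_sd):
        # UNet LoRA params are stored under a "unet." prefix and text encoder
        # params under a "text_encoder." prefix, so load_lora_weights() can
        # route each group back to the right model
        state_dict = {}
        state_dict.update({f"unet.{k}": v for k, v in unet_lora_sd.items()})
        state_dict.update({f"text_encoder.{k}": v for k, v in text_encoder_lora_sd.items()})
        return state_dict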

I hope this makes sense :-)

Let me know if anything is unclear.