Closed by adhikjoshi.
Hey @adhikjoshi,
Thanks for the issue! We should indeed try to support loading A1111 LoRA tensors soon. cc @sayakpaul
Kohya-ss/sd-scripts has a nice mechanism for this. It broke with diffusers 0.15, but on 0.14.0 you can load A1111 LoRA tensors with the function below:
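```python
import sys

import torch
from safetensors.torch import load_file

# "sd-scripts" is not a valid Python module name, so put a local clone of
# kohya-ss/sd-scripts on the import path instead (adjust to your checkout).
sys.path.append("sd-scripts")
from networks.lora import create_network_from_weights


def apply_lora(pipe, lora_path, weight: float = 1.0):
    vae = pipe.vae
    text_encoder = pipe.text_encoder
    unet = pipe.unet
    sd = load_file(lora_path)
    # build a LoRA network matching the layers stored in the file
    lora_network, sd = create_network_from_weights(weight, None, vae, text_encoder, unet, sd)
    # inject the LoRA modules into the text encoder and UNet, then load their weights
    lora_network.apply_to(text_encoder, unet)
    lora_network.load_state_dict(sd)
    lora_network.to("cuda", dtype=torch.float16)
```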
But as of 0.15 it fails with:

```
assert lora.lora_name not in names, f"duplicated lora name: {lora.lora_name}"
AssertionError: duplicated lora name: lora_unet_down_blocks_0_attentions_0_transformer_blocks_0_attn1_to_q
```
CC @haofanwang @sayakpaul
Can someone provide LoRA file in the A1111 format? Providing as many relevant details associated to the file as possible would be great too.
I downloaded an offset-noise-trained LoRA and uploaded its .safetensors file to Hugging Face.
@sayakpaul here you go. This LoRA was trained using kohya-ss's scripts and works fine in A1111. I can load it on diffusers 0.14.0 with the snippet above, using lora.py from sd-scripts.
Thanks,
Alejandro.
Cc: @patrickvonplaten ^
Here is a function I made from convert_lora_safetensor_to_diffusers.py to load a LoRA at inference time.
```python
import torch
from safetensors.torch import load_file


def load_lora_weights(pipeline, checkpoint_path):
    # move the base model to the GPU
    pipeline.to("cuda")
    LORA_PREFIX_UNET = "lora_unet"
    LORA_PREFIX_TEXT_ENCODER = "lora_te"
    alpha = 0.75

    # load LoRA weight from .safetensors
    state_dict = load_file(checkpoint_path, device="cuda")
    visited = []

    # directly update weight in diffusers model
    for key in state_dict:
        # it is suggested to print out the key, it usually will be something like below
        # "lora_te_text_model_encoder_layers_0_self_attn_k_proj.lora_down.weight"

        # as we have set the alpha beforehand, so just skip
        if ".alpha" in key or key in visited:
            continue

        if "text" in key:
            layer_infos = key.split(".")[0].split(LORA_PREFIX_TEXT_ENCODER + "_")[-1].split("_")
            curr_layer = pipeline.text_encoder
        else:
            layer_infos = key.split(".")[0].split(LORA_PREFIX_UNET + "_")[-1].split("_")
            curr_layer = pipeline.unet

        # find the target layer
        temp_name = layer_infos.pop(0)
        while True:
            try:
                curr_layer = curr_layer.__getattr__(temp_name)
                if len(layer_infos) > 0:
                    temp_name = layer_infos.pop(0)
                else:
                    break
            except Exception:
                if len(temp_name) > 0:
                    temp_name += "_" + layer_infos.pop(0)
                else:
                    temp_name = layer_infos.pop(0)

        pair_keys = []
        if "lora_down" in key:
            pair_keys.append(key.replace("lora_down", "lora_up"))
            pair_keys.append(key)
        else:
            pair_keys.append(key)
            pair_keys.append(key.replace("lora_up", "lora_down"))

        # update weight
        if len(state_dict[pair_keys[0]].shape) == 4:
            weight_up = state_dict[pair_keys[0]].squeeze(3).squeeze(2).to(torch.float32)
            weight_down = state_dict[pair_keys[1]].squeeze(3).squeeze(2).to(torch.float32)
            curr_layer.weight.data += alpha * torch.mm(weight_up, weight_down).unsqueeze(2).unsqueeze(3)
        else:
            weight_up = state_dict[pair_keys[0]].to(torch.float32)
            weight_down = state_dict[pair_keys[1]].to(torch.float32)
            curr_layer.weight.data += alpha * torch.mm(weight_up, weight_down)

        # update visited list
        for item in pair_keys:
            visited.append(item)

    return pipeline
```
You can use it like this:

```python
lora_model = lora_models + "/" + opt.lora + ".safetensors"
self.pipe = load_lora_weights(self.pipe, lora_model)
```
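The target-layer search in the function above splits the key on "_" and keeps gluing tokens back together until an attribute lookup succeeds, because module names such as self_attn or k_proj themselves contain underscores. A minimal torch-free sketch of that walk, using a nested dict as a stand-in for the module tree (resolve_lora_target is a hypothetical name, not a diffusers API):

```python
def resolve_lora_target(tree: dict, key: str, prefix: str):
    """Resolve a kohya/A1111 LoRA key to a dotted path in a nested-dict 'module tree'.

    Mirrors the loader's strategy: try the current name, and if it is not an
    attribute, append the next "_"-separated token and try again.
    """
    layer = key.split(".", 1)[0]             # drop ".lora_down.weight" / ".alpha"
    tokens = layer[len(prefix):].split("_")  # e.g. ["text", "model", "encoder", ...]
    node, path = tree, []
    name = tokens.pop(0)
    while True:
        if isinstance(node, dict) and name in node:
            # attribute found: descend and start collecting the next name
            node = node[name]
            path.append(name)
            if not tokens:
                return ".".join(path), node
            name = tokens.pop(0)
        elif tokens:
            # not an attribute yet: the real name contains "_", so glue on the next token
            name += "_" + tokens.pop(0)
        else:
            raise KeyError(f"cannot resolve {layer!r}")
```

Because the walk accepts the shortest name that exists, it would pick the wrong child if a module had two submodules whose names share an underscore-separated prefix; that is worth keeping in mind when adapting it to other architectures.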
@sayakpaul @patrickvonplaten
Thanks! Do you have the checkpoints with which we could test this?
This uploaded .safetensors LoRA and others work well.
Thanks @adhikjoshi! I'm getting a lot further with your function, but the output doesn't match what I would expect. As a first guess, I think it's the alpha handling: alpha is hard-coded to 0.75, but the LoRAs I'm using have .alpha keys in them.
I updated the function from @adhikjoshi to use the .alpha elements and also added a multiplier that can be used to weight the LoRA overall. Tested this on 4 random LoRAs I downloaded from CivitAI and it matches the output from Automatic1111:
```python
from collections import defaultdict

import torch
from safetensors.torch import load_file


def load_lora_weights(pipeline, checkpoint_path, multiplier, device, dtype):
    LORA_PREFIX_UNET = "lora_unet"
    LORA_PREFIX_TEXT_ENCODER = "lora_te"
    # load LoRA weight from .safetensors
    state_dict = load_file(checkpoint_path, device=device)

    # group tensors per layer: {layer: {"lora_up.weight": ..., "lora_down.weight": ..., "alpha": ...}}
    updates = defaultdict(dict)
    for key, value in state_dict.items():
        # it is suggested to print out the key, it usually will be something like below
        # "lora_te_text_model_encoder_layers_0_self_attn_k_proj.lora_down.weight"
        layer, elem = key.split(".", 1)
        updates[layer][elem] = value

    # directly update weight in diffusers model
    for layer, elems in updates.items():
        if "text" in layer:
            layer_infos = layer.split(LORA_PREFIX_TEXT_ENCODER + "_")[-1].split("_")
            curr_layer = pipeline.text_encoder
        else:
            layer_infos = layer.split(LORA_PREFIX_UNET + "_")[-1].split("_")
            curr_layer = pipeline.unet

        # find the target layer
        temp_name = layer_infos.pop(0)
        while True:
            try:
                curr_layer = curr_layer.__getattr__(temp_name)
                if len(layer_infos) > 0:
                    temp_name = layer_infos.pop(0)
                else:
                    break
            except Exception:
                if len(temp_name) > 0:
                    temp_name += "_" + layer_infos.pop(0)
                else:
                    temp_name = layer_infos.pop(0)

        # get elements for this layer
        weight_up = elems["lora_up.weight"].to(dtype)
        weight_down = elems["lora_down.weight"].to(dtype)
        # scale = alpha / rank; files without an alpha entry fall back to 1.0
        alpha = elems.get("alpha")
        if alpha:
            alpha = alpha.item() / weight_up.shape[1]
        else:
            alpha = 1.0

        # update weight
        if len(weight_up.shape) == 4:
            curr_layer.weight.data += multiplier * alpha * torch.mm(
                weight_up.squeeze(3).squeeze(2), weight_down.squeeze(3).squeeze(2)
            ).unsqueeze(2).unsqueeze(3)
        else:
            curr_layer.weight.data += multiplier * alpha * torch.mm(weight_up, weight_down)

    return pipeline
```
Example usage:

```python
pipe = load_lora_weights(pipe, lora_path, 1.0, "cuda", torch.float32)
```
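The function above scales each update by alpha / rank (the rank being weight_up.shape[1]) times the user multiplier and adds it straight into the layer weight. As a sanity check on that arithmetic, here is a small pure-Python sketch (plain lists instead of tensors; matmul, matvec, lora_delta and merge are hypothetical helpers) showing that the merged weight W + multiplier * (alpha / rank) * up @ down gives the same output as running the base layer and the LoRA branch separately:

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_delta(up, down, alpha, multiplier=1.0):
    """up: (out x rank), down: (rank x in); mirrors alpha.item() / weight_up.shape[1]."""
    rank = len(down)
    scale = alpha / rank if alpha else 1.0
    return [[multiplier * scale * v for v in row] for row in matmul(up, down)]


def merge(weight, delta):
    return [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(weight, delta)]


W = [[1.0, 0.0], [0.0, 1.0]]
up, down = [[1.0], [2.0]], [[3.0, 4.0]]  # rank 1
merged = merge(W, lora_delta(up, down, alpha=0.5))
```

Merging once up front costs nothing per inference step, which is why this approach is fast; the trade-off is that undoing it requires either the inverse update or a saved copy of the original weights.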
I tested it on several custom-created LoRAs and it works great! Excellent work, @pdoane, thanks for sharing.
Quick question for the group: is there a way to quickly unload a LoRA's weights from a loaded pipeline? I want to keep the pipeline in memory and simply apply/remove LoRAs on the fly after each inference. Any pointers are appreciated.
Thanks again!
Alejandro
There are two options I can think of:
1. Layer updating is a linear operation, so it can be reversed by passing in a negative multiplier. Because of floating-point rounding, though, the weights could gradually drift over time.
2. You can make a copy of the tensor for each modified layer and restore it later. As LoRAs are small relative to the model, this is probably preferable (and I expect it to be faster).
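Option 2 can be sketched without torch: keep an untouched copy of each layer the first time any LoRA modifies it, and restore all of them before the next LoRA is applied. WeightPatcher is a hypothetical helper, and plain lists stand in for tensors; in a real pipeline the saved copy would be something like weight.data.clone().

```python
class WeightPatcher:
    """Back up each layer on first modification so it can be restored exactly."""

    def __init__(self, weights: dict):
        self.weights = weights  # layer name -> list of floats (stand-in for a tensor)
        self.originals = {}     # layer name -> untouched copy, taken on first patch

    def apply(self, name, delta, multiplier=1.0):
        if name not in self.originals:
            self.originals[name] = list(self.weights[name])
        self.weights[name] = [w + multiplier * d for w, d in zip(self.weights[name], delta)]

    def restore(self):
        # copy back instead of subtracting, so no rounding error accumulates
        for name, original in self.originals.items():
            self.weights[name] = list(original)
        self.originals.clear()
```

With option 1, subtracting the same update is not guaranteed to return the exact original floats (for example, 0.1 + 0.2 - 0.2 is not exactly 0.1 in float64), which is where the drift comes from.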
I'll try approach #2
Ugly, but it worked for me. I tested it with 600 inferences, switching between 12 LoRA .safetensors files 50 times, on diffusers 0.15.1:

```python
from collections import defaultdict

import torch
from diffusers.loaders import LoraLoaderMixin
from safetensors.torch import load_file

current_pipeline = None
# layer name -> (module, untouched copy of its weight), captured the first time a layer is patched
original_weights = {}


def load_lora_weights(pipeline, checkpoint_path, multiplier, device, dtype):
    global current_pipeline, original_weights
    if pipeline is not current_pipeline:
        # new pipeline: backups taken from another pipeline no longer apply
        current_pipeline = pipeline
        original_weights = {}
    else:
        # same pipeline: undo every layer a previous LoRA modified
        for module, weight in original_weights.values():
            module.weight.data = weight.clone()

    # move the pipeline to the target device
    pipeline.to(device)
    LORA_PREFIX_UNET = "lora_unet"
    LORA_PREFIX_TEXT_ENCODER = "lora_te"

    # load LoRA weight from .safetensors
    state_dict = load_file(checkpoint_path, device=device)
    updates = defaultdict(dict)
    for key, value in state_dict.items():
        # e.g. "lora_te_text_model_encoder_layers_0_self_attn_k_proj.lora_down.weight"
        layer, elem = key.split(".", 1)
        updates[layer][elem] = value

    # directly update weight in diffusers model
    for layer, elems in updates.items():
        if "text" in layer:
            layer_infos = layer.split(LORA_PREFIX_TEXT_ENCODER + "_")[-1].split("_")
            curr_layer = pipeline.text_encoder
        else:
            layer_infos = layer.split(LORA_PREFIX_UNET + "_")[-1].split("_")
            curr_layer = pipeline.unet

        # find the target layer
        temp_name = layer_infos.pop(0)
        while True:
            try:
                curr_layer = curr_layer.__getattr__(temp_name)
                if len(layer_infos) > 0:
                    temp_name = layer_infos.pop(0)
                else:
                    break
            except Exception:
                if len(temp_name) > 0:
                    temp_name += "_" + layer_infos.pop(0)
                else:
                    temp_name = layer_infos.pop(0)

        # get elements for this layer
        weight_up = elems["lora_up.weight"].to(dtype)
        weight_down = elems["lora_down.weight"].to(dtype)
        alpha = elems.get("alpha")
        alpha = alpha.item() / weight_up.shape[1] if alpha else 1.0

        # back up the layer (keyed by name, not position) the first time any LoRA touches it
        if layer not in original_weights:
            original_weights[layer] = (curr_layer, curr_layer.weight.data.clone().detach())

        # update weight
        if len(weight_up.shape) == 4:
            curr_layer.weight.data += multiplier * alpha * torch.mm(
                weight_up.squeeze(3).squeeze(2), weight_down.squeeze(3).squeeze(2)
            ).unsqueeze(2).unsqueeze(3)
        else:
            curr_layer.weight.data += multiplier * alpha * torch.mm(weight_up, weight_down)

    return pipeline


LoraLoaderMixin.load_lora_weights = load_lora_weights
```
@pdoane thanks so much for your inputs and investigations!
Do you mind sharing the pipe and lora_path you tested https://github.com/huggingface/diffusers/issues/3064#issuecomment-1512429695 with?
@sayakpaul - followed up in e-mail.
Thanks. However, I think having an end-to-end open example here would help the community a great deal to understand the nuances of the interoperability in a better manner.
@pdoane come to think of it, would you be interested to improve our LoRA functionality to operate with the A1111 format as well?
@patrickvonplaten recently incorporated similar support for our textual inversion scripts: https://github.com/huggingface/diffusers/tree/main/examples/textual_inversion
My assumption is that this is just the first step toward something more official; I'd be glad to help!
I have some API questions about it:
In terms of format details:
@alejobrainz how do you use your code so that it works with a prompt the same way as in A1111? I used this as the prompt:

```python
prompt="art by <lora:mngstle:1>"
n_prompt="(nsfw), out of frame, multiple people, petite, loli, side view, profile, lowres, (bad anatomy, bad hands:1.1), text, (tattoo), error, missing fingers, extra digit, fewer digits, cropped, worst quality, (((many people))), low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name,weird colors, (cartoon, 3d, bad art, poorly drawn, close up, blurry:1.5), (disfigured, deformed, extra limbs:1.5)"
```

but it ignored the LoRA instructions completely.
For prompt weighting you can use Compel. It's great and easy to use; just be sure to check out the syntax at https://github.com/damian0815/compel/blob/main/Reference.md
Also, keep in mind that the script merges the LoRA into the model weights, so you only need the keyword your LoRA uses within the prompt.
@alejobrainz
I'm used to Automatic1111 and am developing a new interface using diffusers.
We're getting off topic quickly here, but Compel and A1111 are not equivalent when it comes to prompt weighting. Roughly, though, (dog) == dog+ and (((cat))) == cat+++. With Compel, parentheses can be used for grouping but don't carry additional weighting information.
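A rough translation of the bare-parenthesis case, assuming the usual conventions that each A1111 paren layer and each Compel + multiply the weight by 1.1 (so (((cat))) and cat+++ both mean about 1.1^3 ≈ 1.331). a1111_parens_to_compel is a hypothetical helper that only handles a single, balanced group and leaves anything else untouched:

```python
import re

# one run of "(", text without parens or colons, one run of ")"
_BARE_GROUP = re.compile(r"(\(+)([^():]+)(\)+)")


def a1111_parens_to_compel(term: str) -> str:
    m = _BARE_GROUP.fullmatch(term.strip())
    if m is None or len(m.group(1)) != len(m.group(3)):
        return term  # not a simple balanced group, e.g. "(dog:1.2)"; leave it alone
    depth = len(m.group(1))
    inner = m.group(2).strip()
    # Compel can weight a single word directly ("cat+"); several words need grouping parens
    body = f"({inner})" if " " in inner else inner
    return body + "+" * depth
```

Only the syntax is translated here; as noted above, the two tools are not exactly equivalent, so results may still differ somewhat.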
LoRAs modify the weights of the text encoder and unet, so they are not related to the prompt. It's a bit of a weird design decision that some tools expose LoRA usage through the prompt at all. Textual inversions do belong in the prompt, so that's likely where that comes from.
Depending on how far you want to go with your front-end, you are likely to run into some challenges. Advanced usages like LoRAs, textual inversion, ControlNet, and the integration of all these features together is rapidly progressing but may not be ready out of the box. What are the goals for your UI? I may be working in a similar space.
@pdoane I want to create a full-fledged AI comics maker with consistent characters, styles, etc. I knew there would be challenges, which is why I'm doing it.
In A1111, when you train a LoRA and use its [name] in the prompt, A1111 recognizes the LoRA by that name. I am trying to do the same.
Also, thanks for the first part, much appreciated!
Compel can extract LoRAs referenced from prompts. It'll be up to you to take that information and modify the pipeline though.
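Since LoRAs change model weights rather than the text, one way to support A1111-style <lora:name:weight> tags is to strip them from the prompt and turn them into load calls yourself. This is not Compel's API; it is a small regex-based sketch with a hypothetical extract_lora_tags helper, assuming tags look exactly like <lora:mngstle:1> and that a missing weight means 1.0:

```python
import re

# <lora:NAME> or <lora:NAME:WEIGHT>
LORA_TAG = re.compile(r"<lora:([^:>]+)(?::([0-9]*\.?[0-9]+))?>")


def extract_lora_tags(prompt: str):
    """Return (prompt without LoRA tags, [(lora_name, weight), ...])."""
    loras = [(name, float(weight) if weight else 1.0) for name, weight in LORA_TAG.findall(prompt)]
    cleaned = LORA_TAG.sub("", prompt)
    # tidy up the whitespace and separators left behind by removed tags
    cleaned = re.sub(r"\s{2,}", " ", cleaned).strip(" ,")
    return cleaned, loras
```

The returned (name, weight) pairs could then be mapped to .safetensors paths and passed to one of the loading functions in this thread. Those functions take a single multiplier per call, so per-LoRA weights would mean one call per file.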
@pdoane is there any tutorial that could point me in the right direction on that?
EDIT: here's the full code to achieve what I tried to do:
```python
# Loras.py
from collections import defaultdict

import torch
from safetensors.torch import load_file


def load_lora_weights(pipeline, checkpoint_path, multiplier, device, dtype):
    LORA_PREFIX_UNET = "lora_unet"
    LORA_PREFIX_TEXT_ENCODER = "lora_te"
    # accept a single path or a list of paths; each LoRA is merged in turn
    checkpoint_paths = [checkpoint_path] if isinstance(checkpoint_path, str) else checkpoint_path

    for ckptpath in checkpoint_paths:
        # load LoRA weight from .safetensors
        state_dict = load_file(ckptpath, device=device)
        updates = defaultdict(dict)
        for key, value in state_dict.items():
            # it is suggested to print out the key, it usually will be something like below
            # "lora_te_text_model_encoder_layers_0_self_attn_k_proj.lora_down.weight"
            layer, elem = key.split(".", 1)
            updates[layer][elem] = value

        # directly update weight in diffusers model
        for layer, elems in updates.items():
            if "text" in layer:
                layer_infos = layer.split(LORA_PREFIX_TEXT_ENCODER + "_")[-1].split("_")
                curr_layer = pipeline.text_encoder
            else:
                layer_infos = layer.split(LORA_PREFIX_UNET + "_")[-1].split("_")
                curr_layer = pipeline.unet

            # find the target layer
            temp_name = layer_infos.pop(0)
            while True:
                try:
                    curr_layer = curr_layer.__getattr__(temp_name)
                    if len(layer_infos) > 0:
                        temp_name = layer_infos.pop(0)
                    else:
                        break
                except Exception:
                    if len(temp_name) > 0:
                        temp_name += "_" + layer_infos.pop(0)
                    else:
                        temp_name = layer_infos.pop(0)

            # get elements for this layer
            weight_up = elems["lora_up.weight"].to(dtype)
            weight_down = elems["lora_down.weight"].to(dtype)
            alpha = elems.get("alpha")
            if alpha:
                alpha = alpha.item() / weight_up.shape[1]
            else:
                alpha = 1.0

            # update weight
            if len(weight_up.shape) == 4:
                curr_layer.weight.data += multiplier * alpha * torch.mm(
                    weight_up.squeeze(3).squeeze(2), weight_down.squeeze(3).squeeze(2)
                ).unsqueeze(2).unsqueeze(3)
            else:
                curr_layer.weight.data += multiplier * alpha * torch.mm(weight_up, weight_down)

    return pipeline
```
```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline, UniPCMultistepScheduler
from PIL import Image

import Loras  # the module containing load_lora_weights above

controlnet = ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_openpose", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
    safety_checker=None,
    requires_safety_checker=False,
).to("cuda")
pipe = Loras.load_lora_weights(pipe, ["mngstle.safetensors", "galgadot.safetensors"], 1.0, "cuda", torch.float16)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_xformers_memory_efficient_attention()
pipe.enable_model_cpu_offload()

images = Image.open("gpose.png")
generator = torch.manual_seed(1)
prompt="withLora(galgadot,1), manga, intricate, sharp focus, illustration, highly detailed, digital painting, concept art, matte, masterpiece, 8k, art by withLora(mngstle,1), black and white, monochrome"
n_prompt="nsfw+, out of frame, multiple people, petite, loli, side view, profile, lowres, (bad anatomy, (bad hands)1.1)+, text, tattoo+, error, missing fingers, extra digit, fewer digits, cropped, worst quality, many people+++, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name,weird colors, (cartoon, 3d, bad art, poorly drawn, close up, (blurry)1.5)+, (disfigured, deformed, (extra limbs)1.5)+"
fimage = pipe(
    prompt,
    images,
    negative_prompt=n_prompt,
    num_inference_steps=20,
    generator=generator,
)
fimage = fimage.images[0]
fimage.save("result.png", format="PNG")
```
This is ControlNet with two LoRAs applied.
How would this work with "(bad hands)1.1)+" or any other weighted prompts?
@adhikjoshi it's already weighted. A single + is the same as (word), and (word)NUM is the same as (word:NUM). The example first raises the weight of "bad hands" by 1.1 and then applies the + on the enclosing group as well, so it makes extra sure the bad hands are not visible.
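Following that equivalence ((word:NUM) in A1111 corresponds to (word)NUM in Compel), explicit weights can be rewritten mechanically. A hedged regex sketch; a1111_weights_to_compel is a hypothetical helper that only handles groups whose text contains no nested parentheses or colons:

```python
import re

# "(some text:1.1)" or "(some text: 1.1)" with no nested parentheses inside
EXPLICIT_WEIGHT = re.compile(r"\(([^():]+):\s*([0-9]*\.?[0-9]+)\)")


def a1111_weights_to_compel(prompt: str) -> str:
    """Rewrite A1111 "(text:NUM)" groups as Compel "(text)NUM"; everything else is untouched."""
    return EXPLICIT_WEIGHT.sub(r"(\1)\2", prompt)
```

Bare A1111 parentheses, such as the outer group in (bad anatomy, (bad hands:1.1)), are not converted and still need a + added by hand, as in the example prompt above.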
Diffusers does not apply prompt weighting to the prompt strings. You need to use prompt embeds as explained here:
https://huggingface.co/docs/diffusers/main/en/using-diffusers/weighted_prompts
Remember that LoRAs adjust weights on the model and do not relate to position or positive/negative aspects of the prompt, so this is the line that matters:
pipe=Loras.load_lora_weights(pipe, ['mngstle.safetensors','galgadot.safetensors'],1.0,'cuda',torch.float16)
Compel has a prompt parser that can provide the list of LoRAs as well as the prompt embeds for diffusers.
@pdoane very sorry about the delay on our end in getting back to your initial comment: https://github.com/huggingface/diffusers/issues/3064#issuecomment-1514778052.
Assuming it is in the main API, the existing method of unet.load_attn_procs() is not the right place as the text encoder needs modification as well.
We have a load_lora_weights() utility which might be a better place.
Weight restoration is an important use case too, probably an optional dictionary parameter to store weight information and another method to re-apply.
Do you mean passing in an actual state dict to the loading function? If so, it's something we actually do. See the docstrings of load_lora_weights() here. We also try to infer whether a certain state dict is in a format other than the one followed in diffusers, but this is currently limited to textual inversion. More details here:
The existing LoRA support has a different assumption for key names. I'm not sure what format is being assumed currently and also not sure how it should be reconciled with this approach. The A1111 code suggests that the layer name convention being used in the above scripts is "diffusers" and not "compvis". Are there LoRA files that use compvis layer names?
Does the above paragraph answer this question to some extent? For layer names that differ from what diffusers expects, we try to amend them so that they become compatible with diffusers afterward.
MultiheadAttention support is missing. Should be easy to add but I wanted to find an example first.
You mean LoRA being applied to multiple attention heads? I think in diffusers we already do that. See how we initialize the LoRA layers here:
But @patrickvonplaten can provide more details.
There are a variety of other formats too (e.g. LyCORIS) and I don't know how common those are.
I think it's okay to focus just on A1111 format for now as those are the most used ones.
Would be super nice if we could add loading functionality for the A1111 format to load_lora_weights. Fully agree with @sayakpaul's statement above.
@patrickvonplaten how high would that be on your priority list? This would save so much time, especially considering Compel doesn't provide the same functionality as A1111, which is already the dominant tool. Thanks for putting it on your list, by the way; everyone really appreciates it.
I want to make sure I am using this correctly.
I would like to use Lykon/Dreamshaper for inpainting, so I need to load the safetensor weights at that location.
```python
import torch
from diffusers import StableDiffusionInpaintPipeline

model = "Lykon/DreamShaper"
pipeline = StableDiffusionInpaintPipeline.from_pretrained(
    model,
    torch_dtype=torch.float16,
)
# load_lora_weights() modifies the pipeline in place and does not return it
pipeline.load_lora_weights(model, use_safetensors=True, weight_name="DreamShaper_5_beta2_BakedVae-inpainting.inpainting.safetensors")
pipeline = pipeline.to("cuda")
```
This gives the following error:
ValueError: None does not seem to be in the correct format expected by LoRA or Custom Diffusion training.
My guess is that I'm calling it wrong.
We have a load_lora_weights() utility which might be a better place.
Yes definitely! I saw that after writing the message. Maybe worth updating documentation to use it over the unet API?
Do you mean passing in an actual state dict to the loading function? If so, it's something we actually do. See the docstrings of load_lora_weights() [here]
I think that might be something different. I want to optimize changing the set of active LoRAs/weights. As the LoRAs typically only adjust a subset of tensors in the model, it would be reasonable to restore a model back to its original state and then apply a new set.
Let me know if someone is working in this problem space. Otherwise I'll try to find some time for a PR in the next week or so.
I'm still trying to figure out how to use other inpainting weights, and am struggling. There don't seem to be any resources on this. This thread seems like the right place to discuss it, but I might be wrong. Should I start a new ticket?
Yes definitely! I saw that after writing the message. Maybe worth updating documentation to use it over the unet API?
For sure. This PR will hopefully fix it: https://github.com/huggingface/diffusers/pull/3180.
I think that might be something different. I want to optimize changing the set of active LoRAs/weights. As the LoRAs typically only adjust a subset of tensors in the model, it would be reasonable to restore a model back to its original state and then apply a new set.
I think we need to think a bit more about this design. I guess we have the following use-cases here.
Let's consider we have two LoRAs (LoRA A and LoRA B), each being trained on concepts A and B, respectively.
Users could combine both and do some weighting to let the user control the effect. This is being discussed in: https://github.com/huggingface/diffusers/issues/2613. The way this is usually done is we just take the LoRAs and merge them sequentially to the affected main model blocks. Here, for example, after merging a single LoRA, we just repeat the process with the other LoRA(s): https://github.com/huggingface/diffusers/blob/256e6960cbe8a6379ee396ca6317503a991b9bbe/scripts/convert_lora_safetensor_to_diffusers.py#L82
Instead of taking alpha, I think we just apply the respective scalar LoRA weight coefficients. @pacman100 has worked on this in https://github.com/huggingface/peft, so I'm tagging him to see if he has any additional insights.
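The sequential approach referenced above (merge one LoRA, then repeat with the next) can be sketched with plain lists: each LoRA contributes multiplier * (alpha / rank) * up @ down, so the per-LoRA scalars act as the user-facing weights, and the merge order does not change the result beyond floating-point rounding. merge_loras is a hypothetical helper, and treating a missing alpha as a scale of 1.0 follows the scripts earlier in this thread:

```python
def merge_loras(weight, loras):
    """weight: (out x in) list of lists; loras: [(up, down, alpha, multiplier), ...].

    Each LoRA is merged in turn into a copy of the base weight.
    """
    merged = [row[:] for row in weight]
    for up, down, alpha, multiplier in loras:
        rank = len(down)  # down is (rank x in), up is (out x rank)
        scale = multiplier * (alpha / rank if alpha else 1.0)
        for i in range(len(up)):
            for j in range(len(down[0])):
                merged[i][j] += scale * sum(up[i][k] * down[k][j] for k in range(rank))
    return merged


W = [[0.0, 0.0], [0.0, 0.0]]
lora_a = ([[1.0], [0.0]], [[1.0, 1.0]], 1.0, 0.5)  # alpha 1, rank 1, user weight 0.5
lora_b = ([[0.0], [2.0]], [[1.0, 0.0]], None, 1.0)  # no alpha -> scale 1.0
```

Because the merged weights no longer remember which LoRA contributed what, removing one of them later again needs either its inverse update or a saved copy of the original weights.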
Users could first load, say, LoRA A, perform some generations, and then load LoRA B and repeat.
it would be reasonable to restore a model back to its original state and then apply a new set.
Let me know if someone is working in this problem space. Otherwise I'll try to find some time for a PR in the next week or so.
Feel free to start the PR, we're more than happy to help :)
Cc: @patrickvonplaten
@RustyKettle, for https://github.com/huggingface/diffusers/issues/3064#issuecomment-1526885741, could you maybe open a new issue?
@sayakpaul Hi, I created a PR https://github.com/huggingface/diffusers/pull/3294 to fix this issue. Tested with 3 .safetensor lora with diffusers pipeline.
Hi, has anyone created anything to load multiple .safetensors files and apply them to a prompt? Otherwise, is there any way to interact with the web UI? The stable diffusion directory is broken on M1/M2, and at the moment I can only use a custom standalone PyTorch web UI. I would love to generate images with txt2img.py, but I can't find it in the web UI directory and wouldn't know how to load it. Is there a way to drive the already-loaded web UI from a Python script like img2txt.py without having to interact with the web UI itself?
I'm interested in this issue, and I've written my own code for it. I believe it can handle over 90% of the LoRA files on CivitAI. Since it uses hooks, it might not be possible to merge it into Diffusers as-is, but this implementation supports multiple LoRAs and dynamic attachment/detachment. I'll think about whether it's possible to create a version without hooks as well.
https://gist.github.com/takuma104/e38d683d72b1e448b8d9b3835f7cfa44
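For readers unfamiliar with the hook approach, here is a minimal sketch of the general idea (not the code in the gist): a PyTorch forward hook on an nn.Linear adds scale * up(down(x)) to the layer's output, and removing the hook handle detaches the LoRA without ever touching the base weights. `LoRAHook` is a hypothetical name; shapes follow the kohya convention.

```python
import torch
from torch import nn


class LoRAHook:
    """Hypothetical helper: attaches a LoRA to an nn.Linear via a forward hook."""

    def __init__(self, module: nn.Linear, down: torch.Tensor, up: torch.Tensor, scale: float = 1.0):
        self.down = down  # (rank, in_features)
        self.up = up  # (out_features, rank)
        self.scale = scale
        self.handle = module.register_forward_hook(self._hook)

    def _hook(self, module, inputs, output):
        x = inputs[0]
        # A non-None return value from a forward hook replaces the module's output.
        return output + self.scale * (x @ self.down.T @ self.up.T)

    def remove(self):
        # Detach this LoRA; the base weights were never modified.
        self.handle.remove()
```

Several hooks can be registered on the same module: PyTorch passes each hook's returned output on to the next one, so the LoRA deltas add up, and each can be removed independently.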
Let's maybe try to prioritize this a bit. Being able to load LoRA layers from the A1111 format is pretty important IMO. cc @sayakpaul, do you want to tackle this? Otherwise, I'm happy to look into it.
Seems like there are some entanglements.
With #3294, I left my suggestions here and here.
Do we want to continue on top of #3294? But my concern is that it directly modifies the weights of the UNet which removes the flexibility part (see https://github.com/huggingface/diffusers/pull/3294#issuecomment-1537816327).
So, need to discuss the best course of action here.
@takuma104, if you want, we'd be happy to welcome a PR based on https://github.com/huggingface/diffusers/issues/3064#issuecomment-1538858776.
@patrickvonplaten what are your thoughts?
I don't think we should continue on top of #3294; there are too many fundamental changes. Instead, I believe it'd be nice to open a new PR that cleanly allows loading A1111 weights.
@takuma104 do you have a sample A1111 LoRA weight file and a prompt to test with? I would like to prioritise working on A1111 LoRA support in diffusers. A first PR may not cover every possible case, but we will see.
I think transforming the state_dict in the load_lora_weights function to the diffusers format when the auto1111 format is detected is the cleanest option here.
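The detection step could be as small as a prefix check on the keys, run before any conversion. This is a sketch under the assumption that A1111/kohya files always use the lora_unet / lora_te prefixes seen in this thread; `is_a1111_lora` is a hypothetical helper, not part of diffusers.

```python
from typing import Iterable

# Key prefixes used by kohya / A1111 LoRA files (as seen in the snippets in this thread).
A1111_PREFIXES = ("lora_unet_", "lora_te_")


def is_a1111_lora(keys: Iterable[str]) -> bool:
    """Hypothetical check: True if any key uses the kohya/A1111 naming scheme."""
    return any(key.startswith(A1111_PREFIXES) for key in keys)
```

A loader would call this on `state_dict.keys()` and only run the conversion when it returns True, leaving the native diffusers path untouched.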
Looked into this, but I'm stuck because I can't find documentation on the diffusers serialization format. auto1111 seems pretty straightforward, thankfully, but I'm unsure how to convert it to diffusers.
For the diffusers format, I used @sayakpaul's testing repo sayakpaul/dreambooth-text-encoder-test.
!wget 'https://huggingface.co/sayakpaul/dreambooth-text-encoder-test/resolve/main/pytorch_lora_weights.bin'
import torch
hf_lora = torch.load('pytorch_lora_weights.bin')
For auto1111 format, I used the most popular LoRA right now, MoXiN:
!wget 'https://civitai.com/api/download/models/14856' -O 'moxin.safetensors'
from safetensors.torch import load_file
sd_lora = load_file('moxin.safetensors')
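Once the A1111 file is loaded, each module's rank and effective scale can be read straight off the state dict, which helps when comparing it against the diffusers file. This sketch assumes kohya-style keys ("<module>.lora_down.weight" plus an optional "<module>.alpha") and assumes a missing alpha means alpha == rank (scale 1.0); `lora_ranks_and_scales` is a hypothetical helper.

```python
def lora_ranks_and_scales(state_dict):
    """Hypothetical helper: map each kohya LoRA module name to (rank, alpha / rank).

    Works with torch tensors or NumPy arrays: len() of lora_down is its first
    dimension, which is the LoRA rank for both linear and conv LoRAs.
    """
    suffix = ".lora_down.weight"
    result = {}
    for key, down in state_dict.items():
        if not key.endswith(suffix):
            continue
        module = key[: -len(suffix)]
        rank = len(down)
        alpha_key = module + ".alpha"
        # Assumption: a missing alpha is treated as alpha == rank (scale 1.0).
        alpha = float(state_dict[alpha_key]) if alpha_key in state_dict else float(rank)
        result[module] = (rank, alpha / rank)
    return result
```

For example, `lora_ranks_and_scales(sd_lora)` would list every patched module of the MoXiN file with its rank and scale.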
Below is some intro code, adapted from #3294, to play with the serialization and transformation. It is non-functional, as I don't fully understand the diffusers serialization format.
from typing import Dict
LORA_PREFIX_TEXT_ENCODER = 'lora_te'
LORA_PREFIX_UNET = 'lora_unet'
def convert_auto1111_to_diffusers_state_dict(state_dict: Dict, pipe):
    # `pipe` is the loaded diffusers pipeline; its modules are used to resolve layer names
    diffusers_state_dict = dict()
    for key in state_dict:
        # it is suggested to print out the key, it usually will be something like below
        # "lora_te_text_model_encoder_layers_0_self_attn_k_proj.lora_down.weight"
        # ignore alpha
        if ".alpha" in key:
            continue
        if key.startswith(LORA_PREFIX_TEXT_ENCODER):
            layer_infos = key.split(".")[0].split(LORA_PREFIX_TEXT_ENCODER + "_")[-1].split("_")
            curr_layer = pipe.text_encoder
        else:
            layer_infos = key.split(".")[0].split(LORA_PREFIX_UNET + "_")[-1].split("_")
            curr_layer = pipe.unet
        # find the target layer: re-join "_"-separated tokens until they
        # name an existing submodule (e.g. "down" + "blocks" -> "down_blocks")
        transformed_name = []
        temp_name = layer_infos.pop(0)
        while True:
            try:
                curr_layer = curr_layer.__getattr__(temp_name)
                if len(layer_infos) > 0:
                    transformed_name.append(temp_name)
                    temp_name = layer_infos.pop(0)
                elif len(layer_infos) == 0:
                    transformed_name.append(temp_name)
                    break
            except Exception:
                if len(temp_name) > 0:
                    temp_name += "_" + layer_infos.pop(0)
                else:
                    transformed_name.append(temp_name)
                    temp_name = layer_infos.pop(0)
        # TODO: use the transformed name and key to
        # create the modified key
        transformed_key = '.'.join(transformed_name)
        # Needs more transformation, but I don't understand
        # the diffusers serialization well enough.
        # Example print:
        # up_blocks.3.attentions.1.transformer_blocks.0.attn1.to_k lora_down.weight
        print(transformed_key, '.'.join(key.split('.')[1:]))
        # ... once key is transformed, add to diffusers_state_dict
        diffusers_state_dict[transformed_key] = state_dict[key]
    return diffusers_state_dict
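As a complement, the UNet attention keys can also be mapped with string rules alone, without walking the pipeline's modules. The sketch below is a hypothetical illustration, assuming the target is the attention-processor layout (`...attn1.processor.to_q_lora.down.weight`) with the `unet.` prefix that save_lora_weights() adds, and that the multi-word module names are exactly the ones listed. It only handles the to_q/to_k/to_v/to_out_0 projections; other kohya keys (proj_in, feed-forward, text encoder) return None. If the target LoRA layers carry no alpha, alpha / rank would also need to be folded into the up weights.

```python
import re
from typing import Optional

# Multi-word module names that a blanket "_" -> "." substitution would split.
_REPAIRS = {
    "down.blocks": "down_blocks",
    "up.blocks": "up_blocks",
    "mid.block": "mid_block",
    "transformer.blocks": "transformer_blocks",
}
# kohya projection suffix -> diffusers LoRA attention-processor attribute (assumed names).
_PROJ = {"to_q": "to_q_lora", "to_k": "to_k_lora", "to_v": "to_v_lora", "to_out_0": "to_out_lora"}
_ATTN_RE = re.compile(r"lora_unet_(.+_attn[12])_(to_q|to_k|to_v|to_out_0)$")


def kohya_unet_key_to_diffusers(key: str) -> Optional[str]:
    """Hypothetical mapping of one kohya UNet attention key; None if unsupported."""
    module, _, suffix = key.partition(".")
    if suffix not in ("lora_down.weight", "lora_up.weight"):
        return None  # e.g. ".alpha" entries
    match = _ATTN_RE.match(module)
    if match is None:
        return None  # not a UNet attention projection
    path = match.group(1).replace("_", ".")
    for broken, fixed in _REPAIRS.items():
        path = path.replace(broken, fixed)
    lora_part = "down" if suffix.startswith("lora_down") else "up"
    return f"unet.{path}.processor.{_PROJ[match.group(2)]}.{lora_part}.weight"
```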
Another stickier consideration: the current monkey-patching of the text_encoder for LoRA doesn't allow for easy removal of a LoRA to restore the previous model.
Happy to jump in more, but would need some documentation on the LoRA serialization format.
Sorry for this. Maybe we should add a section here on the serialization format (cc: @patrickvonplaten).
But trying to unblock you here. Previously, we only used to support LoRA fine-tuning of the UNet. You can find such a LoRA param file here: https://huggingface.co/patrickvonplaten/lora_dreambooth_dog_example/blob/main/pytorch_lora_weights.bin
There, we used to initialize the LoRA layers and just save them as regular torch/safetensors state dicts:
(Note that the above code snippet is from an earlier commit)
But now, the serialization format has changed with the introduction of save_lora_weights(). To better distinguish between the UNet and the text encoder params, we first initialize the LoRA layers like before. But during serialization, we follow:
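The snippet referenced here did not survive, so the following is only my understanding of the idea, not a copy of it: the UNet and text-encoder LoRA params are namespaced with a prefix in one flat state dict, and the loader splits them back apart. The `unet.` and `text_encoder.` prefixes are assumptions, and both functions are hypothetical.

```python
UNET_PREFIX = "unet."  # assumed prefix for UNet LoRA params
TEXT_ENCODER_PREFIX = "text_encoder."  # assumed prefix for text-encoder LoRA params


def pack_lora_state_dicts(unet_sd, text_encoder_sd):
    """Hypothetical sketch: namespace UNet and text-encoder LoRA params in one flat dict."""
    packed = {UNET_PREFIX + k: v for k, v in unet_sd.items()}
    packed.update({TEXT_ENCODER_PREFIX + k: v for k, v in text_encoder_sd.items()})
    return packed


def unpack_lora_state_dict(state_dict):
    """Inverse of pack_lora_state_dicts: split a flat dict back by prefix."""
    unet_sd = {k[len(UNET_PREFIX):]: v for k, v in state_dict.items() if k.startswith(UNET_PREFIX)}
    text_encoder_sd = {
        k[len(TEXT_ENCODER_PREFIX):]: v
        for k, v in state_dict.items()
        if k.startswith(TEXT_ENCODER_PREFIX)
    }
    return unet_sd, text_encoder_sd
```

An older, unprefixed UNet-only file unpacks to two empty dicts, which a loader could take as the signal to fall back to the previous format.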
I hope this makes sense :-)
Let me know if anything is unclear.
Describe the bug
I downloaded a LoRA from CivitAI in .safetensors format.
When I load it using the code below:
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe.unet.load_attn_procs("lora.safetensors")
It throws an error: KeyError: 'to_k_lora.down.weight'
  File "/workspace/server/tasks.py", line 346, in txt2img
    self.pipe.unet.load_attn_procs(embd, use_safetensors=True)
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/diffusers/loaders.py", line 224, in load_attn_procs
    rank = value_dict["to_k_lora.down.weight"].shape[0]
KeyError: 'to_k_lora.down.weight'
Reproduction
import torch
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe.unet.load_attn_procs("lora.safetensors")
Logs
No response
System Info
Diffusers Version: 0.15.0.dev0
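Why the KeyError: assuming, as the traceback suggests, that load_attn_procs buckets keys by everything except their last three dot-separated parts and then looks up "to_k_lora.down.weight" in each bucket, a kohya key (which has only three dot-separated parts) lands in a bucket named "" with the whole key as its sub-key. The hypothetical helper below reproduces that grouping so a file can be checked before loading.

```python
from collections import defaultdict


def group_like_load_attn_procs(keys):
    """Group keys by everything except their last three dot-separated parts.

    Assumption: this mirrors how load_attn_procs buckets a LoRA state dict
    per attention processor, based on the traceback above.
    """
    groups = defaultdict(set)
    for key in keys:
        parts = key.split(".")
        groups[".".join(parts[:-3])].add(".".join(parts[-3:]))
    return dict(groups)
```

With a kohya file, no bucket ever contains "to_k_lora.down.weight", which matches the error above; the keys need converting to the diffusers layout first.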