huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

Improved support for working with Kohya-style LoRAs in diffusers 🤗 #4348

Closed sayakpaul closed 1 year ago

sayakpaul commented 1 year ago

Seamless interoperability between Kohya-style LoRAs and Diffusers has been one of the most requested features from the community in recent months.

We are making promising progress in this regard.

With #4287, this support should be much improved. We have also made a patch release to make it available right away. So, we ask the community to try this feature and let us know of any issues.

Get started by reading the documentation here. Also, be aware of the known limitations and know that we're actively working to mitigate them quickly.
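For reference, a minimal sketch of the flow the documentation describes (the checkpoint ID and the local LoRA file name below are placeholders):

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # Load a Kohya-style LoRA checkpoint (placeholder directory and file name)
    pipe.load_lora_weights("./loras", weight_name="my_kohya_lora.safetensors")

    # "scale" controls how strongly the LoRA affects the UNet
    image = pipe("a photo of a castle", cross_attention_kwargs={"scale": 0.7}).images[0]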

A special heartfelt thanks to @takuma104 and @isidentical, who significantly helped us get this far!

shubhdotai commented 1 year ago

Unable to load LoRA from a local folder (Downloaded from civitai, for SD1.5)

pipe.load_lora_weights("./loras", weight_name="Theovercomer8.safetensors")

Is anything wrong with this? @sayakpaul

pdoane commented 1 year ago

Great to see the improvements! Are there plans to support loading multiple LoRAs soon?

sayakpaul commented 1 year ago

@pdoane not immediately, but happy to discuss design and related things.

sayakpaul commented 1 year ago

Unable to load LoRA from a local folder (Downloaded from civitai, for SD1.5)

pipe.load_lora_weights("./loras", weight_name="Theovercomer8.safetensors")

Is anything wrong with this? @sayakpaul

Not sure as this seems to work: https://colab.research.google.com/gist/sayakpaul/0b0de72df83a665e8b525c1f8c76f218/scratchpad.ipynb

shubhdotai commented 1 year ago

Unable to load LoRA from a local folder (Downloaded from civitai, for SD1.5) pipe.load_lora_weights("./loras", weight_name="Theovercomer8.safetensors") Is anything wrong with this? @sayakpaul

Not sure as this seems to work: https://colab.research.google.com/gist/sayakpaul/0b0de72df83a665e8b525c1f8c76f218/scratchpad.ipynb

For me it works locally, but deploying in a container gives an error (diffusers version = 0.19.2):

File "/root/.pyenv/versions/3.9.17/lib/python3.9/site-packages/diffusers/loaders.py", line 1093, in lora_state_dict
for k in state_dict.keys()
UnboundLocalError: local variable 'state_dict' referenced before assignment
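For context: that UnboundLocalError generally means none of the loading branches actually assigned state_dict, typically because the checkpoint file was never found or loaded inside the container. A rough sketch of the failure shape, not the actual diffusers source:

    import os
    from safetensors.torch import load_file

    # Illustrative only: mimics the shape of the bug, not the real loader
    def lora_state_dict(pretrained_path, weight_name=None):
        if weight_name is not None and weight_name.endswith(".safetensors"):
            try:
                state_dict = load_file(os.path.join(pretrained_path, weight_name))
            except FileNotFoundError:
                pass  # falls through without ever assigning state_dict
        # if no branch assigned the variable (e.g. the file is missing from
        # the container image), the next line raises UnboundLocalError:
        return {k: v for k, v in state_dict.items()}

Verifying that the .safetensors file actually exists at that path inside the container image is a good first step.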

adhikjoshi commented 1 year ago

LyCORIS support would provide the last mile of coverage in the LoRA space. Any plans to support it?

sayakpaul commented 1 year ago

LyCORIS support would provide the last mile of coverage in the LoRA space. Any plans to support it?

LyCORIS's LoCon LoRAs should work. LoHA won't work yet (keys containing "hada").
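A quick way to check which family a checkpoint belongs to is to inspect its keys (the file name below is a placeholder):

    from safetensors.torch import load_file

    sd = load_file("my_lycoris.safetensors")  # placeholder path
    if any("hada" in k for k in sd):
        print("LoHA checkpoint: not supported by load_lora_weights yet")
    else:
        print("LoCon/standard LoRA keys: should load")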

sayakpaul commented 1 year ago

For me it works locally, but deploying in a container gives an error (diffusers version = 0.19.2)

Can't debug this unfortunately when it's running in a container.

adhikjoshi commented 1 year ago

LyCORIS's LoCon LoRAs should work. LoHA won't work yet (keys containing "hada").

I tested with LyCORIS; it's not giving the results it should when using load_lora_weights.

I made a gist that uses diffusers with custom loading, which works well for LyCORIS, LoRA, and LoHA (still needs improvements):

https://gist.github.com/adhikjoshi/2c6da89cbcd7a6a3344d3081ccd1dda0

pdoane commented 1 year ago

@pdoane not immediately, but happy to discuss design and related things.

If scale were extended to take a list, it isn't clear which LoRA each entry should apply to. We could say it follows the order the LoRAs were loaded, but this gets more complicated when the set of loaded LoRAs changes over time.

It's tempting to try to resolve this in the generate call, but that's too late for computing prompt_embeds. An explicit configuration step on the pipeline makes the most sense to me. This would transform the pipeline from one set of active LoRAs to another, possibly being smart about reuse.

So one option is:

    lora_a = pipe.lora_state_dict("lora_a.safetensors")
    lora_b = pipe.lora_state_dict("lora_b.safetensors")
    lora_c = pipe.lora_state_dict("lora_c.safetensors")

    pipe.set_loras([lora_a, lora_b])
    prompt_embeds = ...
    pipe(...., cross_attention_kwargs={"scale": [0.3, 0.6]})  # A at 0.3, B at 0.6

    pipe.set_loras([lora_b, lora_c])
    prompt_embeds = ...
    pipe(...., cross_attention_kwargs={"scale": [0.4, 0.5]})  # B at 0.4, C at 0.5

But this design would not allow diffusers to mutate the weights, which was important for performance in earlier analysis. Another option would be to specify weights at the same time:

    pipe.set_loras([(lora_a, 0.3), (lora_b, 0.6)])
    prompt_embeds = ...
    pipe(....)  # A at 0.3, B at 0.6

    pipe.set_loras([(lora_b, 0.4), (lora_c, 0.5)])
    prompt_embeds = ...
    pipe(....)  # B at 0.4, C at 0.5

I prefer the second approach as it gives more implementation freedom to diffusers and makes the binding between scale and the LoRA more explicit.

pdoane commented 1 year ago

Actually, the second approach is needed for correctness too; see my comment in damian0815/compel#42. The order of operations needs to be:

  1. Set current LoRAs and their weights on the pipeline
  2. Create prompt embeds
  3. Generate image

The scale parameter on cross_attention_kwargs should probably just be deprecated.
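In code, that ordering would look roughly like this (set_loras is the hypothetical API proposed above, and compel stands in for any custom-embedding library):

    # 1. Configure the active LoRAs and their weights first,
    #    since they patch the text encoder as well as the UNet
    pipe.set_loras([(lora_a, 0.3), (lora_b, 0.6)])  # hypothetical API

    # 2. Only now compute prompt embeds, so they reflect the patched text encoder
    prompt_embeds = compel("a fantasy castle")

    # 3. Generate
    image = pipe(prompt_embeds=prompt_embeds).images[0]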

genesiscz commented 1 year ago

I am so sorry for kind of hijacking, but I don't want to create a separate issue, as I am sure it's something pretty straightforward. If I train an SDXL LoRA using train_dreambooth_lora_sdxl.py and it outputs a .bin file, how am I supposed to convert it to the .safetensors format so I can load it just like pipe.load_lora_weights("./loras", weight_name="Theovercomer8.safetensors")?

Also, is such a LoRA from DreamBooth supposed to work in ComfyUI?

Also, what "-style" LoRAs does the DreamBooth training create?
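For what it's worth, on the first question, a minimal conversion sketch (assuming the training script's default output name, pytorch_lora_weights.bin):

    import torch
    from safetensors.torch import save_file

    # Load the torch-pickled LoRA state dict and re-save it as safetensors
    state_dict = torch.load("pytorch_lora_weights.bin", map_location="cpu")
    save_file(state_dict, "pytorch_lora_weights.safetensors")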

sayakpaul commented 1 year ago

All of those call for separate discussions and should be asked on our Discord forum, since they don't concern the design of the library or any issues related to it.

CoffeeVampir3 commented 1 year ago

I prefer the second approach as it gives more implementation freedom to diffusers and makes the binding between scale and the LoRA more explicit.

I like the set_loras design, here's my 2C

Maybe something like

    config = LoraConfig(xyz)
    lora = Lora(xyz)
    loras = {"Name": (lora, config)}
    pipe.add_supporting_networks(loras)

My thought here is that this gives you stronger modularity, and the option to do something like

    pipe.remove_support_network("Name")

In WebUIs like AUTOMATIC1111 there's a tendency to load lots of LoRAs at different strengths (potentially also LoRAs of different designs); this design would support something like that, and moving the config into its own object gives some future-proofing. The modularity means the design extends to arbitrary future variants, as LoRAs may add more parameter design space over time; we're already seeing this with adapters.

Internally, you could farm out the processing to something like:

    config.run_processor(lora)

By welding the processing code to the config object, you can easily enable multiple LoRA/adapter types by composition.
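A minimal sketch of what that composition could look like (every name here is hypothetical, and lora is assumed to be a simple container of factor matrices; the LoHA math is shown only to illustrate dispatching different adapter types):

    class LoraConfig:
        def __init__(self, scale):
            self.scale = scale

        def run_processor(self, lora, weight):
            # standard LoRA: W + scale * (up @ down)
            return weight + self.scale * (lora.up @ lora.down)

    class LohaConfig(LoraConfig):
        def run_processor(self, lora, weight):
            # LoHA: Hadamard product of two low-rank products
            return weight + self.scale * ((lora.w1a @ lora.w1b) * (lora.w2a @ lora.w2b))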

pdoane commented 1 year ago

Why pipe.load_lora_weights("path", weight_val)?

This requires re-reading the file every time the weight changes and in the case of multiple LoRAs requires re-reading every single one. SD-1 LoRAs are up to ~150MB and SDXL LoRAs around ~400MB (I've even seen one at almost 2GB). This is enough I/O traffic that separating loading is useful.

We also should be careful with incremental APIs (e.g. add/remove) as it requires the network to be in a consistent state after each operation. There could be begin_update/end_update methods but it's simpler to just set everything all at once.
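To make the contrast concrete, the two API shapes under discussion look roughly like this (both hypothetical):

    # Incremental: the network must be in a consistent, usable state
    # after every call, which constrains the implementation
    pipe.add_lora(lora_a)
    pipe.add_lora(lora_b)
    pipe.remove_lora(lora_a)

    # Declarative: one atomic transition; the implementation is free
    # to diff the old vs. new sets and reuse whatever it can
    pipe.set_loras([lora_b, lora_c])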

if we do our own parsing, there is a high chance we will implement it returning a dict / two arrays

If you are looking for A1111 compatibility, you will have to parse the prompt multiple times:

  1. Parse prompt for LoRA parameters
  2. Set current LoRAs and their weights on the pipeline
  3. Re-parse prompt to generate prompt embeds
  4. Generate image

Creating the prompt embeddings must be done after the LoRAs are loaded with their weights configured, as they impact the text encoder. The A1111 approach is a UI choice, though, and many other tools specify LoRAs outside of the prompt.
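For illustration, step 1 of that flow could be as simple as this, assuming A1111's <lora:name:weight> prompt syntax:

    import re

    def extract_lora_tags(prompt):
        # Pull "<lora:name:weight>" tags out of an A1111-style prompt
        tags = [(m.group(1), float(m.group(2)))
                for m in re.finditer(r"<lora:([^:>]+):([0-9.]+)>", prompt)]
        cleaned = re.sub(r"<lora:[^>]*>", "", prompt).strip()
        return cleaned, tags

    cleaned, tags = extract_lora_tags("a castle <lora:fantasy_style:0.7>")
    # cleaned == "a castle", tags == [("fantasy_style", 0.7)]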

pdoane commented 1 year ago

Diffusers should also consider allowing multiple scales for each LoRA. ComfyUI allows the text encoder scale and the UNet scale to be set separately.
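Under the proposed set_loras design, that could look something like this (purely hypothetical):

    pipe.set_loras([
        (lora_a, {"text_encoder": 0.8, "unet": 0.5}),  # separate scales per component
        (lora_b, {"text_encoder": 0.0, "unet": 1.0}),  # UNet-only LoRA
    ])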

Gynjn commented 1 year ago

first

    pipe.set_loras([lora_a, lora_b])
    pipe(...., cross_attention_kwargs={"scale": [0.3, 0.6]})  # A at 0.3, B at 0.6

second

    pipe.set_loras([(lora_a, 0.3), (lora_b, 0.6)])
    pipe(....)  # A at 0.3, B at 0.6

First of all, thanks for your work. I have a question about the code above: with the first method, only the cross-attention weights of the two LoRAs are set to the given numbers, e.g. (0.4, 0.5); and with the second method, both the text-embedding weight and the cross-attention weight are set to those numbers, right? @pdoane

pdoane commented 1 year ago

The two are meant to be equivalent: the first design was looking at minimal changes to the existing API, and the second is roughly the way I think this should be done.

However, the first approach doesn't work with custom embeds (which is true of where we are today as well). I hadn't realized the order-of-operations problem until seeing a bug report in the Compel repo.

In all cases we want to set the weights for text encoder and unet, and my most recent comment was suggesting that we make those independent as well (matching Comfy support).

Gynjn commented 1 year ago

Thanks for the reply.

Gynjn commented 1 year ago

By the way, is the set_loras attribute your own custom addition?

I got an error message: AttributeError: 'StableDiffusionControlNetImg2ImgPipeline' object has no attribute 'set_loras'

@pdoane

pdoane commented 1 year ago

We're discussing issues with the current implementation and future API design so none of this exists yet. My comments have been focused on extending to multiple LoRAs and generating custom prompt embeds correctly.

pdoane commented 1 year ago

I see... you are right about this one, but why would we need to change the weight often?

LoRA strength can change very often. The minimum granularity to support would be every invocation, but I would like to change the UNet strength at every sampler step (similar to what is allowed with ControlNet).

Not gonna lie, multistep is cool, but I am just not certain it's really needed?

Yes, even if only for the I/O and memory overhead. These objects are large enough that caching is important, and diffusers should not implement that logic.
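A rough sketch of the per-step idea, assuming the existing pipeline callback hook and a hypothetical pipe.set_lora_scale setter (nothing like it exists yet):

    num_steps = 30
    prompt = "a fantasy castle"

    def fade_lora(step, timestep, latents):
        # hypothetical API: fade the LoRA's UNet influence out across sampling
        pipe.set_lora_scale(0.8 * (1 - step / num_steps))

    image = pipe(prompt, num_inference_steps=num_steps,
                 callback=fade_lora, callback_steps=1).images[0]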

hoveychen commented 1 year ago

Not really clear about the tech under the hood, but in our case it would be nice to support the following (see the sketch after this list):

  1. Efficiently load/unload a couple of LoRAs in each run. We prepare individual LoRAs for each style and/or each item, so it would not be possible to load all the LoRAs into memory.
  2. Support loading/unloading a LoRA in a specific slot. In our case it would be I/O-costly to switch all the LoRAs in each run, since we run in a nested-loop manner: for a in a_model_list: for b in b_model_list: run(a, b)
  3. Either a fixed or a variable scale for each LoRA is acceptable. We always put the item model in the first slot and the style model in the second slot.
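A hypothetical slot-style wrapper over the proposed set_loras API (nothing here exists in diffusers yet):

    slots = {0: None, 1: None}  # slot 0: item LoRA, slot 1: style LoRA

    def set_slot(i, lora, scale):
        slots[i] = (lora, scale)
        pipe.set_loras([s for s in slots.values() if s is not None])  # hypothetical

    for a in a_model_list:        # only re-load slot 0 when the item changes
        set_slot(0, a, 0.7)
        for b in b_model_list:    # slot 1 switches on every inner iteration
            set_slot(1, b, 0.5)
            run(a, b)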

JemiloII commented 1 year ago

I'm going to sound like a broken record at this point, but I am still disappointed that the 0.19.x release broke how I was loading and unloading multiple LoRAs.

My flow keeps a pipeline in memory. I don't change models; I keep the same one, load the LoRAs I want to use, and unload them afterward so the pipeline is back to a pristine state. Being able to load multiple LoRAs is a must, and the scale needs to be there as well: load_lora(file="lora.safetensors", scale=1)

This is just dumb:

pipe.load_lora_weights("./loras", weight_name="Theovercomer8.safetensors")

"./loras" <= dumb; use cache_dir or combine it with the file name. And weight_name, come on, this is a file name, dumb dumb dumb name.

The way these LoRAs are currently loaded diffusers-style just doesn't feel consistent with the rest of the library. It's just dumb.

Ideally, this would be nice if we wanted to keep things in line with how we name things in the diffuser library:

    loaded_lora_1 = pipe.load_lora(
        pretrained_model_name_or_path="/path/to/lora/file_name_1.safetensors",
        scale=1,
    )

    loaded_lora_2 = pipe.load_lora(
        pretrained_model_name_or_path="/path/to/lora/file_name_2.safetensors",
        scale=1,
    )

    pipe.remove_lora(loaded_lora_1)
    pipe.remove_lora(loaded_lora_2)

It's very similar to the kohya lora loader for diffusers:

    loaded_lora_1 = pipe.apply_lora(
        filename="/path/to/lora/file_name_1.safetensors",
        alpha=1,
    )

    loaded_lora_1.alpha = 0.5

    loaded_lora_2 = pipe.apply_lora(
        filename="/path/to/lora/file_name_2.safetensors",
        alpha=1,
    )

    pipe.remove_lora(loaded_lora_1)
    pipe.remove_lora(loaded_lora_2)

I really don't like the inconsistencies in the diffusers library.

pdoane commented 1 year ago

pipe.load_lora_weights("./loras", weight_name="Theovercomer8.safetensors") "./loras", <= dumb, use cache_dir or combine with the file name. weight_name, come on, this is a file name, dumb dumb dumb name.

weight_name is optional, you can just do pipe.load_lora_weights("./loras/Theovercomer8.safetensors")

loaded_lora_1 = pipe.load_lora( pretrained_model_name_or_path=f"/path/to/lora/file_name_1.safetensors", scale=1, )

A disadvantage of this approach (and the other example) is that keeping a LoRA cached in memory requires setting its scale to 0. The diffusers implementation does not modify weights, so there would be a performance hit for every LoRA loaded.

JemiloII commented 1 year ago

@pdoane The way I have been doing it loads and unloads the LoRAs just fine without having to keep them in memory. The current way LoRAs are handled by the library is not good. I'd rather have exactly the set of LoRAs I want loaded than not.

A disadvantage to this approach (and the other example) is that keeping a LoRA cached in memory requires setting the scale to 0. The diffusers implementation does not modify weights so there would be a performance hit for every LoRA loaded.

I like performance, but this library is inconsistent on the performance front. Here is a bit from the website: https://huggingface.co/docs/diffusers/index

Our library is designed with a focus on usability over performance, simple over easy, and customizability over abstractions.

Do I want performance? Yes. But if their philosophy is a simple and highly customizable library, only being able to load a single LoRA does not fit the bill.

pdoane commented 1 year ago

The way I have been doing it loads and unloads the LoRAs just fine without having to keep the LoRA in memory

Currently diffusers keeps the weights in memory and does not bake them into the pipeline. It's not clear that the Diffusers team will change that approach, but an API that supports both seems better if it doesn't introduce significant tradeoffs.

I would be surprised if any of the API variations being proposed created a challenge for an application to adopt. They have minor differences (e.g. whether I get an object back, and where I specify a weight), and all of them are, in my opinion, simple to use. Where they differ is in their customizability, particularly in tradeoffs between I/O and memory that should be application decisions.

aycaecemgul commented 1 year ago

UnboundLocalError: local variable 'state_dict' referenced before assignment

I am getting the same error. Have you resolved this?

tolgakurtuluss commented 1 year ago

UnboundLocalError: local variable 'state_dict' referenced before assignment

I'm facing the same error while trying to run it on Colab. Any update?

wilfrediscoming commented 1 year ago

It would be SUPER great if diffusers could support loading more than one LoRA! Excited!

patrickvonplaten commented 1 year ago

@cm5kZGV2MjAyM3B1YmdpdGh1YmFjYw please don't add messages such as "re-ping"; this is really not helping. Every issue that is not closed is on our mind.

There is an open PR for better Kohya-style support: https://github.com/huggingface/diffusers/pull/5102. It would be nice to search for open PRs instead of "re-pinging" people. Note that we're getting hundreds of pings every day and need to be able to work efficiently to handle the workload here.

I would be extremely thankful if in the future, we could:

This would help us much, much more than a "re-ping" message. Thanks!

JemiloII commented 1 year ago

I used to update a lot, but when version 0.19 came and broke my ability to use multiple LoRAs, I sort of dropped off from following things and have just been in my own ecosystem. I'd love to use newer features one day, but the breaking changes, and having them kind of ignored, were disappointing, since I pointed out the changes/PR that caused it months back.

sayakpaul commented 1 year ago

I think you were using an externally patched version, for which we surely cannot provide any guarantees, sorry. But I am confident about the current API we're building to support multiple-LoRA inference, especially with the integration of PEFT.

sayakpaul commented 1 year ago

Thanks for all the discussions here, everyone!

We have introduced support for multi-adapter inference with the help of PEFT. Read more here: https://huggingface.co/docs/diffusers/main/en/tutorials/using_peft_for_inference.
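Roughly, the flow from that tutorial looks like this (checkpoint paths and adapter names below are placeholders):

    pipe.load_lora_weights("path/to/lora_a.safetensors", adapter_name="style")
    pipe.load_lora_weights("path/to/lora_b.safetensors", adapter_name="character")

    # Activate both adapters with independent weights
    pipe.set_adapters(["style", "character"], adapter_weights=[0.7, 0.5])
    image = pipe("a warrior in a fantasy castle").images[0]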

So, that said, I am gonna close this issue :)