Closed: sayakpaul closed this issue 1 year ago.
Unable to load LoRA from a local folder (Downloaded from civitai, for SD1.5)
pipe.load_lora_weights("./loras", weight_name="Theovercomer8.safetensors")
Is anything wrong with this? @sayakpaul
Great to see the improvements! Are there plans to support loading multiple LoRAs soon?
@pdoane not immediately but happy to discuss design and related things.
> pipe.load_lora_weights("./loras", weight_name="Theovercomer8.safetensors") Is anything wrong with this?
Not sure as this seems to work: https://colab.research.google.com/gist/sayakpaul/0b0de72df83a665e8b525c1f8c76f218/scratchpad.ipynb
> Not sure as this seems to work: https://colab.research.google.com/gist/sayakpaul/0b0de72df83a665e8b525c1f8c76f218/scratchpad.ipynb
For me it works locally, but deploying it in a container gives an error (diffusers version 0.19.2):
File "/root/.pyenv/versions/3.9.17/lib/python3.9/site-packages/diffusers/loaders.py", line 1093, in lora_state_dict
for k in state_dict.keys()
UnboundLocalError: local variable 'state_dict' referenced before assignment
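This traceback has the classic shape of a variable assigned only inside a branch that never ran. A minimal illustrative sketch of the failure mode (a hypothetical simplification, not the actual diffusers code):

```python
def lora_state_dict(weight_name):
    # Hypothetical simplification of diffusers.loaders.lora_state_dict:
    # state_dict is only assigned inside a branch that may not run,
    # e.g. when the expected checkpoint file isn't found.
    if weight_name.endswith(".safetensors"):
        state_dict = {"lora.up.weight": 0}
    # If the branch above was skipped, state_dict was never assigned and
    # the next line raises UnboundLocalError.
    return [k for k in state_dict.keys()]

lora_state_dict("good.safetensors")  # fine
# lora_state_dict("missing.bin")     # raises UnboundLocalError
```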
LyCORIS support would give last-mile connectivity in the LoRA sphere; any plans to support it?
LyCORIS's LoCon LoRAs should work. LoHA won't work yet (keys containing "hada").
> For me, it works in local but deploying on the container gives an error (Diffusers version = 0.19.2)
Can't debug this unfortunately when it's running in a container.
> LyCORIS's LoCon LoRAs should work. LoHA won't work yet (keys containing "hada").
I tested LyCORIS, and it's not giving the results it should using load_lora_weights.
I made a gist that uses diffusers with custom loading, which works well for LyCORIS, LoRA, and LoHA (it still needs improvements):
https://gist.github.com/adhikjoshi/2c6da89cbcd7a6a3344d3081ccd1dda0
> @pdoane not immediately but happy to discuss design and related things.
If scale were extended to take a list, it isn't clear which LoRA each entry should be applied to. We could say it's the order in which the LoRAs were loaded, but this gets more complicated when the set of loaded LoRAs changes over time.
It's tempting to try to resolve this in the generate call, but that's too late for computing prompt_embeds. An explicit configuration step on the pipeline makes the most sense to me. This would transform the pipeline from one set of active LoRAs to another, possibly being smart about reuse.
So one option is:
lora_a = pipe.lora_state_dict("lora_a.safetensors")
lora_b = pipe.lora_state_dict("lora_b.safetensors")
lora_c = pipe.lora_state_dict("lora_c.safetensors")
pipe.set_loras([lora_a, lora_b])
prompt_embeds = ...
pipe(...., cross_attention_kwargs={"scale": [0.3, 0.6]}) # A at 0.3, B at 0.6
pipe.set_loras([lora_b, lora_c])
prompt_embeds = ...
pipe(...., cross_attention_kwargs={"scale": [0.4, 0.5]}) # B at 0.4, C at 0.5
But this design would not allow diffusers to mutate the weights, which was important for performance in earlier analysis. Another option would be to specify the weights at the same time:
pipe.set_loras([(lora_a, 0.3), (lora_b, 0.6)])
prompt_embeds = ...
pipe(....) # A at 0.3, B at 0.6
pipe.set_loras([(lora_b, 0.4), (lora_c, 0.5)])
prompt_embeds = ...
pipe(....) # B at 0.4, C at 0.5
I prefer the second approach as it gives more implementation freedom to diffusers and makes the binding between scale and the LoRA more explicit.
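To make the performance argument concrete, here is a minimal numpy sketch (Lora and bake_loras are illustrative names, not diffusers API) of why a set-everything-at-once call lets an implementation bake the scaled low-rank deltas straight into the base weights, while keeping the merge reversible:

```python
import numpy as np

rng = np.random.default_rng(0)

class Lora:
    """Toy stand-in for a LoRA: a low-rank pair of matrices."""
    def __init__(self, rank, dim):
        self.down = rng.standard_normal((rank, dim)) * 0.01  # "A"
        self.up = rng.standard_normal((dim, rank)) * 0.01    # "B"

    def delta(self):
        return self.up @ self.down  # low-rank update B @ A

def bake_loras(weight, loras_with_scales):
    """Return the base weight with every scaled LoRA delta merged in."""
    merged = weight.copy()
    for lora, scale in loras_with_scales:
        merged += scale * lora.delta()
    return merged

base = rng.standard_normal((8, 8))
lora_a, lora_b = Lora(2, 8), Lora(2, 8)

# One fused weight at inference time: the deltas live inside `merged`.
merged = bake_loras(base, [(lora_a, 0.3), (lora_b, 0.6)])

# Because the deltas are kept around, switching the active set is just
# subtracting the old scaled deltas and adding the new ones.
restored = merged - 0.3 * lora_a.delta() - 0.6 * lora_b.delta()
assert np.allclose(restored, base)
```

Knowing all (LoRA, scale) pairs up front is what makes this baking safe; a scale passed late (per generate call) would force keeping the deltas unmerged.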
Actually the second approach is needed for correctness too - see my comment in damian0815/compel#42. The order of operations needs to be:
The scale parameter on cross_attention_kwargs should probably just be deprecated.
I am so sorry for kind of hijacking, but I don't want to create a separate issue, as I am sure it's something pretty straightforward. If I train an SDXL LoRA using
train_dreambooth_lora_sdxl.py and it outputs a .bin file, how are you supposed to convert it to the .safetensors format so it can be loaded just like pipe.load_lora_weights("./loras", weight_name="Theovercomer8.safetensors")?
Also, is such a LoRA from DreamBooth supposed to work in ComfyUI?
Also, what "-style" LoRAs does the DreamBooth training create?
All of those call for separate discussions and should be asked on the Discord forum, since they don't concern the design of the library or any issues related to it.
I like the set_loras design; here's my 2c.
Maybe something like
config = LoraConfig(xyz)
lora = Lora(xyz)
loras = {"Name":(lora, config)}
pipe.add_supporting_networks(loras)
My thought here is that this gives you stronger modularity, and gives the option to do something like
pipe.remove_support_network("Name")
In webUIs like AUTOMATIC1111's, there's a tendency to load lots of LoRAs at different strengths (potentially also LoRAs of different designs). This design would support something like that and give some future-proofing by moving the config into its own object. The modularity means this design extends to arbitrary future designs, as LoRAs may add more design-space parameters in the future; we're already sort of seeing this is the case for adapters.
Internally, you could farm out the processing to something like:
config.run_processor(Lora)
By welding the processing code to the config object, you can easily enable multiple lora/adapter types by composition.
Why not pipe.load_lora_weights("path", weight_val)?
This requires re-reading the file every time the weight changes and in the case of multiple LoRAs requires re-reading every single one. SD-1 LoRAs are up to ~150MB and SDXL LoRAs around ~400MB (I've even seen one at almost 2GB). This is enough I/O traffic that separating loading is useful.
We also should be careful with incremental APIs (e.g. add/remove) as it requires the network to be in a consistent state after each operation. There could be begin_update/end_update methods but it's simpler to just set everything all at once.
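As a sketch of that separation, an application can keep parsed LoRA files in a small cache so that changing the active set or scales never re-reads from disk (LoraCache and read_file are illustrative names; read_file stands in for something like safetensors loading):

```python
class LoraCache:
    """Cache parsed LoRA state dicts so each file is read at most once."""
    def __init__(self, read_file):
        self._read_file = read_file  # e.g. a safetensors file loader
        self._cache = {}

    def get(self, path):
        if path not in self._cache:   # first request: hit the disk
            self._cache[path] = self._read_file(path)
        return self._cache[path]      # later requests: memory only

# Demo with a fake reader that records every disk access.
reads = []
def fake_read(path):
    reads.append(path)
    return {"state_dict_for": path}

cache = LoraCache(fake_read)
cache.get("a.safetensors")
cache.get("a.safetensors")  # no second read
assert reads == ["a.safetensors"]
```

Whether caching lives in the application or in diffusers is exactly the kind of I/O-versus-memory tradeoff being discussed here.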
If we do the parsing on our own, there is a high chance we will implement it as returning a dict / two arrays.
If you are looking for A1111 compatibility, you will have to parse the prompt multiple times:
Creating the prompt embeddings must be done after the LoRAs are loaded with their weights configured, as they impact the text encoder. The A1111 approach is a UI choice, though, and many other tools specify LoRAs outside of the prompt.
Diffusers should also consider using multiple scales for each LoRA. Comfy allows for the Text Encoder scale and the UNet scale to be set separately.
First of all, thanks for your work. I have some questions about the code you wrote above. With the first method, only the cross-attention weights of the two LoRAs are set to the given numbers, e.g. (0.4, 0.5), and with the second method both the text-embedding weights and the cross-attention weights are set to those numbers, right? @pdoane
The two are meant to be equivalent - the first design was looking at minimal changes to the existing API and the second design is roughly the way I think this should be done.
However the first approach doesn't work with custom embeds (which is true of where we are today as well). I hadn't realized the order of operations problems until seeing a bug report in the Compel repo.
In all cases we want to set the weights for text encoder and unet, and my most recent comment was suggesting that we make those independent as well (matching Comfy support).
Thanks for the reply.
By the way, is the set_loras attribute your own custom one?
I got an error message: AttributeError: 'StableDiffusionControlNetImg2ImgPipeline' object has no attribute 'set_loras'
@pdoane
We're discussing issues with the current implementation and future API design so none of this exists yet. My comments have been focused on extending to multiple LoRAs and generating custom prompt embeds correctly.
I see... you are right about this one, but why would we need to change the weight often?
LoRA strength can change very often. The minimum granularity to support would be every invocation, but I would like to change UNet strength at every sampler step (similar to what is allowed in ControlNet).
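A rough sketch of what per-step strength could look like (hypothetical; no such API exists in diffusers at this point): precompute a scale schedule and pass a different scale at each denoising step.

```python
def lora_scale_schedule(start, end, num_steps):
    """Linearly interpolate a LoRA scale across sampler steps."""
    if num_steps == 1:
        return [start]
    step = (end - start) / (num_steps - 1)
    return [start + step * i for i in range(num_steps)]

scales = lora_scale_schedule(1.0, 0.0, 5)  # [1.0, 0.75, 0.5, 0.25, 0.0]
# Inside a hypothetical sampler loop, step i would then pass its own scale,
# e.g. noise_pred = unet(latents, t, cross_attention_kwargs={"scale": scales[i]})
```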
Not gonna lie, multi-step is cool, but I am just not certain it's really needed?
Yes, even if only for the I/O and memory overhead. These objects are large enough that caching is important and diffusers should not implement that logic.
Not really clear about the tech under the hood. But in our cases, it would be nice to support:
for a in a_model_list:
    for b in b_model_list:
        run(a, b)
I'm going to sound like a broken record at this point, but I am still disappointed that the 19.x release breaks how I was loading and unloading multiple loras.
My flow is keeping a pipeline in memory. I don't change models, I just keep the same model. Then I load loras I want to use and unload them afterward. This way the pipeline is back to a pristine state. Needing to load multiple loras is a must. The scale needs to be there as well.
load_lora(file="lora.safetensor", scale=1)
this is just dumb
pipe.load_lora_weights("./loras", weight_name="Theovercomer8.safetensors")
"./loras", <= dumb, use cache_dir or combine with the file name.
weight_name, come on, this is a file name, dumb dumb dumb name.
The way these LoRAs are currently loaded, diffusers-style, just doesn't feel consistent with the rest of the library. It's just dumb.
Ideally, this would be nice if we wanted to keep things in line with how we name things in the diffusers library:
loaded_lora_1 = pipe.load_lora(
pretrained_model_name_or_path=f"/path/to/lora/file_name_1.safetensors",
scale=1,
)
loaded_lora_2 = pipe.load_lora(
pretrained_model_name_or_path=f"/path/to/lora/file_name_2.safetensors",
scale=1,
)
pipe.remove_lora(loaded_lora_1)
pipe.remove_lora(loaded_lora_2)
It's very similar to the kohya lora loader for diffusers:
loaded_lora_1 = pipe.apply_lora(
filename=f"/path/to/lora/file_name_1.safetensors",
alpha=1,
)
loaded_lora_1.alpha = 0.5
loaded_lora_2 = pipe.apply_lora(
filename=f"/path/to/lora/file_name_2.safetensors",
alpha=1,
)
pipe.remove_lora(loaded_lora_1)
pipe.remove_lora(loaded_lora_2)
I really don't like the inconsistencies in the diffusers library.
> pipe.load_lora_weights("./loras", weight_name="Theovercomer8.safetensors") "./loras", <= dumb, use cache_dir or combine with the file name. weight_name, come on, this is a file name, dumb dumb dumb name.
weight_name is optional, you can just do pipe.load_lora_weights("./loras/Theovercomer8.safetensors")
> loaded_lora_1 = pipe.load_lora( pretrained_model_name_or_path=f"/path/to/lora/file_name_1.safetensors", scale=1, )
A disadvantage to this approach (and the other example) is that keeping a LoRA cached in memory requires setting the scale to 0. The diffusers implementation does not modify weights so there would be a performance hit for every LoRA loaded.
@pdoane The way I have been doing it loads and unloads the LoRAs just fine without having to keep the LoRA in memory. The current way LoRAs are handled by the library is not good. I'd rather have the set of LoRAs I want loaded than not.
> A disadvantage to this approach (and the other example) is that keeping a LoRA cached in memory requires setting the scale to 0. The diffusers implementation does not modify weights so there would be a performance hit for every LoRA loaded.
I like performance, but this library is inconsistent on the performance end. Here is a bit from their website: https://huggingface.co/docs/diffusers/index
Our library is designed with a focus on usability over performance, simple over easy, and customizability over abstractions.
Do I want performance? Yes, but if their philosophy is to have a simple and highly customizable library, only having the ability to load a single LoRA does not fit the bill.
> The way I have been doing it loads and unloads the LoRAs just fine without having to keep the LoRA in memory
Currently diffusers keeps the weights in memory and does not bake them into the pipeline. It's not clear that the Diffusers team will change that approach, but an API that supports both seems better if it doesn't introduce significant tradeoffs.
I would be surprised for any of the API variations being proposed to create a challenge for an application to adopt. They have minor differences (e.g. do I have an object, and where do I specify a weight). All of them are in my opinion simple to use. Where they differ is in their customizability, particularly with tradeoffs between I/O and memory that should be application decisions.
> UnboundLocalError: local variable 'state_dict' referenced before assignment
I am getting the same error. Have you resolved this?
> UnboundLocalError: local variable 'state_dict' referenced before assignment
I'm facing the same error while trying to run it on Colab. Any update?
Will be SUPER great if diffusers can support loading >1 LoRAs!!!!! excited!!!
@cm5kZGV2MjAyM3B1YmdpdGh1YmFjYw please don't add messages such as "reping", this is really not helping. Every issue that is not closed is on our mind.
There is an open PR for better Koyha-style support: https://github.com/huggingface/diffusers/pull/5102 , it would be nice to search for open PRs instead of "re-pinging" people. Note that we're getting 100s of pings every day and need to be able to work efficiently to handle the workload here.
I would be extremely thankful if in the future, we could:
This would help us much much more than a "reping" message. Thanks!
I used to update a lot, but when version 0.19 came and broke my ability to use multiple LoRAs, I sort of dropped off from following things and have just been in my own ecosystem. I'd love to one day use newer features, but the breaking changes, and having them kind of ignored, were disappointing, since I pointed out which changes/PR caused it months back.
I think you were using an externally patched version, so we surely cannot provide any guarantees, sorry. But I am confident about the current API we're building to support multiple LoRA inference, especially with the integration of peft.
Thanks for all the discussions here, everyone!
We have introduced support for multi-adapter inference with the help of peft. Read more here: https://huggingface.co/docs/diffusers/main/en/tutorials/using_peft_for_inference.
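For readers arriving later, a condensed sketch of what the linked tutorial covers (the base model ID is real; the LoRA file paths and adapter names are placeholders; requires a diffusers version with the PEFT integration and peft installed, so it is not runnable without the model downloads):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Load two LoRAs under distinct adapter names (file paths are placeholders).
pipe.load_lora_weights("path/to/lora_a.safetensors", adapter_name="a")
pipe.load_lora_weights("path/to/lora_b.safetensors", adapter_name="b")

# Activate both at once, each with its own strength.
pipe.set_adapters(["a", "b"], adapter_weights=[0.5, 0.8])
image = pipe("a prompt").images[0]

# Fall back to the base model without unloading anything.
pipe.disable_lora()
```

Note how this matches the shape of the set_loras proposals above: scales are bound to LoRAs explicitly, in one configuration step, before generation.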
So, that said, I am gonna close this issue :)
A seamless interoperability between the Kohya-styled LoRAs and Diffusers has been one of the most requested features from the community in the last months.
We are making promising progress in this regard.
With #4287, this support should be quite improved. We also have made a patch release to make it available. So, we ask the community to try this feature and let us know of any issues.
Get started by reading the documentation here. Also, be aware of the known limitations and know that we're actively working to mitigate them quickly.
A special heartfelt thanks to @takuma104 and @isidentical, who significantly helped us in getting this far!