huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

`lora_scale` has no effect when loading with Flux #9525

Open cshowley opened 2 hours ago

cshowley commented 2 hours ago

Describe the bug

According to the loading LoRAs for inference guide, the argument cross_attention_kwargs={"scale": 0.5} can be added to a pipeline() call to vary the impact of a LoRA on image generation. As the FluxPipeline class doesn't support this argument, I followed the guide here to embed the text prompt with a LoRA scaling parameter. However, with a fixed seed and prompt, the image did not change when I varied lora_scale. I also checked the prompt embeddings for different values of lora_scale and they did not change either. Does Flux in diffusers not support LoRA scaling, or am I missing something?
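For readers unfamiliar with what the scale is expected to do, here is a minimal sketch of the usual LoRA update, in which the low-rank product is multiplied by the scale before being added to the base weight. It is plain Python on nested lists, the function names are hypothetical, and it leaves out the alpha/rank factor that real implementations also apply, so treat it as an illustration rather than what diffusers runs internally.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(weight, lora_up, lora_down, scale):
    """Return weight + scale * (lora_up @ lora_down), elementwise on nested lists."""
    delta = matmul(lora_up, lora_down)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy 2x2 base weight with a rank-1 LoRA update (made-up numbers).
base = [[1.0, 0.0], [0.0, 1.0]]
up = [[1.0], [2.0]]      # shape (2, 1)
down = [[3.0, 4.0]]      # shape (1, 2)

unscaled = apply_lora(base, up, down, scale=0.0)  # LoRA switched off
half = apply_lora(base, up, down, scale=0.5)      # LoRA at half strength
```

If changing the scale leaves the output identical, the scale is most likely not reaching the layers that actually carry LoRA weights.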

Reproduction

from diffusers import FluxPipeline
import torch
from PIL import Image

model_path = "black-forest-labs/FLUX.1-dev"
lora_path = "CiroN2022/toy-face"
weight_name = "toy_face_sdxl.safetensors"
device = "cuda"
seed = torch.manual_seed(0)

# from_pretrained takes the model id as its first positional argument
pipeline = FluxPipeline.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
).to(device)

pipeline.load_lora_weights(lora_path, weight_name=weight_name)

prompt = "toy_face of a hacker with a hoodie"
lora_scale = 0.5
# encode_prompt returns (prompt_embeds, pooled_prompt_embeds, text_ids)
prompt_embeds, pooled_prompt_embeds, _ = pipeline.encode_prompt(
    prompt=prompt,
    prompt_2=None,
    lora_scale=lora_scale,
)

image = pipeline(
    prompt_embeds=prompt_embeds,
    pooled_prompt_embeds=pooled_prompt_embeds,
    num_inference_steps=10,
    guidance_scale=5,
    generator=seed,
).images[0]

image.show()

Logs

No response

System Info

Who can help?

No response

asomoza commented 2 hours ago

Hi, I never use that method, can you test with this?

pipeline.load_lora_weights(lora_path, weight_name=weight_name, adapter_name="toy")
pipeline.set_adapters("toy", 0.5)

And yeah, the Flux pipeline doesn't have cross_attention_kwargs. You're passing lora_scale directly when encoding the prompt, so it only affects the text encoders. If the LoRA didn't train the text encoders (most don't), you won't see any difference.
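One way to check whether a LoRA touches the text encoders at all is to look at the key names in its state dict. The sketch below assumes diffusers-style keys that start with the component name (transformer, unet, text_encoder, text_encoder_2); checkpoints saved in other formats name their keys differently and would need a different check. The helper name and the example keys are hypothetical.

```python
# Component prefixes assumed for diffusers-style LoRA state dicts.
KNOWN_COMPONENTS = ("transformer", "unet", "text_encoder", "text_encoder_2")


def lora_components(keys):
    """Return the set of pipeline components that the given LoRA keys target."""
    found = set()
    for key in keys:
        prefix = key.split(".", 1)[0]  # text before the first dot
        if prefix in KNOWN_COMPONENTS:
            found.add(prefix)
    return found


# Hypothetical key names for a LoRA that only trained the denoiser.
example_keys = [
    "transformer.single_transformer_blocks.0.attn.to_q.lora_A.weight",
    "transformer.single_transformer_blocks.0.attn.to_q.lora_B.weight",
]

components = lora_components(example_keys)
trains_text_encoder = any(c.startswith("text_encoder") for c in components)
```

If trains_text_encoder is False for the real checkpoint, a lora_scale passed to encode_prompt has nothing to scale, which would match the unchanged embeddings reported above. The real keys can be read from the .safetensors file, for example with safetensors.torch.load_file(path).keys().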

cshowley commented 1 hour ago

Unfortunately, your suggestion of calling set_adapters("toy", 0.5) does not change the output either.

In this guide I see the following code block:

pipe = ... # create pipeline
pipe.load_lora_weights(..., adapter_name="my_adapter")
scales = {
    "text_encoder": 0.5,
    "text_encoder_2": 0.5,  # only usable if pipe has a 2nd text encoder
    "unet": {
        "down": 0.9,  # all transformers in the down-part will use scale 0.9
        # "mid"  # in this example "mid" is not given, therefore all transformers in the mid part will use the default scale 1.0
        "up": {
            "block_0": 0.6,  # all 3 transformers in the 0th block in the up-part will use scale 0.6
            "block_1": [0.4, 0.8, 1.0],  # the 3 transformers in the 1st block in the up-part will use scales 0.4, 0.8 and 1.0 respectively
        }
    }
}
pipe.set_adapters("my_adapter", scales)

which says to pass a dictionary in the .set_adapters() call. If I pass 0.5 like you said, does that apply that weighting to all elements of the LoRA?
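As far as I understand it, a single float is used for every component the adapter touches, while a dictionary sets per-component scales and leaves anything it omits at the default of 1.0 (the guide's comment about "mid" says as much for blocks). The sketch below is a toy model of that expansion under those assumptions; the function name is hypothetical and this is not the diffusers implementation.

```python
def expand_adapter_weight(weight, components, default=1.0):
    """Expand a float or per-component dict into one scale per component."""
    if isinstance(weight, dict):
        # Components missing from the dict fall back to the default scale.
        return {name: weight.get(name, default) for name in components}
    # A plain number is applied uniformly to every component.
    return {name: weight for name in components}


components = ["transformer", "text_encoder"]

uniform = expand_adapter_weight(0.5, components)
partial = expand_adapter_weight({"text_encoder": 0.5}, components)
```

Under this model, passing 0.5 is shorthand for a dictionary with 0.5 on every component.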