huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

How to map A1111 reference_only parameters into diffusers? #6384

Closed Logos23333 closed 8 months ago

Logos23333 commented 9 months ago

Thanks to the community for implementing the reference_only functionality from A1111, but how do the parameters correspond to each other? I have tried to reproduce the webui's results with the diffusers library, but I can't seem to match them. I'm using the StableDiffusionReferencePipeline community pipeline.

My questions are:

  1. Is reference_only in A1111 equivalent to reference_attn=True, reference_adain=False?
  2. Some parameters in A1111, such as the starting control step, seem to have no corresponding parameters in the pipeline.
  3. style_fidelity in the diffusers pipeline seems to behave quite differently from Style Fidelity in A1111.
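For reference, here is a rough sketch of how the A1111 reference preprocessor settings might map onto the community pipeline's kwargs. This is my own reading of the two codebases, not an official table; `None` marks A1111 settings with no apparent counterpart in the pipeline.

```python
# Rough mapping of A1111 "reference" ControlNet settings to
# StableDiffusionReferencePipeline kwargs (an assumption, not official).
A1111_TO_DIFFUSERS = {
    "reference_only":       {"reference_attn": True,  "reference_adain": False},
    "reference_adain":      {"reference_attn": False, "reference_adain": True},
    "reference_adain+attn": {"reference_attn": True,  "reference_adain": True},
    "Style Fidelity":       "style_fidelity",
    "Guidance Start":       None,  # starting control step: no pipeline kwarg
    "Guidance End":         None,  # ending control step: no pipeline kwarg
}

print(A1111_TO_DIFFUSERS["reference_only"])
```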
Logos23333 commented 9 months ago

I'll give some cases to show the differences between diffusers and A1111 with the same style_fidelity. The input image used in the A1111 examples is attached. Here is the diffusers result. Test 1 code:

import torch
from PIL import Image
from diffusers import EulerDiscreteScheduler
# StableDiffusionReferencePipeline is the "stable_diffusion_reference" community pipeline

local_sd_model_path = 'stable-diffusion-webui/models/Stable-diffusion/realisticVisionV20_v20.safetensors'
pipe = StableDiffusionReferencePipeline.from_single_file(
    local_sd_model_path,
    torch_dtype=torch.float16,
)
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.enable_xformers_memory_efficient_attention()
pipe = pipe.to("cuda")

prompt = "woman in street, masterpiece, best quality,"
negative_prompt = "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry"
init_image = Image.open('./data/s3r4q8fiovza1.webp').convert("RGB")  # the input image
images = pipe(
    ref_image=init_image,
    width=768,
    height=512,
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=7,  # CFG scale; guidance_rescale is a different, 0-1 knob
    num_inference_steps=20,
    generator=torch.Generator(device="cuda").manual_seed(123456),
    style_fidelity=1,
    reference_attn=False,
    reference_adain=True,
)

And this is the result from webui

woman in street, masterpiece, best quality,
Negative prompt: lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry
Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 123456, Size: 768x512, Model hash: e6415c4892, Model: realisticVisionV20_v20, Clip skip: 2, ControlNet 0: "Module: reference_only, Model: None, Weight: 1, Resize Mode: Crop and Resize, Low Vram: False, Threshold A: 1, Guidance Start: 0, Guidance End: 1, Pixel Perfect: False, Control Mode: Balanced, Save Detected Map: True", Version: v1.7.0
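As a side note, two settings in that infotext have no match in the diffusers snippet above: `Sampler: Euler a` corresponds to EulerAncestralDiscreteScheduler (the snippet uses EulerDiscreteScheduler), and `Clip skip: 2` is not set, either of which can change the output noticeably. A small sketch of parsing such a settings line into diffusers-style kwargs; the field names follow the infotext format shown above, and the sampler-to-scheduler mapping is an assumption, not an official table:

```python
import re

# Assumed sampler-name -> diffusers scheduler-class mapping (partial).
SAMPLER_TO_SCHEDULER = {
    "Euler": "EulerDiscreteScheduler",
    "Euler a": "EulerAncestralDiscreteScheduler",
}

def parse_infotext(settings: str) -> dict:
    """Parse the 'Steps: ..., Sampler: ...' line of an A1111 infotext
    into diffusers-style generation kwargs (sketch; only a few keys)."""
    fields = {k.strip(): v.strip()
              for k, v in re.findall(r'(\w[\w ]*): ([^,"]+)', settings)}
    return {
        "num_inference_steps": int(fields["Steps"]),
        "guidance_scale": float(fields["CFG scale"]),  # CFG scale, not guidance_rescale
        "seed": int(fields["Seed"]),
        "clip_skip": int(fields.get("Clip skip", 1)),  # webui default is 1
        "scheduler": SAMPLER_TO_SCHEDULER.get(fields["Sampler"]),
    }

params = parse_infotext(
    "Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 123456, "
    "Size: 768x512, Clip skip: 2"
)
print(params)
```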


sayakpaul commented 9 months ago

Requesting you to tag the original authors who contributed the reference only pipelines to diffusers :)

But have you tried IP Adapters for the same purpose?
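For anyone landing here, IP-Adapter support is built into recent diffusers releases. A minimal sketch of the suggestion; the model ids and scale are illustrative choices, not from this thread, and the heavy imports are done lazily so the function can be defined without a GPU:

```python
def make_ip_adapter_pipe(model_id: str = "runwayml/stable-diffusion-v1-5"):
    """Build an SD 1.5 pipeline with an IP-Adapter loaded, as a rough
    stand-in for reference_only-style image conditioning."""
    import torch
    from diffusers import AutoPipelineForText2Image

    pipe = AutoPipelineForText2Image.from_pretrained(
        model_id, torch_dtype=torch.float16
    )
    pipe.load_ip_adapter(
        "h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin"
    )
    pipe.set_ip_adapter_scale(0.6)  # lower = weaker pull toward the reference
    return pipe.to("cuda")

# Usage (requires a GPU and downloads weights):
#   pipe = make_ip_adapter_pipe()
#   out = pipe(prompt="woman in street, masterpiece, best quality,",
#              ip_adapter_image=ref_image, guidance_scale=7).images[0]
```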

Logos23333 commented 9 months ago

@okotaku

Logos23333 commented 9 months ago

Requesting you to tag the original authors who contributed the reference only pipelines to diffusers :)

But have you tried IP Adapters for the same purpose?

I'll try IP Adapters, thank you.

github-actions[bot] commented 8 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

sayakpaul commented 8 months ago

Closing it for inactivity.