huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

StableDiffusionReferencePipeline throws list index out of range error #4306

Closed ethankongee closed 1 year ago

ethankongee commented 1 year ago

Describe the bug

StableDiffusionReferencePipeline throws a "list index out of range" error when the batch size is greater than 1 and guidance_scale is greater than 1.

Here's how I'm trying to call the pipeline:

generator = [torch.Generator(device="cuda").manual_seed(random.randint(1, 250)) for _ in range(4)]
pipe(
    prompt=[prompt] * 4,
    ref_image=image,
    width=512,
    height=512,
    style_fidelity=0.5,
    num_inference_steps=50,
    generator=generator,
    negative_prompt=[negative_prompt] * 4,
    reference_attn=True,
    reference_adain=False,
)

I'm trying to generate 4 images per batch, and guidance_scale defaults to 7.5, so classifier-free guidance is enabled. However, this line throws the error:

latents = [
    torch.randn(shape, generator=generator[i], device=rand_device, dtype=dtype, layout=layout)
    for i in range(batch_size)
]

When I inspect the value, batch_size is 8 instead of 4. And I traced it back to this line:

ref_image_latents = torch.cat([ref_image_latents] * 2) if do_classifier_free_guidance else ref_image_latents

The line above doubles ref_image_latents for classifier-free guidance, so its batch dimension becomes 8 while generator still holds only 4 entries. Indexing generator[i] for i from 4 to 7 then raises the list index out of range error.
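
The mismatch can be illustrated without loading the pipeline. This is only a minimal sketch, assuming plain Python lists and random.Random objects as stand-ins for the torch.Generator list; sample_noise is a hypothetical helper that mimics the indexing pattern of the failing list comprehension, not the library's code.

```python
import random


def sample_noise(batch_size, generators):
    # Hypothetical helper: one draw per batch item, indexing generators[i]
    # the same way the failing list comprehension does.
    return [generators[i].random() for i in range(batch_size)]


# Stand-ins for the 4 torch.Generator objects passed to the pipeline.
generators = [random.Random(seed) for seed in range(4)]

batch_size = 4
doubled_batch_size = batch_size * 2  # batch size after torch.cat([x] * 2)

# Works: one generator per batch item.
ok = sample_noise(batch_size, generators)

# Fails: items 4..7 have no matching generator.
try:
    sample_noise(doubled_batch_size, generators)
    error = None
except IndexError as exc:
    error = exc

print(type(error).__name__)
```

The second call fails at index 4, which matches the symptom in the pipeline: the batch was doubled, but the generator list was not.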

Reproduction

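import random

import torch
from PIL import Image
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    custom_pipeline="stable_diffusion_reference",
).to("cuda")

# Placeholder inputs (not given in the original report); substitute your own.
prompt = "a photo of a cat"
negative_prompt = "blurry, low quality"
image = Image.new("RGB", (512, 512))  # stand-in for the real reference image

# One generator per image in the batch of 4.
generator = [torch.Generator(device="cuda").manual_seed(random.randint(1, 250)) for _ in range(4)]
pipe(
    prompt=[prompt] * 4,
    ref_image=image,
    width=512,
    height=512,
    style_fidelity=0.5,
    num_inference_steps=50,
    generator=generator,
    negative_prompt=[negative_prompt] * 4,
    reference_attn=True,
    reference_adain=False,
)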

Logs

No response

System Info

Who can help?

@patrickvonplaten @sayakpaul @williamberman

patrickvonplaten commented 1 year ago

cc @DN6 can you take a look here? :-)

DN6 commented 1 year ago

Also cc'ing @okotaku since they added the pipeline.

@ethank852 You're right, the change in the size of ref_image_latents causes an issue when noise is added to them here: https://github.com/huggingface/diffusers/blob/5fd3dca5f377126b73a9af8aaf7a6291951d201c/examples/community/stable_diffusion_reference.py#L726-L728, since the lengths of generator and ref_image_latents no longer match.

I would recommend moving the classifier-free guidance logic for ref_image_latents into the diffusion loop, so that it's explicit when the length changes.

e.g.

for i, t in enumerate(timesteps):
    # expand the latents if we are doing classifier free guidance
    latent_model_input = torch.cat([latents] * 2) if do_classifier_free_guidance else latents
    latent_model_input = self.scheduler.scale_model_input(latent_model_input, t)

    # ref only part: sample noise while ref_image_latents still has the original batch size
    noise = randn_tensor(
        ref_image_latents.shape, generator=generator, device=device, dtype=ref_image_latents.dtype
    )
    ref_xt = self.scheduler.add_noise(ref_image_latents, noise, t.reshape(1))
    # duplicate only after the noise has been added
    ref_xt = torch.cat([ref_xt] * 2) if do_classifier_free_guidance else ref_xt
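
The ordering in this suggestion can be checked in isolation. A rough sketch, assuming plain Python floats and random.Random objects in place of tensors and torch.Generator; add_noise_then_duplicate is a hypothetical name for illustration, not a function in the pipeline.

```python
import random


def add_noise_then_duplicate(ref_image_latents, generators, do_classifier_free_guidance):
    # Noise is drawn while the batch still has its original size, so each
    # latent pairs with exactly one generator.
    if len(generators) != len(ref_image_latents):
        raise ValueError("expected one generator per reference latent")
    noisy = [
        latent + gen.gauss(0.0, 1.0)
        for latent, gen in zip(ref_image_latents, generators)
    ]
    # Duplicate only afterwards, mirroring torch.cat([ref_xt] * 2).
    return noisy + noisy if do_classifier_free_guidance else noisy


generators = [random.Random(seed) for seed in range(4)]
ref_image_latents = [0.0, 0.0, 0.0, 0.0]

ref_xt = add_noise_then_duplicate(
    ref_image_latents, generators, do_classifier_free_guidance=True
)
print(len(ref_xt))  # 8
```

Because the duplication happens last, the generator list only ever has to cover the original batch of 4, and both halves of the doubled batch carry the same noised reference.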

@okotaku could you confirm if this change is compatible with the pipeline? And would you be able to open a PR for it if that's the case?

DN6 commented 1 year ago

Hi @ethank852. I pushed a fix for this issue in PR #4531. Can you verify it works as expected? I'm closing the issue for now. LMK if you run into any further issues.