huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

[LCM-SDXL][XLA] RuntimeError: Input type (float) and bias type (c10::Half) should be the same #5805

Closed kevint324 closed 7 months ago

kevint324 commented 9 months ago

Describe the bug

Running the sample code from https://huggingface.co/latent-consistency/lcm-sdxl, lightly adapted for XLA, raises an error.

The error remains the same regardless of using TPU or CPU as backend.

Details are in the colab.

I cannot figure out why the input type is float. Any pointers would be appreciated.

Reproduction

https://colab.research.google.com/drive/19Rk2jAzyvoHqMT0-qzmel3Ui24r6CcQZ?usp=sharing

Logs

RuntimeError                              Traceback (most recent call last)
<ipython-input-6-a7e5bf190bfc> in <cell line: 3>()
      1 prompt = "a close-up picture of an old man standing in the rain"
      2 
----> 3 image = pipe(prompt, num_inference_steps=4, guidance_scale=8.0).images[0]

13 frames
/usr/local/lib/python3.10/dist-packages/diffusers/models/lora.py in forward(self, hidden_states, scale)
    226             # make sure to the functional Conv2D function as otherwise torch.compile's graph will break
    227             # see: https://github.com/huggingface/diffusers/pull/4315
--> 228             return F.conv2d(
    229                 hidden_states, self.weight, self.bias, self.stride, self.padding, self.dilation, self.groups
    230             )

RuntimeError: Input type (float) and bias type (c10::Half) should be the same
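
For reference, this kind of error can be reproduced without XLA or the pipeline. The following is a minimal sketch, assuming only that a float32 tensor reaches a convolution whose parameters are float16. The helper name `try_conv` and the tensor shapes are made up for illustration; it does not explain where the float32 input comes from in the LCM-SDXL pipeline.

```python
import torch
import torch.nn.functional as F


def try_conv(input_dtype, param_dtype):
    """Run F.conv2d on CPU; return None on success, or the error text on failure.

    Hypothetical helper for illustration only, not part of diffusers.
    """
    hidden_states = torch.randn(1, 4, 8, 8).to(input_dtype)
    weight = torch.randn(8, 4, 3, 3).to(param_dtype)
    bias = torch.randn(8).to(param_dtype)
    try:
        F.conv2d(hidden_states, weight, bias, padding=1)
    except RuntimeError as err:
        return str(err)
    return None


# float32 input with float16 parameters: expected to fail with a dtype-mismatch error
print(try_conv(torch.float32, torch.float16))

# matching dtypes: expected to succeed, so this prints None
print(try_conv(torch.float32, torch.float32))
```

The exact wording of the message may differ between PyTorch versions, so the sketch only reports whatever text is raised.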

System Info

Copy-and-paste the text below in your GitHub issue and FILL OUT the two last points.

Who can help?

No response

yiyixuxu commented 9 months ago

can you provide a reproducible script?

Thanks

YiYi

kevint324 commented 9 months ago

from diffusers import DiffusionPipeline, UNet2DConditionModel, LCMScheduler
from diffusers import AutoencoderKL
import torch
import torch_xla.core.xla_model as xm

unet = UNet2DConditionModel.from_pretrained("latent-consistency/lcm-sdxl", torch_dtype=torch.float16, variant="fp16")
pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", unet=unet, torch_dtype=torch.float16, variant="fp16")

pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

device = xm.xla_device()
pipe.to(device)

prompt = "a close-up picture of an old man standing in the rain"

image = pipe(prompt, num_inference_steps=4, guidance_scale=8.0).images[0]

Hi @yiyixuxu
This is the script. You can also reproduce it with the Colab link.

Thanks

yiyixuxu commented 9 months ago

cc @patil-suraj @luosiallen here

patrickvonplaten commented 9 months ago

@kevint324 - does fp16 work on XLA? Are you working on TPU?

kevint324 commented 9 months ago

Hi @patrickvonplaten

Yes, FP16 is permitted on XLA devices. https://github.com/pytorch/pytorch/commit/e2e9d1572617a151ba04e086ce8baa171696fa2a

I'm working on a GPU-like accelerator. This error appears before the device lowering stage, and the symptoms are the same across CPU/TPU/GPU, so I guess it comes from the XLA device layer and is agnostic to the hardware backend.

Thanks

patrickvonplaten commented 9 months ago

We should maybe look a bit more into XLA here

EdoardoBotta commented 9 months ago

I had a similar issue with the AnimateDiff pipeline. On GPU/CPU, I was able to mitigate it by wrapping the pipe() call in this context:

with torch.autocast(device):
    image = pipe(prompt, num_inference_steps=4, guidance_scale=8.0).images[0]

However, this workaround does not seem to work on XLA:

RuntimeError: User specified an unsupported autocast device_type 'xla:0'

github-actions[bot] commented 8 months ago
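
The message above quotes the full device string `'xla:0'`, while `torch.autocast` takes a bare device type such as `'cpu'` or `'cuda'`. The sketch below shows one way to derive that type from a device and runs the autocast workaround on CPU. The helper `autocast_device_type` is a hypothetical name, bfloat16 stands in for float16 because it is the default CPU autocast dtype, and whether `'xla'` is accepted as a device type depends on the installed PyTorch and torch_xla versions, which is not verified here.

```python
import torch
import torch.nn.functional as F


def autocast_device_type(device):
    """Return the bare backend name (e.g. 'cpu', 'cuda') for a device or device string.

    Hypothetical helper for illustration only.
    """
    return torch.device(device).type


# Low-precision parameters with a float32 input, mirroring the mismatch in the issue.
weight = torch.randn(8, 4, 3, 3, dtype=torch.bfloat16)
bias = torch.randn(8, dtype=torch.bfloat16)
hidden_states = torch.randn(1, 4, 8, 8)  # float32

device = torch.device("cpu")

# Inside the autocast context the conv2d inputs are cast to a common dtype,
# so the mismatched call goes through instead of raising.
with torch.autocast(device_type=autocast_device_type(device), dtype=torch.bfloat16):
    out = F.conv2d(hidden_states, weight, bias, padding=1)

print(out.dtype)
```

On an XLA device the same pattern would pass `autocast_device_type(xm.xla_device())`, but that only helps if the installed build supports autocast for that backend.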

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.