Tensor not on the same device for long prompts in `AuraFlowPipeline`

Describe the bug

When I generate an image with a long prompt using `AuraFlowPipeline`, the call fails inside `encode_prompt` with this error:

  File "~/workspace/anaconda3/envs/diffusers/lib/python3.11/site-packages/diffusers/pipelines/aura_flow/pipeline_aura_flow.py", line 507, in __call__
    ) = self.encode_prompt(
        ^^^^^^^^^^^^^^^^^^^
  File "~/workspace/anaconda3/envs/diffusers/lib/python3.11/site-packages/diffusers/pipelines/aura_flow/pipeline_aura_flow.py", line 267, in encode_prompt
    if untruncated_ids.shape[-1] >= text_input_ids.shape[-1] and not torch.equal(
                                                                     ^^^^^^^^^^^^
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument other in method wrapper_CUDA__equal)

With a shorter prompt, the same call works.
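
My guess at why only long prompts hit this, based on the condition shown in the traceback: the length check `untruncated_ids.shape[-1] >= text_input_ids.shape[-1]` comes first, so `torch.equal` is only evaluated when the untruncated prompt is at least as long as the truncated one. The error message suggests that one of the two tensors has been moved to `cuda:0` while the other is still on the CPU. The sketch below is not the pipeline's code. The helper name `prompt_was_truncated` and the toy token ids are made up for illustration, and it assumes the truncated ids are padded to a fixed maximum length. It only shows how the same comparison could be made device-safe.

```python
import torch


def prompt_was_truncated(text_input_ids: torch.Tensor, untruncated_ids: torch.Tensor) -> bool:
    """Hypothetical helper: True if the tokenizer had to cut the prompt.

    Mirrors the condition from the traceback, but first moves
    `untruncated_ids` to the device of `text_input_ids` so that
    torch.equal never compares tensors on different devices.
    """
    untruncated_ids = untruncated_ids.to(text_input_ids.device)
    return bool(
        untruncated_ids.shape[-1] >= text_input_ids.shape[-1]
        and not torch.equal(text_input_ids, untruncated_ids)
    )


# Toy token ids with an assumed maximum length of 4 (0 is padding, 1 is end-of-sequence).
short_ids = torch.tensor([[5, 6, 1, 0]])            # short prompt, padded to max length
short_untruncated = torch.tensor([[5, 6, 1]])       # shorter than max length: length check fails
long_ids = torch.tensor([[5, 6, 7, 1]])             # long prompt, cut to max length
long_untruncated = torch.tensor([[5, 6, 7, 8, 9, 1]])

# In the failing case the truncated ids would live on the GPU; use it when available.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(prompt_was_truncated(short_ids.to(device), short_untruncated))
print(prompt_was_truncated(long_ids.to(device), long_untruncated))
```

If this reading is right, a short prompt never reaches `torch.equal` because the length check is false, which would match the behaviour I see.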

Reproduction

import torch
from diffusers import AuraFlowPipeline

pipeline = AuraFlowPipeline.from_pretrained("fal/AuraFlow", torch_dtype=torch.float16).to("cuda")

image = pipeline(
    prompt="Photography, cinematic, Amazon mythological creature, giving birth, mythology, bonnacon, twinheaded bullsnakehuman, animalistichumanoid creature giving birth to animal, hybrid, half animal, bull head, luscious, shapeshifter, trickster, snake skin, mycelium, mythology, transgender, travesti, shot on portra 160, brazilian, nonbinary, mycelium garments, fantastical, Brazil, dreamy, utopic, transgender, botanical, jungle, beautiful, amazing colors, mythology hybrid creature, wide angle, mythology, folclore, full body, full perspective, soft blue and green colors, purple skin, pastel colors, hybrid, 35mm film, shot on portra 160, mythical hybrid creature, mythology, wings, jungle, louvre, plants, full perspective, Ephemerality, full length, transience, fleeting, ominous, wistful, blowing away, dreamlike, deep perspective, Super  Resolution, Advanced, photography, ultrarealistic, photo realistic, 16k, hyper realistic, cinematic lighting, intricate, realism, maximalist detail, octane render, Artstation, extreme high render ",
    height=1024,
    width=1024,
    num_inference_steps=50,
    generator=torch.Generator().manual_seed(666),
    guidance_scale=3.5,
).images[0]

image.save("tmp.png")

If the prompt is short, like

import torch
from diffusers import AuraFlowPipeline

pipeline = AuraFlowPipeline.from_pretrained("fal/AuraFlow", torch_dtype=torch.float16).to("cuda")

image = pipeline(
    prompt="photo of a cat",
    height=1024,
    width=1024,
    num_inference_steps=50,
    generator=torch.Generator().manual_seed(666),
    guidance_scale=3.5,
).images[0]

then the pipeline runs without error.

Logs

  File "~/workspace/anaconda3/envs/diffusers/lib/python3.11/site-packages/diffusers/pipelines/aura_flow/pipeline_aura_flow.py", line 507, in __call__
    ) = self.encode_prompt(
        ^^^^^^^^^^^^^^^^^^^
  File "~/workspace/anaconda3/envs/diffusers/lib/python3.11/site-packages/diffusers/pipelines/aura_flow/pipeline_aura_flow.py", line 267, in encode_prompt
    if untruncated_ids.shape[-1] >= text_input_ids.shape[-1] and not torch.equal(
                                                                     ^^^^^^^^^^^^
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument other in method wrapper_CUDA__equal)

System Info

Python 3.11.9
CUDA 12.2
torch 2.3.1
diffusers 0.30.0.dev0
Platform: Ubuntu 20.04.5 LTS

Who can help?

@yiyixuxu

huggingface / diffusers

Tensor not on the same device for long prompts in `AuraFlowPipeline` #8935
