huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

Aspect Ratios in Auraflow pipeline #9001

Open Muawizodux opened 1 month ago

Muawizodux commented 1 month ago

Describe the bug

Passing a height or width other than 1024x1024 to the AuraFlow pipeline raises an error.

Reproduction

from diffusers import AuraFlowPipeline
import torch

pipeline = AuraFlowPipeline.from_pretrained(
    "fal/AuraFlow-v0.2",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

image = pipeline(
    prompt="close-up portrait of a majestic iguana with vibrant blue-green scales, piercing amber eyes, and orange spiky crest. Intricate textures and details visible on scaly skin. Wrapped in dark hood, giving regal appearance. Dramatic lighting against black background. Hyper-realistic, high-resolution image showcasing the reptile's expressive features and coloration.",
    height=1344,
    width=768,
    num_inference_steps=50,
    generator=torch.Generator().manual_seed(666),
    guidance_scale=3.5,
).images[0]

image.save("/mnt/additional-disk/home/ubuntu/umar_muawiz/gitclonerepo/ml-realtime-controlimg2img/assets/output_at_16x9.png")

Logs

Traceback (most recent call last):
  File "/mnt/additional-disk/home/ubuntu/umar_muawiz/Auraflow.py", line 10, in <module>
    image = pipeline(
  File "/mnt/additional-disk/home/ubuntu/umar_muawiz/.envs/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/additional-disk/home/ubuntu/umar_muawiz/.envs/lib/python3.10/site-packages/diffusers/pipelines/aura_flow/pipeline_aura_flow.py", line 555, in __call__
    noise_pred = self.transformer(
  File "/mnt/additional-disk/home/ubuntu/umar_muawiz/.envs/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/additional-disk/home/ubuntu/umar_muawiz/.envs/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/additional-disk/home/ubuntu/umar_muawiz/.envs/lib/python3.10/site-packages/diffusers/models/transformers/auraflow_transformer_2d.py", line 337, in forward
    hidden_states = self.pos_embed(hidden_states)  # takes care of adding positional embeddings too.
  File "/mnt/additional-disk/home/ubuntu/umar_muawiz/.envs/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/additional-disk/home/ubuntu/umar_muawiz/.envs/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/additional-disk/home/ubuntu/umar_muawiz/.envs/lib/python3.10/site-packages/diffusers/models/transformers/auraflow_transformer_2d.py", line 78, in forward
    return latent + self.pos_embed
RuntimeError: The size of tensor a (4032) must match the size of tensor b (4096) at non-singleton dimension 1
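
The two sizes in the RuntimeError can be reproduced with simple arithmetic. This is a minimal sketch, assuming an 8x VAE downsampling factor and a transformer patch size of 2. Neither value is stated in this thread; they are assumptions that happen to match the numbers in the traceback. `patch_token_count` is a hypothetical helper for illustration, not a diffusers function.

```python
def patch_token_count(height: int, width: int,
                      vae_scale: int = 8, patch_size: int = 2) -> int:
    """Number of patch tokens the transformer would see for an image size.

    vae_scale and patch_size are assumed values, not read from the model config.
    """
    # Image pixels -> latent grid (assumed 8x spatial downsampling by the VAE).
    latent_h = height // vae_scale
    latent_w = width // vae_scale
    # Latent grid -> patch grid (assumed 2x2 latent patches).
    return (latent_h // patch_size) * (latent_w // patch_size)


requested = patch_token_count(1344, 768)    # size used in the reproduction
reference = patch_token_count(1024, 1024)   # size reported to work
print(requested, reference)  # 4032 4096, the two sizes in the error message
```

Under these assumptions, the positional embedding holds exactly as many entries as a 1024x1024 image produces, so the addition `latent + self.pos_embed` only lines up at that resolution.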

System Info

- 🤗 Diffusers version: 0.30.0.dev0
- Platform: Linux-5.15.0-52-generic-x86_64-with-glibc2.31
- Running on Google Colab?: No
- Python version: 3.10.14
- PyTorch version (GPU?): 2.2.1+cu121 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.24.2
- Transformers version: 4.43.3
- Accelerate version: 0.33.0
- PEFT version: not installed
- Bitsandbytes version: not installed
- Safetensors version: 0.4.2
- xFormers version: 0.0.25
- Accelerator: NVIDIA GeForce RTX 4090, 24564 MiB
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: No

Who can help?

@yiyixuxu @DN6

bghira commented 1 month ago

Yes, only square images are supported. Maybe the pipeline should raise a clear error or enforce this.
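
A guard like the one suggested here could look roughly like the following. It is only a sketch: `check_auraflow_size` and `SUPPORTED_SIZE` are hypothetical names, not part of diffusers, and the single supported size of 1024 is taken from what is reported in this thread rather than from the model config.

```python
SUPPORTED_SIZE = 1024  # assumption: the only resolution reported to work in this thread


def check_auraflow_size(height: int, width: int,
                        supported: int = SUPPORTED_SIZE) -> None:
    """Raise early with a readable message instead of a tensor shape mismatch."""
    if height != supported or width != supported:
        raise ValueError(
            f"Got height={height}, width={width}, but only "
            f"{supported}x{supported} is supported."
        )


# Call this before invoking the pipeline.
check_auraflow_size(1024, 1024)  # passes silently
```

Calling `check_auraflow_size(1344, 768)` raises a `ValueError` up front, which is easier to interpret than the shape mismatch inside `pos_embed`.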

Skquark commented 1 month ago

It's not just non-square aspect ratios: anything other than 1024x1024 is unsupported. Are there plans to support other sizes, or do we just need to accept the limit? I really like AuraFlow's aesthetics, but square-only is restrictive...