huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

Bugs in PixArtImageProcessor.resize_and_crop_tensor #8911

Open MagiaSN opened 1 month ago

MagiaSN commented 1 month ago

Describe the bug

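PixArtImageProcessor.resize_and_crop_tensor returns an image with an incorrect size for some combinations of input and target size. In the example below, the output height is 1 instead of 960.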

Reproduction

import torch
from diffusers.image_processor import PixArtImageProcessor

orig_height, orig_width = (832, 1152)
new_height, new_width = (960, 1280)

image = torch.rand((1, 3, orig_height, orig_width))
image = PixArtImageProcessor.resize_and_crop_tensor(image, new_width, new_height)
print(image.size())

# expected: torch.Size([1, 3, 960, 1280])
# actual: torch.Size([1, 3, 1, 1280])

System Info

diffusers==0.30.0.dev0

MagiaSN commented 1 month ago

Analysis:

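# Excerpt of PixArtImageProcessor.resize_and_crop_tensor, annotated with the
# values from the reproduction above. Assumes `import torch` and
# `import torch.nn.functional as F` at module level.
@staticmethod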
def resize_and_crop_tensor(samples: torch.Tensor, new_width: int, new_height: int) -> torch.Tensor:
    orig_height, orig_width = samples.shape[2], samples.shape[3]

    # orig_height = 832
    # orig_width = 1152
    # new_height = 960
    # new_width = 1280

    # Check if resizing is needed
    if orig_height != new_height or orig_width != new_width:
        ratio = max(new_height / orig_height, new_width / orig_width)
        resized_width = int(orig_width * ratio)
        resized_height = int(orig_height * ratio)

        # ratio = 1.1538461538461537
        # resized_height = 959
        # resized_width = 1329

        # Resize
        samples = F.interpolate(
            samples, size=(resized_height, resized_width), mode="bilinear", align_corners=False
        )

        # Center Crop
        start_x = (resized_width - new_width) // 2
        end_x = start_x + new_width
        start_y = (resized_height - new_height) // 2
        end_y = start_y + new_height
        # start_x = 24
        # end_x = 1304
        # start_y = -1
        # end_y = 959
        samples = samples[:, :, start_y:end_y, start_x:end_x]
        # i.e. samples[:, :, -1:959, 24:1304]: the negative start index selects
        # only the last row, so the output height becomes 1

    return samples
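
To make the last annotated step concrete, here is a small pure-Python sketch of the same slice arithmetic. It uses a plain list as a stand-in for the height axis of the tensor (an assumption made purely for illustration; basic slicing with a negative start behaves the same way on a torch tensor).

```python
# Values taken from the annotated trace above.
resized_height, new_height = 959, 960

# Floor division rounds toward negative infinity, so -1 // 2 is -1, not 0.
start_y = (resized_height - new_height) // 2
end_y = start_y + new_height

# Stand-in for the height axis: one entry per row of the resized image.
rows = list(range(resized_height))

# A start of -1 means "the last row", so the slice is rows[958:959].
cropped = rows[start_y:end_y]

print(start_y, end_y, len(cropped))  # -1 959 1
```

This is why the reproduction reports a height of 1 rather than 959.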

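So I think this is caused by floating-point truncation: int(orig_height * ratio) = int(832 * 1.1538461538461537) = int(959.9999999999999) = 959, but we want 960. The resized height then falls one pixel short of the target, start_y becomes negative, and the crop keeps only the last row. Maybe we should use round instead of int, or just set resized_height = new_height when the height ratio is the larger one.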
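
A minimal sketch of the first suggestion, in plain Python so it can be checked without torch. The helper names `resized_size` and `center_crop_box` are hypothetical and are not part of diffusers. The extra `max(...)` clamp is my own assumption, added so that the crop offsets can never go negative; it is not necessarily how the library does or should fix this.

```python
def resized_size(orig_height, orig_width, new_height, new_width):
    """Return (resized_height, resized_width) that covers the target size."""
    ratio = max(new_height / orig_height, new_width / orig_width)
    # round() instead of int(), so 959.9999999999999 becomes 960 rather than 959.
    # max() is an additional guard (an assumption of this sketch): the resized
    # image is never smaller than the target, so crop offsets stay >= 0.
    resized_height = max(round(orig_height * ratio), new_height)
    resized_width = max(round(orig_width * ratio), new_width)
    return resized_height, resized_width


def center_crop_box(resized_height, resized_width, new_height, new_width):
    """Return (start_y, end_y, start_x, end_x) for a center crop."""
    start_y = (resized_height - new_height) // 2
    start_x = (resized_width - new_width) // 2
    return start_y, start_y + new_height, start_x, start_x + new_width


if __name__ == "__main__":
    h, w = resized_size(832, 1152, 960, 1280)
    print(h, w)                              # 960 1329
    print(center_crop_box(h, w, 960, 1280))  # (0, 960, 24, 1304)
```

With these sizes the crop would be samples[:, :, 0:960, 24:1304], which has the expected shape of 960 x 1280.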