tencent-ailab / IP-Adapter

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.

Apache License 2.0


random crop #357

Open masaisai111 opened 1 month ago

masaisai111 commented 1 month ago

Why is the image that gets noised cropped, while the reference image used for feature extraction is not? Won't this result in a feature mismatch?

    # random crop
    delta_h = image_tensor.shape[1] - self.size
    delta_w = image_tensor.shape[2] - self.size
    # At least one delta must be zero, i.e. the image already matches self.size along one side.
    assert not all([delta_h, delta_w])

    if self.center_crop:
        top = delta_h // 2
        left = delta_w // 2
    else:
        top = np.random.randint(0, delta_h + 1)
        left = np.random.randint(0, delta_w + 1)
    image = transforms.functional.crop(
        image_tensor, top=top, left=left, height=self.size, width=self.size
    )
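
For anyone who wants to try the offset logic above outside the dataset class, here is a minimal sketch in plain Python. The helper name `crop_offsets` and its parameters are hypothetical (not part of the repository), and it uses the standard `random` module in place of `np.random` so it has no dependencies.

```python
import random


def crop_offsets(height, width, size, center_crop=True, rng=random):
    """Return (top, left) for a size x size crop (hypothetical helper)."""
    delta_h = height - size
    delta_w = width - size
    if delta_h < 0 or delta_w < 0:
        raise ValueError("image is smaller than the crop size")
    if center_crop:
        # Split the surplus evenly between the two sides.
        return delta_h // 2, delta_w // 2
    # random.randint is inclusive on both ends, so this covers the same
    # range as np.random.randint(0, delta + 1).
    return rng.randint(0, delta_h), rng.randint(0, delta_w)


# A 1024 x 1536 (H x W) image cropped to 1024 x 1024.
print(crop_offsets(1024, 1536, 1024, center_crop=True))  # (0, 256)
```

With `center_crop=False` the offsets vary per call, so the noised crop can land anywhere along the longer side.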

xiaohu2015 commented 1 month ago

In fact, we only use center crop during training.

masaisai111 commented 1 month ago

There are some differences between the image obtained after clipping and the image from which CLIP extracts features; for example, the edge area is clipped off. Then why can the clipped image be used as the image that gets noised? Won't there be some information conflict?

xiaohu2015 commented 1 month ago

What do you mean by "clipping"?

masaisai111 commented 1 month ago

I mean this (image: 4f2871fcc3c536b10879e6a409b059ec). If the training images I use are not square, cropping occurs when training the XL model, and the edge information of the image is cropped off.
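
To put a number on how much edge information is lost, here is a rough sketch. It assumes the image is first resized so that its shorter side equals the training size and is then center-cropped to a square; the resize step is an assumption that is not shown in the snippet above, and `center_crop_loss` is a hypothetical helper name.

```python
def center_crop_loss(height, width, size):
    """Pixels trimmed from each end of the longer side, plus the kept fraction.

    Assumes a resize that maps the shorter side to `size`, followed by a
    size x size center crop (hypothetical helper, for illustration only).
    """
    short, long = min(height, width), max(height, width)
    resized_long = round(long * size / short)
    delta = resized_long - size
    trimmed_first = delta // 2               # matches top/left = delta // 2
    trimmed_second = delta - trimmed_first   # remainder on the other end
    kept_fraction = size / resized_long
    return trimmed_first, trimmed_second, kept_fraction


first, second, kept = center_crop_loss(1024, 1536, 1024)
print(first, second, round(kept, 3))  # 256 256 0.667
```

So for a 2:3 image, about a third of the longer side never reaches the noised crop under these assumptions.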