tencent-ailab / IP-Adapter

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.
Apache License 2.0

Image Normalization #342

Closed MnLgt closed 2 months ago

MnLgt commented 2 months ago

Hi,

I really love IP-Adapter! I'm wondering why you chose to normalize the image with a mean and standard deviation of 0.5:

self.transform = transforms.Compose([
    transforms.Resize(self.size, interpolation=transforms.InterpolationMode.BILINEAR),
    transforms.CenterCrop(self.size),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),
])

instead of the CLIP normalization of

"image_mean": [
    0.48145466,
    0.4578275,
    0.40821073
  ],
  "image_std": [
    0.26862954,
    0.26130258,
    0.27577711
  ],

The reason I ask is that I'm looking to train an IP-Adapter Plus using DinoV2 as the image encoder, and I'm not sure whether to use the 0.5 normalization from the tutorial train plus script, the standard CLIP normalization, or the DinoV2 normalization of

"image_mean": [
    0.485,
    0.456,
    0.406
  ],
  "image_processor_type": "BitImageProcessor",
  "image_std": [
    0.229,
    0.224,
    0.225
  ],

Thanks so much.

xiaohu2015 commented 2 months ago

The transform function at https://github.com/tencent-ailab/IP-Adapter/blob/main/tutorial_train.py#L43 is for the VAE of SD. For CLIP, we use https://github.com/tencent-ailab/IP-Adapter/blob/main/tutorial_train.py#L49
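For anyone landing here later: the two normalizations feed different models, and a minimal sketch in plain Python (constants copied from the config snippets above; the function names are just for illustration, not the repo's API) makes the difference concrete:

```python
# SD's VAE expects inputs in [-1, 1]; Normalize([0.5], [0.5]) computes
# (x - 0.5) / 0.5, mapping a pixel value in [0, 1] onto exactly that range.
def vae_normalize(x):
    return (x - 0.5) / 0.5

# CLIP's image processor instead standardizes each RGB channel with its
# own dataset statistics: (x - mean) / std per channel.
CLIP_MEAN = (0.48145466, 0.4578275, 0.40821073)
CLIP_STD = (0.26862954, 0.26130258, 0.27577711)

def clip_normalize(rgb):
    return [(x - m) / s for x, m, s in zip(rgb, CLIP_MEAN, CLIP_STD)]

print(vae_normalize(0.0), vae_normalize(1.0))  # -1.0 1.0
print(clip_normalize(list(CLIP_MEAN)))         # [0.0, 0.0, 0.0]
```

So a single input image goes through both pipelines in the training script: the 0.5 transform produces the VAE target, while the CLIP processor produces the image-embedding input.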

MnLgt commented 2 months ago

Ah, of course. I was getting them mixed up. Thank you.