tencent-ailab / IP-Adapter

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.
Apache License 2.0

Image Normalization #342

Closed MnLgt closed 2 months ago

MnLgt commented 2 months ago

Hi,

I really love IP-Adapter! I'm wondering why you chose to normalize the image with a mean and standard deviation of 0.5:

self.transform = transforms.Compose([
    transforms.Resize(self.size, interpolation=transforms.InterpolationMode.BILINEAR),
    transforms.CenterCrop(self.size),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),
])

instead of the CLIP normalization of

"image_mean": [
    0.48145466,
    0.4578275,
    0.40821073
  ],
  "image_std": [
    0.26862954,
    0.26130258,
    0.27577711
  ],

The reason I ask is that I'm looking to train an IP-Adapter Plus using DinoV2 as the image encoder, and I'm not sure whether to use the 0.5 normalization from the tutorial train plus script, the standard CLIP normalization, or the DinoV2 normalization of

"image_mean": [
    0.485,
    0.456,
    0.406
  ],
  "image_processor_type": "BitImageProcessor",
  "image_std": [
    0.229,
    0.224,
    0.225
  ],

Thanks so much.

xiaohu2015 commented 2 months ago

The transform function at https://github.com/tencent-ailab/IP-Adapter/blob/main/tutorial_train.py#L43 is for the VAE of SD. For CLIP, we use https://github.com/tencent-ailab/IP-Adapter/blob/main/tutorial_train.py#L49
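For anyone landing here later: the two normalizations feed different models, and a minimal sketch in plain Python (constants copied from the config snippets above; the function names are just for illustration, not the repo's API) makes the difference concrete:

```python
# SD's VAE expects inputs in [-1, 1]; Normalize([0.5], [0.5]) computes
# (x - 0.5) / 0.5, mapping a pixel value in [0, 1] onto exactly that range.
def vae_normalize(x):
    return (x - 0.5) / 0.5

# CLIP's image processor instead standardizes each RGB channel with its
# own dataset statistics: (x - mean) / std per channel.
CLIP_MEAN = (0.48145466, 0.4578275, 0.40821073)
CLIP_STD = (0.26862954, 0.26130258, 0.27577711)

def clip_normalize(rgb):
    return [(x - m) / s for x, m, s in zip(rgb, CLIP_MEAN, CLIP_STD)]

print(vae_normalize(0.0), vae_normalize(1.0))  # -1.0 1.0
print(clip_normalize(list(CLIP_MEAN)))         # [0.0, 0.0, 0.0]
```

So a single input image goes through both pipelines in the training script: the 0.5 transform produces the VAE target, while the CLIP processor produces the image-embedding input.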

MnLgt commented 2 months ago

Ah, of course. I was getting them mixed up. Thank you.