tencent-ailab / IP-Adapter

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.
Apache License 2.0
5.08k stars 331 forks

Confusion about image data normalization while training #312

Open Exuan148 opened 6 months ago

Exuan148 commented 6 months ago

Hi, I noticed that the MyDataset class normalizes the image data, but I am unsure whether the 'self.transform' operation matches the normalization used by the original SD. If the SD UNet is frozen, the normalization's mean and std should not change, right? Also, could you please tell me how to denormalize the output of 'self.transform' so that I can view the denormalized image?

In more detail: when preprocessing the target image and the image prompt, you use two different methods, transforms.Compose and CLIPImageProcessor. In other words, the SD UNet adds noise to and samples from the VAE features of the target image after it has been normalized by transforms.Compose. Are the mean and std in this transforms.Compose the same as the ones used for normalization when the original SD was trained? If the UNet parameters are not updated, they should be the same, right? Finally, is there a way to denormalize the tensor produced by transforms.Compose and save it as an image file?

    self.transform = transforms.Compose([
        transforms.Resize(self.size, interpolation=transforms.InterpolationMode.BILINEAR),
        transforms.CenterCrop(self.size),
        transforms.ToTensor(),
        transforms.Normalize([0.5], [0.5]),
    ])
    self.clip_image_processor = CLIPImageProcessor()

xiaohu2015 commented 6 months ago

The image prompt goes through the CLIP image encoder, so it is preprocessed with the image processor that corresponds to that encoder.

Exuan148 commented 6 months ago

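Right, but what I am asking about is the image processor for the images that SD adds noise to: is its normalization the same as the one used in the original SD training?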
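
On the denormalization part of the question, a minimal sketch may help. transforms.Normalize([0.5], [0.5]) computes (x - 0.5) / 0.5 on the [0, 1] output of transforms.ToTensor(), so it maps pixel values to [-1, 1], and the inverse is x * 0.5 + 0.5. As far as I know, [-1, 1] is also the input range the SD VAE expects, though the maintainers have not confirmed that in this thread. The sketch uses NumPy instead of torch so that it runs without a GPU stack; the helper names `normalize` and `denormalize_to_uint8` are hypothetical and not part of the IP-Adapter code.

```python
import numpy as np

# Values passed to transforms.Normalize([0.5], [0.5]) in MyDataset.
MEAN = 0.5
STD = 0.5


def normalize(x01):
    """Same arithmetic as Normalize([0.5], [0.5]): maps [0, 1] to [-1, 1]."""
    return (x01 - MEAN) / STD


def denormalize_to_uint8(chw):
    """Invert the normalization and return an HWC uint8 array.

    Hypothetical helper: expects a channels-first (C, H, W) float array.
    """
    x01 = np.clip(chw * STD + MEAN, 0.0, 1.0)   # back to [0, 1]
    hwc = np.transpose(x01, (1, 2, 0))          # (C, H, W) -> (H, W, C)
    return np.round(hwc * 255.0).astype(np.uint8)


# Stand-in for the output of ToTensor(): a 3x2x2 image with values in [0, 1].
pixels01 = np.array([[[0.0, 1.0], [0.5, 0.25]]] * 3, dtype=np.float32)
normalized = normalize(pixels01)
restored = denormalize_to_uint8(normalized)
```

With a real torch tensor from 'self.transform', converting it first with `.cpu().numpy()` and then passing the result of `denormalize_to_uint8` to `PIL.Image.fromarray(...).save(...)` should write a viewable image file.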