valeoai / Maskgit-pytorch

MIT License

Question about the DataPreprocessing when training ImageNet #14

Closed RobertLuo1 closed 2 months ago

RobertLuo1 commented 3 months ago

Hi, thanks for your wonderful work! I noticed that MaskGIT uses RandomResizedCrop for data augmentation, but in your code you use cropping and flipping, and the normalization is commented out (I believe VQGAN is trained with normalization). https://github.com/valeoai/Maskgit-pytorch/blob/b0b2b3cc11cffd0b159f22dc1c6e73a7e8b53db3/Trainer/trainer.py#L80 I am curious about the reasoning behind this.

Thanks in advance!

llvictorll commented 3 months ago

Hello RobertLuo1, I haven't done any data augmentation search, but "transforms.Resize" sets the shorter side of the image to the target size, then "transforms.RandomCrop" randomly crops to the target size (along the longer side). This procedure is, if I am not mistaken, the correct way to preserve the aspect ratio and do the "RandomResizeAndCrop". I also use random flipping, as it is a common data augmentation technique. I commented out the normalization because the VQGAN takes input in the range [-1, 1], and I do that scaling manually here

Best,

Victor
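
For readers following along, the geometry described above can be sketched in plain Python. This is only an illustration and not the repository's code: the helper names (`resize_shorter_side`, `random_crop_box`, `to_minus_one_one`) and the 500x375 example image are hypothetical. The sketch assumes the shorter side is scaled to the target size, the longer side is truncated to an integer, and pixel values start in [0, 1].

```python
import random


def resize_shorter_side(width, height, target):
    """Size after scaling the shorter side to `target`, keeping the aspect ratio."""
    if width <= height:
        return target, int(target * height / width)
    return int(target * width / height), target


def random_crop_box(width, height, target, rng=random):
    """Random target x target window (left, top, right, bottom) inside the image."""
    left = rng.randint(0, width - target)
    top = rng.randint(0, height - target)
    return left, top, left + target, top + target


def to_minus_one_one(x):
    """Map a pixel value from [0, 1] to [-1, 1] (the manual scaling step)."""
    return 2.0 * x - 1.0


# Hypothetical example: a 500x375 landscape image and a 256-pixel target.
w, h = resize_shorter_side(500, 375, 256)  # shorter side becomes 256
box = random_crop_box(w, h, 256)           # the crop only moves along the longer side
```

Because the resize keeps the aspect ratio and the crop only selects a window, objects in the image are never stretched.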

RobertLuo1 commented 3 months ago

Thanks a lot! I see that PyTorch has a similar preprocessing transform called RandomResizedCrop. I wonder whether it is used in MaskGIT? After your detailed explanation, though, I understand that the procedure you describe is the "RandomResizeAndCrop". Thank you, Victor.

llvictorll commented 3 months ago

Hi! Actually, I was not aware that RandomResizedCrop is available directly in PyTorch... It might work, but be careful with the aspect ratio, otherwise the model will also generate "distorted" images.
As for the official MaskGIT, since they did not release the training code, I cannot help you, sorry. Best, Victor

RobertLuo1 commented 3 months ago

Yeah, but internally RandomResizedCrop crops first and then resizes, so I don't think it is quite the same as the current operation. I think the more feasible approach is still Resize followed by RandomCrop, which is what the VQGAN repo uses.
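
To make the crop-then-resize difference concrete, here is a minimal pure-Python sketch of how a random-resized-crop picks its window. It assumes torchvision's documented defaults for `RandomResizedCrop` (`scale=(0.08, 1.0)`, `ratio=(3/4, 4/3)`). The function names are hypothetical, and the fallback branch is simplified to a square crop rather than mirroring the library exactly.

```python
import math
import random


def sample_crop(width, height, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3), rng=random):
    """Sample a crop size (w, h): random area fraction, log-uniform aspect ratio."""
    area = width * height
    log_lo, log_hi = math.log(ratio[0]), math.log(ratio[1])
    for _ in range(10):
        target_area = area * rng.uniform(*scale)
        aspect = math.exp(rng.uniform(log_lo, log_hi))
        w = round(math.sqrt(target_area * aspect))
        h = round(math.sqrt(target_area / aspect))
        if 0 < w <= width and 0 < h <= height:
            return w, h
    # Simplified fallback for this sketch: the largest square that fits.
    side = min(width, height)
    return side, side


def stretch_factor(crop_w, crop_h):
    """Horizontal scale divided by vertical scale when the crop is resized to a square."""
    return crop_h / crop_w


crop_w, crop_h = sample_crop(500, 375)
distortion = stretch_factor(crop_w, crop_h)  # 1.0 only when the crop is square
```

Since the sampled window is generally not square, resizing it to a square target scales the two axes differently, which is the source of the "distorted" images mentioned earlier. Resize followed by RandomCrop always yields a factor of 1.0.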