With CelebA, we cropped the center 178x178 of the images, then resized them to 256x256 using bilinear interpolation. For Paris StreetView, since the images in the dataset are elongated (936 x 537), we separate each image into three: 1) left 537 x 537, 2) middle 537 x 537, 3) right 537 x 537, of the image. These images are scaled down to 256x256 for our model, totaling 44,700 images.
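To make sure I read that passage correctly, here is a minimal sketch of the crop geometry it describes. It is only my interpretation, not the authors' code: the function names are hypothetical, the boxes use the `(left, upper, right, lower)` convention, and I am assuming the "middle" crop is centered horizontally and that the CelebA inputs are the aligned 178 x 218 images.

```python
def center_crop_box(width, height, size):
    """Return a (left, upper, right, lower) box for a centered size x size crop."""
    left = (width - size) // 2
    upper = (height - size) // 2
    return (left, upper, left + size, upper + size)


def three_square_boxes(width, height):
    """Return left, middle, and right square boxes whose side equals the image height."""
    side = height
    # Assumption: the paper's "middle" crop is centered horizontally.
    middle_left = (width - side) // 2
    return [
        (0, 0, side, side),                          # left crop
        (middle_left, 0, middle_left + side, side),  # middle crop
        (width - side, 0, width, side),              # right crop
    ]


if __name__ == "__main__":
    # CelebA: centered 178 x 178 crop (assuming 178 x 218 aligned inputs).
    print(center_crop_box(178, 218, 178))
    # Paris StreetView: three 537 x 537 crops from a 936 x 537 image.
    print(three_square_boxes(936, 537))
```

Each box would then be cropped and resized to 256x256 (bilinear for CelebA, per the quote). Note that three 537-wide crops cannot fit side by side in a 936-wide image, so under this reading the crops overlap heavily.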
The passage quoted at the top is how the paper describes its preprocessing.
After a few small tests, I feel this number has a big impact on the results.
Maybe you have some experience with this.
Could you share it? I would really appreciate it.