the qualitative comparison with PUTconv in figure 7

CyrilCsy commented 2 years ago

I am curious why the results with a general CNN-based encoder are so much worse. A CNN should be able to learn to distinguish the masked region to a certain extent.

liuqk3 commented 2 years ago

Hi @CyrilCsy,

Thanks for your interest in our work. A CNN-based encoder can indeed learn good features. However, those features are good for reconstruction but not well suited to the UQ-Transformer. The reason is that the masked regions (zero pixels) have a negative impact on the unmasked regions. For PUT, the main artifact is that a patch is easily predicted as black (zero pixels) if: 1) a partially masked patch contains some black pixels; or 2) there are many black pixels in the unmasked regions. Convolution in a CNN-based encoder propagates the black pixels from the masked region into the unmasked region, which has a significant negative impact on the inpainted images.

By the way, I have also tried to fix this artifact. The next version of PUT is on the way.
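To make the leakage argument above concrete, here is a minimal sketch in plain Python. It is not code from the PUT repository: the 3x3 mean filter is only a stand-in for a convolutional layer, the per-patch mean is only a stand-in for an encoder that processes non-overlapping patches independently, and the image size, patch size, and mask position are arbitrary assumptions.

```python
SIZE, PATCH = 8, 2  # hypothetical image and patch sizes


def apply_mask(img, mask):
    """Set masked pixels to zero (black), as in the masked input image."""
    return [[0.0 if m else v for v, m in zip(row, mrow)]
            for row, mrow in zip(img, mask)]


def box_filter(img, k=3):
    """k x k mean filter with zero padding: a stand-in for one conv layer."""
    h, w, r = len(img), len(img[0]), k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += img[yy][xx]
            out[y][x] = total / (k * k)
    return out


def patch_means(img, p=PATCH):
    """Mean of each non-overlapping p x p patch: a stand-in for a patch-wise encoder."""
    n = len(img) // p
    return [[sum(img[py * p + dy][px * p + dx]
                 for dy in range(p) for dx in range(p)) / (p * p)
             for px in range(n)] for py in range(n)]


image = [[1.0] * SIZE for _ in range(SIZE)]
# 3x3 masked square, deliberately not aligned to the patch grid
mask = [[3 <= y <= 5 and 3 <= x <= 5 for x in range(SIZE)] for y in range(SIZE)]
masked = apply_mask(image, mask)

# Convolution: count unmasked pixels whose feature changed because of the mask.
conv_ref, conv_masked = box_filter(image), box_filter(masked)
leaked_pixels = sum(
    1 for y in range(SIZE) for x in range(SIZE)
    if not mask[y][x] and abs(conv_ref[y][x] - conv_masked[y][x]) > 1e-9
)

# Patch-wise: count fully unmasked patches whose feature changed.
patch_ref, patch_masked = patch_means(image), patch_means(masked)
changed_clean_patches = 0
for py in range(SIZE // PATCH):
    for px in range(SIZE // PATCH):
        touches_mask = any(mask[py * PATCH + dy][px * PATCH + dx]
                           for dy in range(PATCH) for dx in range(PATCH))
        if not touches_mask and abs(patch_ref[py][px] - patch_masked[py][px]) > 1e-9:
            changed_clean_patches += 1

print(leaked_pixels, changed_clean_patches)
```

In this toy setting the convolution alters the features of unmasked pixels bordering the hole, while patches that contain no masked pixel are untouched by the patch-wise stand-in. Stacking more convolutional layers would widen the affected border.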

CyrilCsy commented 1 year ago

Thanks for clearing up my confusion. I'm trying to train the model on Places. I set batchsize=64 and kept the number of epochs unchanged (100) when training pvqvae, but training is estimated to take more than 20 days. Is that necessary? If I train for only 10 epochs, will it make a big difference in the results?

liuqk3 commented 1 year ago

Hi @CyrilCsy,

In my experience, P-VQVAE can still achieve promising reconstruction capability when the number of epochs is reduced. But you need to pay attention to some settings, for example the number of warm-up iterations and the iteration at which certain losses are introduced (discriminator, LPIPS, etc.). You should set these iteration counts in proportion to the total number of iterations.
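The proportional scaling suggested above could be sketched as follows. This is only an illustration: the key names (`warmup_iters`, `disc_start_iters`, `lpips_start_iters`), the iteration values, and `iters_per_epoch` are hypothetical and are not taken from the PUT configuration files.

```python
def scale_schedule(schedule, old_total_iters, new_total_iters):
    """Rescale iteration milestones so each keeps the same fraction of total training."""
    return {
        name: max(1, round(iters * new_total_iters / old_total_iters))
        for name, iters in schedule.items()
    }


iters_per_epoch = 1000  # hypothetical; depends on dataset size and batch size

# Hypothetical milestones tuned for the original 100-epoch run
original = {
    "warmup_iters": 5000,        # end of learning-rate warm-up
    "disc_start_iters": 30000,   # iteration at which the discriminator loss starts
    "lpips_start_iters": 10000,  # iteration at which the LPIPS loss starts
}

scaled = scale_schedule(
    original,
    old_total_iters=100 * iters_per_epoch,
    new_total_iters=10 * iters_per_epoch,
)
print(scaled)
```

With the run shortened from 100 to 10 epochs, each milestone shrinks by the same factor of ten, so the warm-up and each loss still occupy the same share of training.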
