liuqk3 / PUT

Papers: 'Transformer based Pluralistic Image Completion with Reduced Information Loss' (TPAMI 2024) and 'Reduce Information Loss in Transformers for Pluralistic Image Inpainting' (CVPR 2022)
MIT License

issues with training #2

Closed fei1998 closed 2 years ago

fei1998 commented 2 years ago

Thanks for sharing your great work! During training, I meet the issues shown in the following figures. The loss remains 0 and the reconstructed image is wrong. How can I solve this? Thanks! Best wishes! problem1 problem2

liuqk3 commented 2 years ago

Hi @fei1998 ,

Have you trained P-VQVAE on your dataset? The reconstruction of the input data is wrong. Maybe you can provide your training logs of P-VQVAE.

fei1998 commented 2 years ago

Thanks a lot! The training of P-VQVAE on my dataset is shown in the following figures. It seems to be wrong? problemaa problembb Best wishes!

liuqk3 commented 2 years ago

Hi @fei1998 ,

The curves are somewhat different from mine. In my experience, the reference image should be mostly masked. This is because the reference branch in the decoder is very easy to learn, while the codebook, the encoder, and the rest of the decoder are much harder to learn. So my suggestion is to increase the mask ratio of the training data and retrain P-VQVAE.
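To illustrate what "increase the mask ratio" could look like in practice, here is a minimal sketch of a block-wise random mask generator whose coverage can be pushed higher. This is a hypothetical helper written for illustration, not code from the PUT repository; the function name and block-based masking scheme are assumptions.

```python
import numpy as np

def random_block_mask(h, w, mask_ratio=0.7, block=16, seed=None):
    """Mask random non-overlapping blocks until at least `mask_ratio`
    of the pixels are covered (1 = masked, 0 = visible).
    Illustrative only, not the PUT repo's mask generator."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((h, w), dtype=np.uint8)
    # enumerate the block grid and visit cells in random order
    cells = [(y, x) for y in range(0, h, block) for x in range(0, w, block)]
    for i in rng.permutation(len(cells)):
        if mask.mean() >= mask_ratio:
            break
        y, x = cells[i]
        mask[y:y + block, x:x + block] = 1
    return mask

mask = random_block_mask(256, 256, mask_ratio=0.7, seed=0)
print(mask.shape, mask.mean())  # coverage ends up at or just above 0.7
```

Raising `mask_ratio` forces the model to rely on the encoder, codebook, and the non-reference part of the decoder instead of simply copying the reference image.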

By the way, the reference branch is in fact there to improve the overall quality of the model. You can even remove it. If you do so, the output image is reconstructed entirely from the encoded quantized tokens. This is very helpful to check whether the current configuration of P-VQVAE is suitable for your dataset.
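The "encoded quantized tokens" mentioned above are the result of the standard VQ-VAE quantization step: each encoder feature vector is replaced by its nearest codebook entry, so the decoder sees only discrete codes. The following is a minimal numpy sketch of that step; function and variable names are illustrative and not taken from the PUT implementation.

```python
import numpy as np

def quantize(features, codebook):
    """features: (N, D) encoder outputs; codebook: (K, D) entries.
    Returns (token indices, quantized vectors). Illustrative sketch of
    nearest-neighbour vector quantization, not the PUT code."""
    # squared L2 distance between every feature and every codebook entry
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d.argmin(axis=1)        # one discrete token per feature
    return idx, codebook[idx]     # decoder input = quantized vectors

# well-separated toy codebook so the nearest entry is unambiguous
codebook = np.arange(32, dtype=float).reshape(8, 4)
feats = codebook[[2, 5, 5]] + 0.1  # slightly perturbed copies of entries 2, 5, 5
idx, q = quantize(feats, codebook)
print(idx)  # -> [2 5 5]
```

With the reference branch removed, the decoder reconstructs the image from `q` alone, which makes any mismatch between the codebook configuration and your dataset directly visible in the output.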

Best wishes.

fei1998 commented 2 years ago

Hi, thanks a lot! Let me try again. Could you provide the pretrained P-VQVAE model on ImageNet? Maybe I can fine-tune it on my dataset. Best wishes.

liuqk3 commented 2 years ago

Hi @fei1998 , the pretrained models are all provided, since P-VQVAE is a sub-module of PUT. You can get it from our provided models via model.content_codec.