Open forever-rz opened 1 year ago

Thanks for your contribution, but there is a problem when I train it on FFHQ. Once the mask ratio is larger, it seems that only part of the completed result is repaired; the unrepaired part stays black and no new content seems to be generated. Is this normal? eg1: the first and second images on the left retain some strange black regions (epoch: 12). [images: mask_input / completed / reconstruction]

This bad visual effect became increasingly apparent later in training. eg2 (epoch: 19): [images: mask_input / completed / reconstruction / input]
Hi @forever-rz, thanks for your interest. It seems that the second codebook is being used for quantization while training the transformer. For FFHQ, it will not take so long to get reasonable inpainting results.
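(For context on "second codebook": judging from the quantizer settings posted below, n_e: 1024 with masked_embed_start: 512, the quantizer holds one embedding matrix split into codes for visible patches and codes for fully-masked patches. A minimal sketch of that split; the names here are illustrative, not the repo's actual VectorQuantizer API:)

```python
import torch

def quantize_split_codebook(z, codebook, masked, masked_embed_start=512):
    """Nearest-neighbour lookup over a codebook split at masked_embed_start.

    Sketch of the dual-codebook idea: rows [0, masked_embed_start) of the
    (n_e, e_dim) embedding matrix serve visible patches, the remaining rows
    serve fully-masked patches. z is (N, e_dim) encoder output, masked is a
    (N,) bool tensor marking fully-masked patches. Names are illustrative,
    not the repo's VectorQuantizer API.
    """
    d = torch.cdist(z, codebook)                   # Euclidean, as in the config
    d_vis = d.clone()
    d_vis[:, masked_embed_start:] = float("inf")   # visible -> first half only
    d_msk = d.clone()
    d_msk[:, :masked_embed_start] = float("inf")   # masked  -> second half only
    idx = torch.where(masked, d_msk.argmin(1), d_vis.argmin(1))
    return codebook[idx], idx
```

Under this reading, using "just one codebook" presumably means making sure the tokens the transformer predicts and samples always stay below masked_embed_start, i.e. index only into the first half.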
@liuqk3 Thanks for your help! Can you tell me how I should use just one codebook? I only modified three parts of the transformer training configuration (batch_size, sample_iterations, save_epochs); the rest follows the pre-trained model configuration. This is really confusing to me.
My configuration is shown below:

```yaml
dataloader:
  batch_size: 16
  data_root: data
  num_workers: 1
  train_datasets:

model:
  target: image_synthesis.modeling.models.masked_image_inpainting_transformer_in_feature.MaskedImageInpaintingTransformer
  params:
    n_layer: 30
    content_seq_len: 1024
    n_embd: 512
    n_head: 8
    num_token: 512
    embd_pdrop: 0.0
    attn_pdrop: 0.0
    resid_pdrop: 0.0
    attn_content_with_mask: False
    mlp_hidden_times: 4
    block_activate: GELU2
    random_quantize: 0.3
    weight_decay: 0.01
    content_codec_config:
      target: image_synthesis.modeling.codecs.image_codec.patch_vqgan.PatchVQGAN
      params:
        ckpt_path: OUTPUT/pvqvae_ffhq/checkpoint/last.pth
        trainable: False
        token_shape: [32, 32]
        combine_rec_and_gt: True
        quantizer_config:
          target: image_synthesis.modeling.codecs.image_codec.patch_vqgan.VectorQuantizer
          params:
            n_e: 1024
            e_dim: 256
            masked_embed_start: 512
            embed_ema: True
            get_embed_type: retrive
            distance_type: euclidean
        encoder_config:
          target: image_synthesis.modeling.codecs.image_codec.patch_vqgan.PatchEncoder2
          params:
            in_ch: 3
            res_ch: 256
            out_ch: 256
            num_res_block: 8
            res_block_bottleneck: 2
            stride: 8
        decoder_config:
          target: image_synthesis.modeling.codecs.image_codec.patch_vqgan.PatchConvDecoder2
          params:
            in_ch: 256
            out_ch: 3
            res_ch: 256
            num_res_block: 8
            res_block_bottleneck: 2
            stride: 8
            up_layer_with_image: true
            encoder_downsample_layer: conv

solver:
  adjust_lr: none
  base_lr: 0.0
  find_unused_parameters: false
  max_epochs: 250
  optimizers_and_schedulers:
```
The strange thing is that when the mask ratio is small, as in the green circle, there are no such problems and only one codebook seems to be used. So why are two codebooks used when the mask ratio is large (the red circle)? What's wrong with my settings? eg (epoch: 8): [images: mask_input / completed]
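(One way to check this directly is to count how many of the token indices the transformer samples fall into the second half of the codebook. A quick diagnostic sketch, with masked_embed_start taken from the config above and the helper name my own:)

```python
import torch

def second_codebook_fraction(token_ids: torch.Tensor, masked_embed_start: int = 512) -> float:
    """Fraction of sampled token indices that land in the second (masked) codebook.

    token_ids: LongTensor of code indices produced while sampling the
    transformer for the region to be inpainted. Any index >= masked_embed_start
    decodes through the mask-patch codes, which could explain black regions.
    This is a diagnostic sketch, not code from the repo.
    """
    return (token_ids >= masked_embed_start).float().mean().item()
```

If this fraction grows with the mask ratio, that would match the behaviour described above.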
@liuqk3 Thanks for the reply, but I've carefully compared the posted model to my model's training process and really don't notice any difference. So I was wondering if it could be the parameters? Could it be caused by keep_ratio: [0.0, 0.5] in the P-VQVAE training phase versus keep_ratio: [0.3, 0.6] in the transformer training phase?
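(For readers unfamiliar with this parameter: as the next reply confirms, keep_ratio only controls how many pixels stay visible when a training mask is sampled. A rough sketch of that idea, under the assumption that keep_ratio: [lo, hi] means the visible fraction is drawn uniformly from that interval; this is an illustration, not the repo's actual mask generator. The patch size of 8 mirrors stride: 8 in the config above, and h, w are assumed divisible by it:)

```python
import torch
import torch.nn.functional as F

def random_patch_mask(h, w, keep_ratio=(0.3, 0.6), patch=8):
    """Sample a pixel mask whose visible fraction lies in keep_ratio (sketch)."""
    lo, hi = keep_ratio
    keep = lo + (hi - lo) * torch.rand(1).item()   # fraction to keep visible
    gh, gw = h // patch, w // patch                # mask at patch granularity
    n_keep = int(round(keep * gh * gw))
    perm = torch.randperm(gh * gw)
    grid = torch.zeros(gh * gw)
    grid[perm[:n_keep]] = 1.0                      # 1 = visible, 0 = masked
    grid = grid.view(1, 1, gh, gw)
    # Upsample the patch grid to pixel resolution.
    return F.interpolate(grid, scale_factor=patch, mode="nearest")
```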
Hi @forever-rz, sorry for the delayed reply. keep_ratio just affects the number of pixels that remain visible in an image; it should not cause such artifacts. After having a look at your configs, I do not find anything wrong. Here are my questions and suggestions:
1) Did you use our provided P-VQVAE or train it by yourself?
2) Have you checked the reconstruction capability of the used P-VQVAE? A quick check is sketched after this list.
3) Can you provide the cross-entropy loss curves of the transformer?
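(A minimal reconstruction sanity check for point 2: push unmasked images through the codec's encode/quantize/decode path and measure PSNR. The encode/decode method names below are placeholders; adapt them to the repo's PatchVQGAN interface:)

```python
import torch

@torch.no_grad()
def reconstruction_psnr(codec, images):
    """Encode -> quantize -> decode and report mean PSNR (sketch).

    codec stands for a trained P-VQVAE; the .encode/.decode method names
    are hypothetical placeholders for the repo's actual API.
    images: (B, 3, H, W) tensor in [0, 1].
    """
    quant, _ = codec.encode(images)        # hypothetical: pixels -> quantized codes
    rec = codec.decode(quant).clamp(0, 1)  # hypothetical: codes -> pixels
    mse = torch.mean((rec - images) ** 2, dim=(1, 2, 3))
    psnr = -10.0 * torch.log10(mse + 1e-12)
    return psnr.mean().item()
```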
Hi @liuqk3, I'm sorry, I temporarily put the experiment aside because I couldn't figure out the cause of this problem. Today I carefully checked the previous experiment and collected the relevant data for the three questions you raised, as follows.
1. Instead of using the provided P-VQVAE, I trained a new P-VQVAE model myself and added some attention blocks.
2. The reconstruction results of my P-VQVAE model are as follows.
FFHQ: [images: (a) input / (b) mask / (c) reference_input / (d) reconstruction]
Places2: [images: (a) input / (b) mask / (c) reference_input / (d) reconstruction]
ImageNet: [images: (a) input / (b) mask / (c) reference_input / (d) reconstruction]
3. The loss curves of training the transformer on top of my P-VQVAE are as follows (ImageNet is too big, so I gave it up).
[loss curves: FFHQ / Places2]

It seems to me that the reconstruction results are not bad, so I don't understand why the transformer's training results are so wrong. Although I made some changes to the P-VQVAE, the transformer has not changed at all.
@forever-rz, I do not know how many epochs you have trained on FFHQ and Places2. You can try to visualize the inpainting results of the trained model.
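(A generic way to do that visualization, assuming the batches are (B, 3, H, W) tensors in [0, 1]; this uses torchvision's save_image and is not code from the repo:)

```python
import torch
from torchvision.utils import save_image

@torch.no_grad()
def dump_inpainting_grid(masked_input, completed, reconstruction, path="epoch_vis.png"):
    """Save masked input / completed / reconstruction rows side by side.

    All three tensors are assumed to be (B, 3, H, W) in [0, 1];
    one row per tensor, one column per sample in the batch.
    """
    rows = torch.cat([masked_input, completed, reconstruction], dim=0)
    save_image(rows, path, nrow=masked_input.shape[0])
```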
Hi, do you have any updates on this issue? I have also encountered the same problem with a custom dataset: the reconstruction results are much better than the completed results.
I also encountered good reconstruction quality but poor generation quality in a similar task. I saw that our loss curves are basically the same. Did you solve this problem in the end?