taesungp / swapping-autoencoder-pytorch

Official Implementation of Swapping Autoencoder for Deep Image Manipulation (NeurIPS 2020)

structure not being retained #27

Open · haiderasad opened this issue 2 years ago

haiderasad commented 2 years ago

Hi @taesungp, great piece of work. I trained it on my dataset of 50k images for 50 million iterations as you suggested. At test time the results are quite impressive, but in some cases the structure is not being correctly reconstructed. I would like the shapes to be reproduced almost exactly (full swapping). What could be the problem, and will training further help? (image attached)

haiderasad commented 2 years ago

UPDATE: I also ran the above test with the 25-million-iteration model, and in that run the model recreates the structure images much more clearly and exactly, a lot better than the 50-million-iteration model. I am attaching a zip file for analysis. Could you please tell me why the 50M model recreates the structure poorly compared to the 25M model?

https://drive.google.com/file/d/1PszJfCIoKxRKFAS4zEEv9td5hVTgD2VG/view?usp=sharing

tom99763 commented 2 years ago

> Hi @taesungp, great piece of work. I trained it on my dataset of 50k images for 50 million iterations as you suggested. At test time the results are quite impressive, but in some cases the structure is not being correctly reconstructed. I would like the shapes to be reproduced almost exactly (full swapping). What could be the problem, and will training further help?

  1. I think applying a stop-gradient to the structure code in the translation path may help with this issue (see the sketch after this list).

  2. See Figure 6 in the paper, where they discuss the effect of patch size selection; using a smaller patch size may solve the geometry-changing issue.
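As a rough illustration of idea 1, here is a minimal PyTorch-style sketch. The `E`/`G` handles and the function name are placeholders for illustration, not this repo's exact API; it only shows where a stop-gradient on the structure code would go in the swapped path:

```python
# Hypothetical encoder/generator handles; E is assumed to return
# (structure, texture) code tensors. Placeholder names, not the repo's modules.
def swapped_forward(E, G, img_a, img_b):
    structure_a, _texture_a = E(img_a)
    _structure_b, texture_b = E(img_b)

    # Stop-gradient on the structure code in the translation (swap) path:
    # gradients from the patch discriminator then cannot push the encoder
    # to alter the spatial layout, only how the texture code is used.
    swapped = G(structure_a.detach(), texture_b)
    return swapped
```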

taesungp commented 2 years ago

Hi @tom99763, I suppose it's because the patch discriminator becomes stronger during the course of training, encouraging the generator to make more changes. You can try the following two things:

  1. Make the patch size smaller, as you suggested, i.e. reduce `--patch_min_scale`, `--patch_max_scale`, and `--patch_size`.
  2. Reduce the number of downsampling steps in the encoder to make the structure code larger, i.e. reduce `--netE_num_downsampling_sp` (see the sketch below).
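To make the effect of suggestion 2 concrete, here is a small sketch (my own illustration, not code from this repo) of how the structure code's spatial resolution changes with the number of spatial downsampling steps, assuming each step halves the resolution:

```python
def structure_code_resolution(image_size: int, num_downsampling_sp: int) -> int:
    """Spatial side length of the structure code, assuming each spatial
    downsampling step (what --netE_num_downsampling_sp controls) halves
    the resolution."""
    return image_size // (2 ** num_downsampling_sp)

# Illustrative numbers only; your actual image size and defaults may differ.
print(structure_code_resolution(256, 4))  # 16 -> 16x16 structure code
print(structure_code_resolution(256, 3))  # 32 -> 32x32 code, finer spatial layout
```

A larger structure code gives the generator a finer spatial grid to reconstruct from, which is why fewer downsampling steps tends to preserve geometry better.
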
tom99763 commented 2 years ago

> Hi @tom99763, I suppose it's because the patch discriminator becomes stronger during the course of training, encouraging the generator to make more changes. You can try the following two things:
>
>   1. Make the patch size smaller, as you suggested, i.e. reduce `--patch_min_scale`, `--patch_max_scale`, and `--patch_size`.
>   2. Reduce the number of downsampling steps in the encoder to make the structure code larger, i.e. reduce `--netE_num_downsampling_sp`.

Another question: if there's an auxiliary classifier in the discriminator, for example one that classifies facial expressions, can the structure still be retained (reconstruction) when using less downsampling and smaller patches?

It seems like the biggest contribution of this paper is the patch discriminator. Have you ever tried a contrastive loss or triplet loss in the discriminator? It may have the potential to extend to a combination of hard recognition tasks and generation tasks.
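For reference, an auxiliary classifier here would typically mean an ACGAN-style extra head on the discriminator. A toy sketch (a generic illustration, not this repo's discriminator) of what that could look like:

```python
import torch.nn as nn

class DiscriminatorWithAuxHead(nn.Module):
    """Toy example: a shared backbone with a real/fake head plus an
    auxiliary classification head (e.g. facial expression classes)."""

    def __init__(self, num_classes: int, channels: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, channels, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(channels, channels * 2, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.adv_head = nn.Linear(channels * 2, 1)            # real vs. fake
        self.cls_head = nn.Linear(channels * 2, num_classes)  # auxiliary labels

    def forward(self, x):
        feat = self.backbone(x)
        return self.adv_head(feat), self.cls_head(feat)
```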

taesungp commented 2 years ago

Having an auxiliary classifier is a good idea. I think you are right.

I did think about contrastive loss or triplet loss, and that was our long term plan in case the current formulation does not work out. The concern was that it will be quite memory intensive, because you will need to encode images, swap the codes, decode them, and the re-encode to compute the contrastive loss. Fortunately the current formulation seemed to have enough inductive bias to generate interesting outputs.