Some questions when training on my dataset

soul-M-42 commented 3 years ago

I like this repository and tried training it on a full-body portrait dataset. I ran into two problems with the network's output:

The generated images all have similar tones. This also happened when I trained other types of GANs, so maybe the problem is not the code but my dataset.
Details disappear. The generated images have the correct outline, but the face and clothing details are lost. I checked the generator code and wonder whether it has a skip connection like in U-Net. As far as I know, the pix2pix paper uses this structure to improve the rendering of details.

I'm just a beginner in ML. Thank you very much :)

pit-ray commented 3 years ago

I'm interested in seeing results on another dataset. Thanks.

The problem may be what is called mode collapse. Conditional GANs like pix2pix are strongly constrained by their input, so their latent spaces tend to be poorly expressive. Many researchers have tackled this problem. For example, Qi Mao et al. (preprint, arXiv:1903.05628, 2019, v6) added a diversity term to the loss function to address it.
However, this method did not solve the problem on my dataset. The code is in diversity_loss.py; please try it on your dataset. By the way, I alleviated the problem by tuning the model's hyperparameters against my 500 images, but that approach is very laborious.
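To make the diversity term concrete, here is a minimal sketch of the idea: generate two images from the same condition with two different latent codes, and penalize the generator when the images are much closer together than the codes. It is written in NumPy only to show the arithmetic and is not the contents of diversity_loss.py; the function name `mode_seeking_loss`, the L1 distances, and `eps` are assumptions made for illustration. In actual training the same expression would be written with the framework's tensors so that gradients flow back to the generator.

```python
import numpy as np

def mode_seeking_loss(img1, img2, z1, z2, eps=1e-5):
    """Diversity term (hypothetical helper): small when different latent
    codes produce clearly different images, large when they collapse."""
    # Mean absolute distance between the two generated images.
    d_img = np.mean(np.abs(img1 - img2))
    # Mean absolute distance between the two latent codes.
    d_z = np.mean(np.abs(z1 - z2))
    # The ratio d_img / d_z should be large; minimizing its reciprocal
    # pushes the generator in that direction.
    return 1.0 / (d_img / d_z + eps)

rng = np.random.default_rng(0)
z1, z2 = rng.normal(size=(2, 8))  # two latent codes for one condition

# Collapsed generator: both codes yield almost the same image.
collapsed = mode_seeking_loss(np.full((3, 4, 4), 0.5),
                              np.full((3, 4, 4), 0.5001), z1, z2)
# Diverse generator: the two codes yield clearly different images.
diverse = mode_seeking_loss(rng.uniform(size=(3, 4, 4)),
                            rng.uniform(size=(3, 4, 4)), z1, z2)
print(collapsed > diverse)
```

The term is added to the generator loss with a weight, and that weight is one more hyperparameter to tune.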
As you say, the first GAN-based pix2pix architecture adopts a U-Net with skip connections. However, in pix2pixHD and GauGAN, which were proposed later, context information is passed to later layers by the residual blocks well known from ResNet. Additionally, GauGAN injects the semantic label into each layer through a normalization called SPADE. Therefore, I think your problem is not the absence of skip connections, but rather the stagnation of learning due to mode collapse.
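A rough sketch of how SPADE injects the label map: the features are first normalized without learned parameters, and then a per-pixel scale and shift computed from the segmentation map are applied. This is a simplified NumPy illustration under stated assumptions, not this repository's code: it handles a single sample, replaces the small convolutional network of the real layer with one 1x1 projection, and the names `spade`, `w_gamma`, and `w_beta` are hypothetical.

```python
import numpy as np

def spade(x, segmap, w_gamma, w_beta, eps=1e-5):
    """Simplified SPADE (hypothetical helper).
    x: (C, H, W) features; segmap: (L, H, W) one-hot labels;
    w_gamma, w_beta: (C, L) weights of a 1x1 projection."""
    # Parameter-free normalization of each channel over its spatial extent.
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    x_norm = (x - mean) / np.sqrt(var + eps)
    # Per-pixel scale and shift predicted from the label at each position.
    gamma = np.einsum("cl,lhw->chw", w_gamma, segmap)
    beta = np.einsum("cl,lhw->chw", w_beta, segmap)
    # The (1 + gamma) form is a choice of this sketch: with zero weights
    # the layer reduces to plain normalization.
    return x_norm * (1.0 + gamma) + beta

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 4))
labels = np.zeros((4, 4), dtype=int)
labels[:, 2:] = 1                              # right half is class 1
segmap = np.eye(2)[labels].transpose(2, 0, 1)  # one-hot, shape (2, 4, 4)
w_gamma = np.zeros((2, 2))
w_beta = np.array([[0.0, 1.0], [0.0, -1.0]])   # shift applies to class 1 only
out = spade(x, segmap, w_gamma, w_beta)
```

Because the scale and shift vary per pixel with the label, the semantic layout survives normalization at every layer where the block is used.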

Thanks.

soul-M-42 commented 3 years ago

Thanks a lot :)

pit-ray / SPADE-pix2pix-for-Anime