[Open] leodvd opened this issue 4 years ago
@leodvd What accuracy do you get when you train StyleGAN2 with segmentation maps jointly with the RGB images? I tried this, but the accuracy is not good.
@daisy220296 I'm having a hard time too! I was able to obtain stable trainings and very good-looking generated images when working with RGB images only (with --aug-prob 0.3 --dataset-aug-prob 0.6 and my dataset of 5000 images). But now, with the segmentation maps joined to the RGB images, the training is very unstable!
The 4th channel (segmaps) seems to make it easier for the discriminator to tell fakes from reals: its loss tends to be far lower than usual, which probably gives the generator a hard time. The segmentation maps are quite uniform and contain only 0 and 255 values, which I think pushes the generator to create smoother images (even on the other 3 channels, which actually need to be more detailed).
On my dataset, the results started to look promising after about 20000 iterations (the images began to look somewhat realistic and the segmentation maps made sense), but before they could improve further, the discriminator loss steadily dropped to zero and the outputs collapsed to full-black images for good. I am playing with the data-augmentation parameters again to see whether that collapse can be prevented!
@leodvd were you able to resolve this? Working on a similar problem.
@pcicales That was a while ago, so I've forgotten whether it was with StyleGAN2 or some other GAN, but I did eventually solve the issue by making the 4th (segmap) channel more similar to the other 3 (RGB) channels. If the objects to segment are rather dark in the images, try using a dark grey value for them on the 4th channel (and a light grey if they are light), and do the same for the background: dark grey if the background is dark in the image, light grey if it's light. Improving that coherence among the channels stabilized my trainings, and I could get outputs almost as good as the RGB-only ones. Hope this helps!
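A minimal sketch of what this can look like, assuming binary 0/255 masks; the helper name and its default grey levels (taken from the mean brightness inside and outside the mask) are just illustrative:

```python
import numpy as np
from PIL import Image

def encode_grey_mask(rgb_path, mask_path, fg_grey=None, bg_grey=None):
    """Re-encode a binary 0/255 mask as two grey levels chosen to be
    close to the image's own brightness, then stack it as a 4th channel.

    If fg_grey / bg_grey are not given, they default to the mean RGB
    brightness inside / outside the mask.
    """
    rgb = np.asarray(Image.open(rgb_path).convert("RGB"), dtype=np.uint8)
    mask = np.asarray(Image.open(mask_path).convert("L"))
    fg = mask > 127  # the original mask contains only 0 and 255

    luma = rgb.mean(axis=2)  # per-pixel brightness
    if fg_grey is None:
        fg_grey = int(luma[fg].mean())   # grey level matching the objects
    if bg_grey is None:
        bg_grey = int(luma[~fg].mean())  # grey level matching the background

    seg = np.where(fg, fg_grey, bg_grey).astype(np.uint8)
    return np.dstack([rgb, seg])  # H x W x 4 array
```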
Thanks so much @leodvd! So, if I'm understanding correctly, you project the color space onto the mask; did you have a specific approach for this? And how did you restore the mask to binary once generated?
@leodvd My current solution is to encode the mask using the mean of the RGB channels, and then decode the generated mask later using the generated RGB pixel values. Thanks a lot for your help!
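If I'm reading that right, the encode/decode could look roughly like the following (my own interpretation, not the exact code from the thread; the tolerance of 32 grey levels in the decoder is arbitrary, and the scheme implicitly assumes the foreground isn't near-black):

```python
import numpy as np

def encode_mask(rgb, mask):
    """4th channel: per-pixel RGB mean inside the mask, 0 outside, so
    the segmap channel follows the image statistics instead of hard
    0/255 values. rgb is H x W x 3 uint8, mask is H x W (0/255)."""
    luma = rgb.mean(axis=2).astype(np.uint8)
    seg = np.where(mask > 127, luma, 0).astype(np.uint8)
    return np.dstack([rgb, seg])

def decode_mask(fake, tol=32):
    """Recover a binary mask from a generated 4-channel image: a pixel
    counts as foreground where the 4th channel tracks the generated
    RGB mean (and isn't near the 0 used for the background)."""
    rgb, seg = fake[..., :3], fake[..., 3].astype(np.float32)
    luma = rgb.mean(axis=2)
    fg = (np.abs(seg - luma) < tol) & (seg > tol)
    return fg.astype(np.uint8) * 255
```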
Thanks for your great code, which has been working well on my datasets!
Now, I'd like to generate B&W segmentation maps jointly with the RGB images. What I'd usually do is concatenate each image with its segmentation map to create 4-channel images in the dataloading part of the code, and then save the first 3 channels and the 4th channel of the outputs as separate images (a minimal sketch of that dataloading step is below).
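A sketch of that dataloading step, assuming both folders hold files with matching names (the class name, folder layout, and transforms are just illustrative):

```python
import os
import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class RGBSegDataset(Dataset):
    """Pairs each RGB image with its segmentation map and returns a
    single 4-channel tensor (channels 0-2: RGB, channel 3: segmap)."""

    def __init__(self, image_dir, mask_dir, size=256):
        self.image_dir, self.mask_dir = image_dir, mask_dir
        self.names = sorted(os.listdir(image_dir))  # same filenames in both dirs
        # Note: nearest-neighbour interpolation may suit binary masks better.
        self.tf = transforms.Compose([
            transforms.Resize((size, size)),
            transforms.ToTensor(),  # -> float tensor in [0, 1]
        ])

    def __len__(self):
        return len(self.names)

    def __getitem__(self, i):
        name = self.names[i]
        rgb = self.tf(Image.open(os.path.join(self.image_dir, name)).convert("RGB"))
        seg = self.tf(Image.open(os.path.join(self.mask_dir, name)).convert("L"))
        return torch.cat([rgb, seg], dim=0)  # 4 x H x W
```

After sampling, the 4-channel outputs can be split back and saved separately, e.g. torchvision.utils.save_image(fake[:, :3], 'rgb.png') and torchvision.utils.save_image(fake[:, 3:], 'mask.png').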
Would there be an easy way to input 2 different folders containing the images to be concatenated (along with the number of channels of each image type) and have both outputs saved separately? Or should I rather use the --transparent option, turn my images into a 4-channel dataset, and separate the outputs after they are saved?
Thanks again!