lucidrains / lightweight-gan

Implementation of 'lightweight' GAN, proposed in ICLR 2021, in PyTorch. High-resolution image generation that can be trained within a day or two
MIT License

Hard cutoff straight lines/boxes of nothing in generated images #107

Closed · timendez closed this 2 years ago

timendez commented 2 years ago

Hello! Training on Google Colab with

!lightweight_gan --data my/images/ --name my-name --image-size 256 --transparent --dual-contrast-loss --num-train-steps 250000

I'm at 250k iterations after 5 days of training at ~2 s/it, and I'm getting strange results with boxes.

I've circled some examples below. [image: generated samples with the box artifacts circled]

My training data is 22k 256x256 .png images that do not contain large hard edges or boxes like this. They're video game sprites, with hard edges limited to at most 10x10 px.

Are there any arguments I could adjust to decrease the chance of the model learning that transparent boxes are good? Would converting to a white background help?

Thank you!

99991 commented 2 years ago

What you are seeing are the cropout augmentations. For RGB images, the cropout rectangles are black (0, 0, 0): https://github.com/lucidrains/lightweight-gan#basic-usage

For RGBA images, the cropout rectangles are transparent.
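To illustrate why the patches come out transparent rather than black (a hypothetical snippet, not the repo's actual code): cutout zeroes every channel of the patch, and for RGBA that includes the alpha channel:

import torch

# A 4-channel (RGBA) image batch; zeroing a patch zeroes alpha too,
# so the patch renders as fully transparent instead of black.
rgba = torch.rand(1, 4, 8, 8)  # (batch, channels, height, width)
rgba[:, :, 2:6, 2:6] = 0       # the cutout patch
print(rgba[0, 3, 2:6, 2:6])    # alpha channel is all zeros here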

It might help to reduce the augmentation probability, e.g. by setting --aug-prob to some lower value, although I have not tried it and do not know which values work well.
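Following the flag syntax from the README linked above, that might look something like this (the 0.1 is an arbitrary guess, untested):

!lightweight_gan --data my/images/ --name my-name --image-size 256 --transparent --dual-contrast-loss --aug-prob 0.1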

Another point of reference: I trained on a dataset of 2,500 natural images of size 64x64 with default settings. After 30k iterations, the cropout augmentations were already apparent, so less augmentation is definitely needed. For me, though, less augmentation would probably lead to mode collapse because the dataset is relatively small. Your dataset is much larger, so it might work.

timendez commented 2 years ago

Thank you for the explanation @99991! That makes a lot of sense.

It seems unusual to me that [cutout,translation] are enabled by default. Would you happen to know why, and what the repercussions are of removing all augmentations? Does it simply help with variety in the outputs? I have no need to generate images with cutouts or translations, so my plan is to run with no augmentations.

"General recommendation is using suitable augs for your data and as many as possible, then after sometime of training disable most destructive (for image) augs."

This part of the README makes me think that the augmentations do help a lot with training, and that disabling them later on can get the training "back on track" toward more realistic output while still keeping the variety they provide. Does that sound right?
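Concretely, I'm thinking of rerunning with something like this (assuming --aug-prob 0 effectively disables the augmentations; I haven't verified that):

!lightweight_gan --data my/images/ --name my-name --image-size 256 --transparent --dual-contrast-loss --aug-prob 0 --num-train-steps 250000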

99991 commented 2 years ago

The problem is that with too little data, the variety of the generated images will eventually "collapse". Augmentations artificially increase the effective size of the dataset, which seems to work around this problem in practice, but the augmentations can leak into the generated images.

For more details, see for example "Differentiable Augmentation for Data-Efficient GAN Training" (Zhao et al., 2020): https://arxiv.org/pdf/2006.10738.pdf
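The core trick, roughly (a minimal PyTorch sketch of the DiffAugment idea, not the paper's or this repo's actual code; D, G, and the non-saturating loss are stand-ins):

import torch
import torch.nn.functional as F

def rand_cutout(x, ratio=0.5):
    # Zero a random square patch in each image. For RGBA inputs this
    # also zeroes the alpha channel, i.e. the patch becomes transparent.
    b, _, h, w = x.shape
    ch, cw = int(h * ratio), int(w * ratio)
    out = x.clone()
    for i in range(b):
        top = torch.randint(0, h - ch + 1, (1,)).item()
        left = torch.randint(0, w - cw + 1, (1,)).item()
        out[i, :, top:top + ch, left:left + cw] = 0
    return out

def discriminator_loss(D, G, reals, z, aug=rand_cutout):
    # Apply the SAME kind of augmentation to both real and fake images
    # before the discriminator, so D cannot separate real from fake by
    # the presence of augmentation artifacts alone.
    fakes = G(z).detach()
    return F.softplus(-D(aug(reals))).mean() + F.softplus(D(aug(fakes))).mean()

def generator_loss(D, G, z, aug=rand_cutout):
    # The augmentation is differentiable, so gradients flow through it
    # into G. Leakage like the boxes above can still happen when the
    # augmentation probability is too high for the dataset.
    return F.softplus(-D(aug(G(z)))).mean()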

I had more success with StyleGAN2-ADA-PyTorch because it worked well right out of the box without parameter tuning.

timendez commented 2 years ago

I tried StyleGAN2-ADA-PyTorch based on your suggestion and also had great success!

Thank you so much @99991 you've saved me a lot of time!