lucidrains / unet-stylegan2

A PyTorch implementation of StyleGAN2 with a U-Net discriminator
MIT License

Tuning parameters for logo dataset #6

Closed · erdavis1 closed this issue 3 years ago

erdavis1 commented 4 years ago

Hi! First off, thank you for being so engaged and active with folks' issues.

I'm training on a dataset of 3k Japanese flag logos, plus their rotations of 90, 180, and 270 degrees for 12k total images.
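A short Pillow script along these lines can generate the rotated copies (the directory names and filename pattern here are just placeholders):

```python
from pathlib import Path
from PIL import Image

# Hypothetical paths; point these at the raw logos and the training folder.
src_dir, out_dir = Path('logos_raw'), Path('jpegs')
out_dir.mkdir(exist_ok=True)

for path in src_dir.glob('*.jpg'):
    img = Image.open(path).convert('RGB')
    for angle in (0, 90, 180, 270):
        # expand=True keeps the full canvas if a logo is not square
        img.rotate(angle, expand=True).save(out_dir / f'{path.stem}_rot{angle}.jpg')
```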

I generally experience very quick mode collapse, often within the first 5k iterations. The one result that it does produce is pretty nice, though!

unet_stylegan2 --data jpegs/ --attn-layers [1,2]

39k iterations: [image]

Increasing the batch size and gradient-accumulate-every has reduced it to partial mode collapse. The catch is that I'm training on Colab, and this can be unworkably slow depending on the GPU (30 secs per iteration).

unet_stylegan2 --data jpegs/ --batch-size 32 --gradient-accumulate-every 8 --cl-reg --attn-layers [1,2]

24k iterations: [image]

Do you have any tuning suggestions for increasing speed and avoiding mode collapse? The results at 24k (above) are promising, but definitely need a lot more refining!

lucidrains commented 4 years ago

@erdavis1 You are working in the realm of small data, so I would suggest turning on augmentation! You can do so with --aug-prob 0.25
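Conceptually, the augmentation probability gates whether each discriminator batch gets randomly augmented before it is scored, which makes it harder for the discriminator to memorize a small dataset. A rough illustrative sketch (not the library's exact code; with_aug_prob and the flip augmentation are just stand-ins):

```python
import random
import torch

def with_aug_prob(images: torch.Tensor, aug_fn, p: float = 0.25) -> torch.Tensor:
    # With probability p, augment the whole batch; otherwise pass it through unchanged.
    if random.random() < p:
        return aug_fn(images)
    return images

# Example: flip a batch horizontally roughly a quarter of the time.
batch = torch.randn(8, 3, 64, 64)
batch = with_aug_prob(batch, lambda x: torch.flip(x, dims=[3]), p=0.25)
```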

lucidrains commented 4 years ago

@erdavis1 There actually isn't any contrastive regularization in unet-stylegan2; I removed it since it doesn't fit within the U-Net discriminator framework

erdavis1 commented 4 years ago

@lucidrains Thanks! I'm very new to this, if it isn't obvious.

I tried the following and also got pretty quick mode collapse.

!unet_stylegan2 --data jpegs/ --attn-layers [1,2] --aug-prob 0.25

9k iterations: [image]

Drastically increasing the batch size has given me the best results of anything I've tried, but I'm really struggling to find a balance where the results are good and diverse and training doesn't take >15 secs per iteration.

Are there any other tricks to consider, aside from batch size?

lucidrains commented 4 years ago

Have you tried plain stylegan2, without the unet? github.com/lucidrains/stylegan2-pytorch

erdavis1 commented 4 years ago

I have, with very similar results.

lucidrains commented 4 years ago

@erdavis1 Try with augmentation, but with an effective batch size of 64. That means just a batch size of 32 and a gradient-accumulate-every of 2.
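For context, the effective batch size here is batch_size × gradient_accumulate_every: gradients from several smaller batches are accumulated before a single optimizer step. A minimal, generic PyTorch sketch of the mechanism (a toy model, not the repo's actual training loop):

```python
import torch
from torch import nn

# Toy model and data, just to show the mechanics of accumulation.
model = nn.Linear(16, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
data = torch.randn(64, 16)            # one "effective batch" of 64 samples
batch_size, accumulate_every = 32, 2  # same effective batch as 8 x 8

optimizer.zero_grad()
for i in range(accumulate_every):
    chunk = data[i * batch_size:(i + 1) * batch_size]
    loss = model(chunk).pow(2).mean()        # stand-in loss
    (loss / accumulate_every).backward()     # scale so the gradient averages over 64
optimizer.step()                             # one update per effective batch
```

Either split (32 × 2 or 8 × 8) feeds roughly the same averaged gradient into each update; the smaller per-step batch just trades GPU memory for more forward/backward passes, and therefore more wall-clock time per update.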

erdavis1 commented 4 years ago

Thanks! I'll try running that today and report back.

erdavis1 commented 4 years ago

The following unfortunately ended in a very quick mode collapse:

unet_stylegan2 --data jpegs/ --attn-layers [1,2] --batch-size 32 --gradient-accumulate-every 2 --aug-prob 0.25

1k: [image]

20k: [image]

For my own knowledge, what differences would you expect to see between a model trained with an effective batch size of 64 (batch size 32, gradient accumulate every 2) and one also trained with an effective batch size of 64 (batch size 8, gradient accumulate every 8)?

lucidrains commented 4 years ago

@erdavis1 I updated the augmentation code in v0.5.0 if you'd like to give that a try: --aug-prob 0.25 --aug-types [translation,cutout,color]
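For reference, rough, hypothetical versions of the three named augmentation types, in the spirit of DiffAugment-style GAN augmentation (the library's actual implementations may differ):

```python
import torch

def rand_translation(x, ratio=0.125):
    # Shift the whole batch by a random pixel offset, zero-filling the border.
    _, _, h, w = x.shape
    dx = int(torch.randint(-int(w * ratio), int(w * ratio) + 1, (1,)).item())
    dy = int(torch.randint(-int(h * ratio), int(h * ratio) + 1, (1,)).item())
    out = torch.zeros_like(x)
    out[..., max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        x[..., max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
    return out

def rand_cutout(x, ratio=0.5):
    # Zero out one random square patch, shared across the batch.
    x = x.clone()
    _, _, h, w = x.shape
    ch, cw = int(h * ratio), int(w * ratio)
    y0 = int(torch.randint(0, h - ch + 1, (1,)).item())
    x0 = int(torch.randint(0, w - cw + 1, (1,)).item())
    x[..., y0:y0 + ch, x0:x0 + cw] = 0
    return x

def rand_color(x):
    # Random brightness and contrast jitter, applied batch-wide.
    x = x + (torch.rand(1, device=x.device) - 0.5)                    # brightness
    mean = x.mean(dim=(1, 2, 3), keepdim=True)
    return (x - mean) * (torch.rand(1, device=x.device) * 2) + mean   # contrast
```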

erdavis1 commented 4 years ago

Good news and bad news.

Using the following, I experienced less mode collapse!

unet_stylegan2 --data jpegs/ --attn-layers [1,2] --batch-size 32 --gradient-accumulate-every 2 --aug-prob 0.25 --aug-types [translation,cutout,color]

It partially collapsed around 18k: [image]

and fully collapsed around 21k: [image]

The images do seem to be a bit more smeary/blurry than the test with a higher gradient-accumulate-every, but I'm not sure whether that's due to the lower accumulation setting or to the augmentation.

This is a big improvement, and it ran pretty quickly at 2 secs/iteration. I'm wondering whether, if I slowly edge up the effective batch size, I'll hit a level where there's less collapse and clearer results without sacrificing too much speed. I'm in no rush, and if I get nice results I'll wait as long as I need to.

lucidrains commented 3 years ago

@erdavis1 get more data! lol

erdavis1 commented 3 years ago

Haha, if only I could!

I ended up being pretty happy with the results of this run, even though it was fairly slow (~7 secs/iteration when I got a V100).

!unet_stylegan2 --data jpegs/ --attn-layers [1,2] --batch-size 32 --gradient-accumulate-every 8 --aug-prob 0.25 --aug-types [translation,cutout,color]

Closing the issue since I'm satisfied with the output.

17k: [image]

lucidrains commented 3 years ago

🚀🚀