Supporting higher resolution such as 1024px

ofirkris commented 5 years ago

Hi, is there an option to add support for higher quality output? I tried using SRgan for this, but result wasn't good enough..

yifanai commented 5 years ago

@ofirkris I believe the existing pretrained checkpoints (original author's and mine) do not support arbitrary image sizes on the fly, because of the MLP function, see: https://github.com/taki0112/UGATIT/blob/2d8596765aa766feff577850cf684190be8fb76a/UGATIT.py#L163. It contains fully connected layers, which have fixed numbers of input connections. The pretrained checkpoints are trained on 256x256 images, so:

MLP input feature map has shape (height: 64, width: 64, channels: 256) or 1,048,576 values when flattened
1st fully-connected layer in MLP has weight with the shape (in: 1,048,576, out: 256).

If the input image were 1024x1024 for example, then:

MLP input feature map would have shape (height: 256, width: 256, channels: 256)
1st fully-connected layer in MLP would have to have shape (in: 16,777,216, out: 256)
might be tough for some GPUs to handle this many parameters during training

So, it might not be possible to have 1 set of weights for all image sizes, because the shapes would conflict. There might be some workarounds:

for example, the original author's --light option, which global average pools the MLP input feature map to a standard fixed size (1, 1, 256), then the shape of the fully-connected layer weight would be (256, 256) and independent of input size.
I like your idea of using another GAN to upsample the low-res output. Might need to finetune SRGAN on anime dataset to get better looking results.
If you have a target size in mind, you might have to find high-res real-life (like https://drive.google.com/open?id=0B4qLcYyJmiz0TXY1NG02bzZVRGs from https://github.com/tkarras/progressive_growing_of_gans) and anime datasets, then finetune the pretrained checkpoint on them using the new size.

ofirkris commented 5 years ago

@yifanyf I've tested with this SRGAN implementation https://github.com/goldhuang/SRGAN-PyTorch this is trained with Anime images, also testing now with Anime4K too.

As for higher resolution, I couldn't find a good HQ dataset of Anime faces equivalent to CelebHQ and FFHQ face datasets, are you familiar with HQ anime dace datasets? (Danbooru2018 is 64*64) As for making changes to training code, can you add such? I have several V100 GPU's to train this, and can share the model once I will finish training.

yifanai commented 5 years ago

@ofirkris I was not even able to train 256px images on my GPU with original author's code, got out of memory error :cold_sweat:. I'll dig deeper to see what else can be minified for training.

Have you had success using original repo with --img_size 1024 argument on V100? For higher resolutions, I found something else that might be worth a try: https://github.com/nagadomi/waifu2x and this blog post: https://www.gwern.net/Faces

yifanai / video2anime

Supporting higher resolution such as 1024px #2