tamarott / SinGAN

Official pytorch implementation of the paper: "SinGAN: Learning a Generative Model from a Single Natural Image"
https://tamarott.github.io/SinGAN.htm

Does it support 1024px #52

Closed wodsoe closed 2 years ago

wodsoe commented 4 years ago

Does it support 1024px? i.e., can I just set --max_size=1024?

tamarott commented 4 years ago

Should work (up to your GPU memory).

JonathanFly commented 4 years ago

If you can get 1024px output, please update this thread with the hardware you did it on. I can't get much over 600px with a 16GB GPU. I would love to implement some form of gradient checkpointing; I really don't care how long training takes if I can get 1000+ pixel output.
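Something like torch.utils.checkpoint might be enough. A rough, untested sketch with a stand-in conv stack (these are not the repo's actual module names, just the idea):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Stand-in for the conv stack inside one scale's generator; 32 channels
# matches the default filter count, but this is not SinGAN's real module.
body = nn.Sequential(*[
    nn.Sequential(nn.Conv2d(32, 32, 3, padding=1), nn.LeakyReLU(0.2))
    for _ in range(5)
]).cuda()

x = torch.randn(1, 32, 600, 600, device='cuda', requires_grad=True)

# Recompute activations segment by segment during backward instead of
# storing them all, trading extra compute for a lower peak memory footprint.
out = checkpoint_sequential(body, 2, x)
out.sum().backward()
```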

singulart commented 4 years ago

This size didn't work for my GPU (1080ti with 11GB VRAM).

majdzr commented 4 years ago

Any tips on how to train up to 1000px on a 4 GPU machine? I tried wrapping everything in nn.DataParallel without any luck. Thanks!

Lotayou commented 4 years ago

@JonathanFly @majdzr @singulart Guys, I think there are two places where we can bring memory usage down. Both would compromise performance a bit, but at least it's better than getting stuck.

  1. Tuning down the number of filters slightly (e.g. from 32 to 16),
  2. Tuning down the number of layers for each scale (by changing opt.num_layers); see the command sketch below.
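If I remember the option names in config.py correctly (please double-check, they may be slightly different), those two changes would look roughly like:

```
python main_train.py --input_name my_image.png --max_size 1024 --nfc 16 --num_layer 4
```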

JonathanFly commented 4 years ago

@Lotayou That's very exciting, have you tested any of these?

I did note that GPU memory use tends to spike between scales and then fall back down during most of the training. I didn't look into the details, but I was wondering whether it would be possible to move whatever is spiking to the CPU and transfer it back to the GPU for the regular training. I have seen libraries that can do this in TensorFlow, but I don't know if it's possible in PyTorch.
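In PyTorch, moving modules between devices is just a .to('cpu') / .to('cuda') call, so a rough sketch of offloading the already-trained scales could look like this (Gs is my guess at how the trained pyramid is stored, not necessarily this repo's variable):

```python
import torch

# Keep only the scale currently being trained on the GPU; park the
# already-trained coarse-scale generators on the CPU until they are needed
# again to build the next scale's input.
def offload_finished_scales(Gs, device='cuda'):
    for G in Gs[:-1]:
        G.to('cpu')
    if Gs:
        Gs[-1].to(device)
    torch.cuda.empty_cache()  # return the freed cached blocks to the driver
```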

Lotayou commented 4 years ago

I've just tested a 256x256 image; it takes 10 scales and 20000 epochs, and the maximum GPU usage is around 4 GB, so I presume at least 16 GB is required for 1024x1024. I've also encountered the spiking issue you mentioned when training Progressive GAN; I guess it's a PyTorch behavior of pre-allocating a lot of GPU space before adding new modules. Maybe we can try to build the entire generator in advance instead of adding blocks one by one; hopefully that would prevent the GPU memory spike problem.
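To confirm where the spike happens, PyTorch's peak-memory counters can be reset around each scale. A self-contained sketch (train_single_scale here is a dummy stand-in, not this repo's training function):

```python
import torch
import torch.nn as nn

# Dummy stand-in for one scale's training step, just to make the snippet
# runnable; the interesting part is the reset/readout around it.
def train_single_scale(scale, steps=5):
    size = 128 * (scale + 1)
    model = nn.Conv2d(3, 32, 3, padding=1).cuda()
    x = torch.randn(1, 3, size, size, device='cuda')
    for _ in range(steps):
        model(x).mean().backward()

if torch.cuda.is_available():
    for scale in range(3):
        torch.cuda.reset_peak_memory_stats()
        train_single_scale(scale)
        peak_gb = torch.cuda.max_memory_allocated() / 1024 ** 3
        print(f"scale {scale}: peak allocated {peak_gb:.2f} GB")
```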

majdzr commented 4 years ago

> I've just tested a 256x256 image; it takes 10 scales and 20000 epochs, and the maximum GPU usage is around 4 GB, so I presume at least 16 GB is required for 1024x1024. I've also encountered the spiking issue you mentioned when training Progressive GAN; I guess it's a PyTorch behavior of pre-allocating a lot of GPU space before adding new modules. Maybe we can try to build the entire generator in advance instead of adding blocks one by one; hopefully that would prevent the GPU memory spike problem.

Hey @Lotayou, I have tried 1000x800 on a 16 GB GPU. It didn't really work, for the same reason. Unfortunately, the implementation doesn't benefit from a multi-GPU setup. Does anyone know why, by the way?

Lotayou commented 4 years ago

@majdzr The default batch size is 1, so using multiple GPUs will not make a difference. In theory it could be possible to distribute the computation of different patches across multiple GPUs, but at the highest resolution everything stays the same.

majdzr commented 4 years ago

@Lotayou, of course. Thanks, I totally ignored that fact. That's why having 2 GPUs linked with NVLink isn't really beneficial in this case, right?

Lotayou commented 4 years ago

@majdzr Yep. We need to tune the code to make it more memory efficient. Otherwise we can just set the scale step smaller (e.g. 0.5) to reduce the number of scales, and therefore layers, in the final network.
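If I recall the flag correctly it is --scale_factor (default around 0.75), so that would be something like the following, but please check config.py:

```
python main_train.py --input_name my_image.png --max_size 1024 --scale_factor 0.5
```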

majdzr commented 4 years ago

> @Lotayou That's very exciting, have you tested any of these?
>
> I did note that GPU memory use tends to spike between scales and then fall back down during most of the training. I didn't look into the details, but I was wondering whether it would be possible to move whatever is spiking to the CPU and transfer it back to the GPU for the regular training. I have seen libraries that can do this in TensorFlow, but I don't know if it's possible in PyTorch.

Even after modifying the network (number of layers and channels) to train on a 1000px image, I hit the same CUDA out-of-memory error, even though average memory usage was only around 50% at the final scale. Has anyone managed to solve the "spike" issue?
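Not a fix for the root cause, but one thing that might be worth trying is explicitly freeing the previous scale's objects before the next scale is built (the names below are made up, substitute whatever the training loop actually holds):

```python
import gc
import torch

# After a scale finishes, drop the local references to its discriminator and
# optimizers, then collect garbage and release the allocator's cached blocks
# so the next, larger scale starts from a cleaner memory pool.
def cleanup_after_scale():
    gc.collect()
    torch.cuda.empty_cache()

# usage sketch:
#   del D_curr, optimizerD, optimizerG
#   cleanup_after_scale()
```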

singulart commented 4 years ago

Google Colab GPUs allow going up to 1024x1024 images.

JonathanFly commented 4 years ago

> Google Colab GPUs allow going up to 1024x1024 images.

They do? Can you post the image and the exact command you used, and say which GPU you had in Colab (run !nvidia-smi in a cell)?

I would LOVE 1024x1024!

Ankush1909IIT commented 1 year ago

1200x900, which is roughly equivalent to 1024x1024 in total pixel count, works on an A100 80 GB.