Should work (up to your GPU memory).
If you can get 1024px output, please update this thread with the hardware you used. I can't get much over 600px with a 16GB GPU. I would love to implement some form of gradient checkpointing; I really don't care how long training takes if I can get output over 1000 pixels.
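For reference, PyTorch ships gradient checkpointing in torch.utils.checkpoint. I haven't tried wiring it into this repo; below is only a rough sketch with stand-in conv blocks (the CheckpointedGenerator class and its blocks are made up), showing how a per-scale generator could be wrapped so activations are recomputed in the backward pass instead of stored:

```python
# Minimal sketch only -- not from this repo. Wraps a stack of generic conv
# blocks with torch.utils.checkpoint, trading extra compute for lower memory.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointedGenerator(nn.Module):
    def __init__(self, blocks):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)

    def forward(self, x):
        # Note: with the default (reentrant) checkpoint, at least one input
        # must require grad, otherwise parameter gradients are not computed.
        for block in self.blocks:
            x = checkpoint(block, x)
        return x

# Stand-in blocks just to show the wrapper runs; the real per-scale
# generator blocks would go here instead.
gen = CheckpointedGenerator(
    [nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.LeakyReLU(0.2))
     for _ in range(5)]
)
```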
This size didn't work for my GPU (1080ti with 11GB VRAM).
Any tips on how to train up to 1000px on a 4 GPU machine? I tried wrapping everything in nn.DataParallel without any luck. Thanks!
@JonathanFly @majdzr @singulart Guys, I think there are two places where we can reduce memory usage. They would compromise performance, but at least it's better than getting stuck.
@Lotayou That's very exciting, have you tested any of these?
I did notice that GPU memory use tends to spike between scales and then fall back down for most of the training. I didn't look into the details, but I was wondering if it were possible to move whatever is spiking to the CPU and transfer it back to the GPU for the regular training steps. I have seen libraries that can do this in TensorFlow, but I don't know if it's possible in PyTorch.
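In PyTorch you can at least do the moves by hand with .to(). Something roughly like this might work (frozen_generators is just a made-up stand-in for the already-trained scales, not a name from this repo):

```python
import torch
import torch.nn as nn

device = torch.device('cuda')

# Stand-in modules for the already-trained, frozen per-scale generators;
# in the real code these would be the actual generator objects.
frozen_generators = [nn.Conv2d(3, 3, 3, padding=1) for _ in range(8)]

# Park the frozen scales on the CPU so they don't hold GPU memory while the
# current scale is training.
for g in frozen_generators:
    g.to('cpu')

def run_frozen_scale(g, x):
    g.to(device)                 # bring this scale's generator back to the GPU
    with torch.no_grad():
        out = g(x.to(device))
    g.to('cpu')                  # park it on the CPU again
    torch.cuda.empty_cache()     # release the cached blocks it was using
    return out
```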
I've just tested a 256x256 image: it takes 10 scales of 20000 epochs each, and the maximum GPU usage is around 4GB, so I presume at least 16 GB is required for 1024x1024. I've also encountered the spiking issue you mentioned when training Progressive GAN; I guess it's PyTorch pre-allocating a lot of GPU space before adding new modules. Maybe we can try to build the entire generator in advance instead of adding blocks one by one; hopefully that would prevent the GPU memory spike problem.
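If someone wants to narrow down exactly when the spike happens, PyTorch's allocator counters can be logged between scales. A quick sketch (the scale_idx name is just for illustration):

```python
import torch

def log_gpu_memory(tag):
    # Current and peak usage as seen by PyTorch's caching allocator, in MiB.
    alloc = torch.cuda.memory_allocated() / 2**20
    peak = torch.cuda.max_memory_allocated() / 2**20
    print(f'[{tag}] allocated: {alloc:.0f} MiB, peak: {peak:.0f} MiB')

# Call e.g. log_gpu_memory(f'scale {scale_idx} end') at the end of each scale,
# then torch.cuda.reset_peak_memory_stats() so the next peak is measured per
# scale rather than over the whole run.
```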
Hey @Lotayou, I have tried 1000x800 on a 16 GB GPU. It didn't really work, for the same reason. Unfortunately, the implementation doesn't benefit from a multi-GPU setup. Does anyone know why, by the way?
@majdzr The default batch size is 1, so using multiple GPUs will not make a difference. In theory it could be possible to distribute the computation of different patches across multiple GPUs, but for the highest resolution everything else just stays the same.
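To make the batch-size point concrete: nn.DataParallel scatters the input along dim 0, so a batch of one is a single chunk that lands entirely on the first GPU. A quick check (the layer and shapes here are just illustrative):

```python
import torch
import torch.nn as nn

# DataParallel splits the batch dimension across GPUs. With batch size 1
# there is only one chunk, so the whole forward pass runs on cuda:0 and any
# extra GPUs sit idle; it does not split a single large image spatially.
model = nn.DataParallel(nn.Conv2d(3, 3, 3, padding=1).cuda())
x = torch.randn(1, 3, 1024, 1024, device='cuda')
y = model(x)
print(y.shape)  # torch.Size([1, 3, 1024, 1024]), computed on one GPU only
```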
@Lotayou, of course. Thanks, I totally ignored this fact. That's why having 2 GPUs linked with NVLink is not really beneficial in this case, right?
@majdzr Yep. We need to tune the code to make it more memory efficient. Otherwise we can just set the scale step smaller (like 0.5) to reduce the number of layers in the final network.
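As a rough back-of-the-envelope check of how much the scale step matters (the geometric-pyramid formula and the 25px coarsest size below are my assumptions, not taken from this repo's code):

```python
import math

def num_scales(max_size, min_size, scale_factor):
    # Levels needed to shrink max_size down to min_size when each level
    # is scale_factor times the size of the previous one.
    return math.ceil(math.log(min_size / max_size, scale_factor)) + 1

print(num_scales(1024, 25, 0.75))  # ~14 scales with a gentle step
print(num_scales(1024, 25, 0.5))   # ~7 scales with scale step 0.5
```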
Even after modifying the network (number of layers and channels) to train at 1000px, I encountered the same CUDA out-of-memory error, although the average memory usage was only 50% at the latest scale. Has anyone managed to solve the "spike" issue?
Google Colab GPUs allow you to go up to 1024x1024 images.
They do? Can you post the image and exact command you used, and which GPU you had in Colab (run !nvidia-smi in a cell).
I would LOVE 1024x1024!
1200x900, which is roughly equivalent to 1024x1024 in total pixel count, is working on an A100 80 GB.
Does it support 1024px, i.e. setting --max_size=1024?