minyoungg / pix2latent

Code for: Transforming and Projecting Images into Class-conditional Generative Networks
https://minyoungg.github.io/pix2latent
Apache License 2.0

Nevergrad vs. HybridNevergrad #3

Closed woctezuma closed 4 years ago

woctezuma commented 4 years ago

Hello,

I want to try pix2latent on the FFHQ dataset on Google Colab. Due to RAM constraints, Colab won't run the optimization process with CMA or BasinCMA (unless I use the cars dataset), so I have to go with the faster (yet worse) option relying on Nevergrad.

I see that:

- ADAM + CMA
- ADAM + BasinCMA

Between the two options (Nevergrad vs. HybridNevergrad), which one would you recommend?

Edit: Below are results obtained with Nevergrad.

[Images: target image and results with Nevergrad]

Edit: Below are results obtained with HybridNevergrad.

[Images: target image and results with HybridNevergrad]

I guess I would have to try another portrait, tweak parameters, or forget Colab and stick to CMA/BasinCMA on a local machine.

minyoungg commented 4 years ago

HybridNevergrad doesn't work that well when you have a very small batch size (e.g., < 4). I believe this is because the parallelization they provide simply accumulates samples before applying an update (I might be wrong), which requires you to increase the number of optimization steps significantly. So you can first try the optimization methods that use PyCMA and see if they work.
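To illustrate the point, here is a toy sketch of Nevergrad's standard ask/tell loop (not the pix2latent code; the loss function is a stand-in for the real reconstruction loss): with a small batch, each update only sees a few samples, so many more iterations are needed to spend the same total budget.

```python
import numpy as np
import nevergrad as ng

def toy_loss(z):
    return float(np.sum(z ** 2))  # stand-in for the real reconstruction loss

param = ng.p.Array(shape=(512,))                                 # 512-d latent vector
optimizer = ng.optimizers.CMA(parametrization=param, budget=500)

batch_size = 2                                                   # small batch, as on a memory-limited GPU
for _ in range(optimizer.budget // batch_size):
    candidates = [optimizer.ask() for _ in range(batch_size)]    # samples are accumulated...
    for cand in candidates:
        optimizer.tell(cand, toy_loss(cand.value))               # ...then reported back to the optimizer
```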

So how do we get BasinCMAOptimizer and CMAOptimizer to work when you have a GPU with restricted memory?

To reduce memory while keeping the sample size suggested by PyCMA, you can set max_batch_size to be very small. I believe that for a latent dimensionality of 512, PyCMA asks for 22 samples, so setting max_batch_size to 4 will divide the 22 samples into 6 mini-batches. You can keep reducing it until it fits. Yes, this means the optimizer will do 6 forward and backward passes for each CMA update, making the whole optimization process slower.
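For concreteness, the mini-batch arithmetic looks like this (illustrative only, not the library's internal code):

```python
import math

num_samples = 22     # population size PyCMA suggests for a 512-d latent
max_batch_size = 4   # user-chosen cap so each pass fits in GPU memory

# Number of forward/backward passes per CMA update:
num_minibatches = math.ceil(num_samples / max_batch_size)   # -> 6

# The 22 samples split into chunks of at most 4:
chunk_sizes = [min(max_batch_size, num_samples - i)
               for i in range(0, num_samples, max_batch_size)]
print(num_minibatches, chunk_sizes)   # 6 [4, 4, 4, 4, 4, 2]
```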

Also, if memory and computation time are a concern, I recommend using CMAOptimizer with gradient descent turned off instead of BasinCMA. CMA does not require a backward pass, which effectively lets you increase max_batch_size. There is also the added benefit that skipping the backward pass usually improves the runtime by a factor of about 2.
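As a rough sketch of why that saves memory (generic PyTorch, not the actual pix2latent code; generator, target, and latents are placeholder names): scoring CMA candidates under torch.no_grad() means no activations are kept for a backward pass, so more samples fit per batch.

```python
import torch

@torch.no_grad()  # no graph is built, so activations are not stored
def score_candidates(generator, target, latents):
    """latents: (N, 512) CMA samples; target: (1, 3, H, W) image.
    Returns one reconstruction loss per candidate (plain MSE here)."""
    images = generator(latents)                              # forward pass only
    return ((images - target) ** 2).flatten(1).mean(dim=1)   # per-sample loss
```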

Hopefully, these suggestions are helpful for getting it to run on Colab.

woctezuma commented 4 years ago

Thank you very much for this thorough answer! That is a lot of helpful information!

I will try to adjust max_batch_size:

https://github.com/minyoungg/pix2latent/blob/02b7fd9ddcd34dba8185fa2fb525dfe4dee40aa7/pix2latent/optimizer/base_optimizer.py#L19-L24
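For reference, something along these lines (a hypothetical usage sketch: the import path and any other constructor arguments are assumptions on my part; only max_batch_size comes from the linked lines, so check the actual signature there):

```python
# Hypothetical sketch: see the linked base_optimizer.py for the real signature.
from pix2latent.optimizer import CMAOptimizer   # import path is an assumption

# Cap each forward pass at 4 samples so the 22-sample CMA population
# is processed in 6 mini-batches. Other required arguments are omitted here.
opt = CMAOptimizer(max_batch_size=4)
```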