minyoungg / pix2latent

Code for: Transforming and Projecting Images into Class-conditional Generative Networks
https://minyoungg.github.io/pix2latent
Apache License 2.0

Nevergrad vs. HybridNevergrad #3

Closed woctezuma closed 4 years ago

woctezuma commented 4 years ago

Hello,

I want to try pix2latent on the FFHQ dataset on Google Colab. Due to RAM constraints, Colab won't run the optimization process with CMA or BasinCMA (unless I use the cars dataset), so I have to go with the faster (yet worse) option relying on Nevergrad.

I see that:

- ADAM + CMA
- ADAM + BasinCMA

Between the two options (Nevergrad vs. HybridNevergrad), which one would you recommend?

Edit: Below are results obtained with Nevergrad.

[Images: target image and results with Nevergrad]

Edit: Below are results obtained with HybridNevergrad.

[Images: target image and results with HybridNevergrad]

I guess I would have to try another portrait, tweak parameters, or forget Colab and stick to CMA/BasinCMA on a local machine.

minyoungg commented 4 years ago

HybridNevergrad doesn't work that well when you have a very small batch size (e.g., < 4). I believe this is because the parallelization they provide simply accumulates samples before applying an update (I might be wrong), which requires you to increase the number of optimization steps significantly. So you can first try the optimization methods that use PyCMA and see if they work.
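To illustrate the point, here is a toy sketch of Nevergrad's standard ask/tell loop (not the pix2latent code; the loss function is a stand-in for the real reconstruction loss): with a small batch, each update only sees a few samples, so many more iterations are needed to spend the same total budget.

```python
import numpy as np
import nevergrad as ng

def toy_loss(z):
    return float(np.sum(z ** 2))  # stand-in for the real reconstruction loss

param = ng.p.Array(shape=(512,))                                 # 512-d latent vector
optimizer = ng.optimizers.CMA(parametrization=param, budget=500)

batch_size = 2                                                   # small batch, as on a memory-limited GPU
for _ in range(optimizer.budget // batch_size):
    candidates = [optimizer.ask() for _ in range(batch_size)]    # samples are accumulated...
    for cand in candidates:
        optimizer.tell(cand, toy_loss(cand.value))               # ...then reported back to the optimizer
```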

So how do we get BasinCMAOptimizer and CMAOptimizer to work when you have a GPU with restricted memory?

To reduce memory while keeping the sample size suggested by PyCMA, you can set max_batch_size to be very small. I believe that for a latent dimensionality of 512, PyCMA asks for 22 samples, so setting max_batch_size to 4 will divide the 22 samples into 6 mini-batches. You can keep reducing it until it fits. Yes, this means the optimizer will do 6 forward and backward passes for each CMA update, making the whole optimization process slower.
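For concreteness, the mini-batch arithmetic looks like this (illustrative only, not the library's internal code):

```python
import math

num_samples = 22     # population size PyCMA suggests for a 512-d latent
max_batch_size = 4   # user-chosen cap so each pass fits in GPU memory

# Number of forward/backward passes per CMA update:
num_minibatches = math.ceil(num_samples / max_batch_size)   # -> 6

# The 22 samples split into chunks of at most 4:
chunk_sizes = [min(max_batch_size, num_samples - i)
               for i in range(0, num_samples, max_batch_size)]
print(num_minibatches, chunk_sizes)   # 6 [4, 4, 4, 4, 4, 2]
```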

Also, if memory and computation time are a concern, I recommend using CMAOptimizer with gradient descent turned off instead of BasinCMA. CMA does not require a backward pass, which effectively lets you increase max_batch_size. There is also the added benefit that skipping the backward pass usually improves the runtime by a factor of about 2.
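As a rough sketch of why that saves memory (generic PyTorch, not the actual pix2latent code; generator, target, and latents are placeholder names): scoring CMA candidates under torch.no_grad() means no activations are kept for a backward pass, so more samples fit per batch.

```python
import torch

@torch.no_grad()  # no graph is built, so activations are not stored
def score_candidates(generator, target, latents):
    """latents: (N, 512) CMA samples; target: (1, 3, H, W) image.
    Returns one reconstruction loss per candidate (plain MSE here)."""
    images = generator(latents)                              # forward pass only
    return ((images - target) ** 2).flatten(1).mean(dim=1)   # per-sample loss
```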

Hopefully, these suggestions are helpful for getting it to run on Colab.

woctezuma commented 4 years ago

Thank you very much for this thorough answer! That is a lot of helpful information!

I will try to adjust max_batch_size:

https://github.com/minyoungg/pix2latent/blob/02b7fd9ddcd34dba8185fa2fb525dfe4dee40aa7/pix2latent/optimizer/base_optimizer.py#L19-L24
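For reference, something along these lines (a hypothetical usage sketch: the import path and any other constructor arguments are assumptions on my part; only max_batch_size comes from the linked lines, so check the actual signature there):

```python
# Hypothetical sketch: see the linked base_optimizer.py for the real signature.
from pix2latent.optimizer import CMAOptimizer   # import path is an assumption

# Cap each forward pass at 4 samples so the 22-sample CMA population
# is processed in 6 mini-batches. Other required arguments are omitted here.
opt = CMAOptimizer(max_batch_size=4)
```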