wesselb / varz

Painless optimisation of constrained variables in AutoGrad, TensorFlow, PyTorch, and JAX
MIT License

Stability of positivity constraint #2

Open · patel-zeel opened this issue 3 years ago

patel-zeel commented 3 years ago

Hi @wesselb,

If I am getting it right, Varz handles positivity constraints with the exp(x) transformation, which can blow up for large values. How about adding more stable transformations for positivity, such as log(1 + exp(x)) (softplus)?

For GP models, log(1 + exp(x)) seems to be the popular choice and more stable, as per the following references:

  1. GPy, paramz
  2. GPFlow positive bijector
  3. gpytorch.constraints.Positive
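
For concreteness, here is a rough sketch of the two transforms (not Varz's actual implementation, just an illustration of the stability difference); the softplus is computed via `np.logaddexp` so it stays finite where exp overflows:

```python
import numpy as np

def exp_transform(x):
    # Maps an unconstrained value to a positive one; overflows around x ~ 709 in float64.
    return np.exp(x)

def softplus_transform(x):
    # log(1 + exp(x)), computed stably: np.logaddexp avoids evaluating exp(x)
    # directly, so large x maps to roughly x instead of overflowing.
    return np.logaddexp(0.0, x)

def softplus_inverse(y):
    # Inverse of softplus: log(exp(y) - 1) = y + log(1 - exp(-y)).
    return y + np.log(-np.expm1(-y))

x = np.array([-10.0, 0.0, 10.0, 800.0])
print(exp_transform(x))       # last entry overflows to inf
print(softplus_transform(x))  # stays finite; roughly x for large x
```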
wesselb commented 3 years ago

Hey @patel-zeel,

Good observation! In my experience, if you can use exp without it NaNing out, it gives faster convergence than softplus. However, you're right that exp doesn't work for all use cases. I see a few options:

  1. Change exp to softplus.
  2. Add a `method` keyword which can be set to `"exp"` or `"softplus"` (e.g. `vs.positive(method="exp")`). Make the default `"exp"`.
  3. Do 2, but make the default "softplus".

Since convergence tends to be quicker with exp, and exp usually (though definitely not always) works fine, I'm leaning towards option 2. What are your thoughts?
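
As a rough sketch of what option 2 could look like (hypothetical names and structure; the real implementation is up to the PR), the `method` argument could select a forward/inverse transform pair:

```python
import numpy as np

# Hypothetical transform registry for a `method` keyword on vs.positive;
# names and structure are illustrative only.
_POSITIVE_TRANSFORMS = {
    # method -> (unconstrained -> positive, positive -> unconstrained)
    "exp": (np.exp, np.log),
    "softplus": (
        lambda x: np.logaddexp(0.0, x),       # log(1 + exp(x)), stable
        lambda y: y + np.log(-np.expm1(-y)),  # log(exp(y) - 1), stable
    ),
}

def positive_transform(method="exp"):
    """Return the (forward, inverse) transform pair for a positivity constraint."""
    try:
        return _POSITIVE_TRANSFORMS[method]
    except KeyError:
        raise ValueError(f'Unknown method "{method}".')

forward, inverse = positive_transform("softplus")
print(forward(inverse(2.5)))  # round-trips back to 2.5
```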

patel-zeel commented 3 years ago

I agree with Point 2. It is nice to know that exp converges faster than softplus; I will definitely try it out in my future experiments.

Can I try adding Point 2 with a PR?

wesselb commented 3 years ago

> Can I try adding Point 2 with a PR?

Definitely. :) That would be amazing!