patel-zeel opened this issue 2 years ago (status: Open)
Hey @patel-zeel,

Good observation! In my experience, if you can use `exp` without it NaNing out, it gives faster convergence than `softplus`. However, you're right that `exp` doesn't work for all use cases. I see a few options:
1. Change `exp` to `softplus`.
2. Add a `method` keyword which can be set to `"exp"` or `"softplus"` (`vs.positive(method="exp")`). Make the default `"exp"`.
3. Like option 2, but make the default `"softplus"`.
Since convergence tends to be quicker with `exp`, and `exp` usually (though definitely not always) seems to be fine, I'm tending towards 2. What are your thoughts?
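To make the trade-off concrete, here is a minimal standalone sketch of what option 2 could look like. This is not Varz's actual implementation; the helper names `positive_transform` and `positive_inverse` are assumptions for illustration, and only the proposed `vs.positive(method="exp")` interface comes from the discussion itself.

```python
import numpy as np

def positive_transform(x, method="exp"):
    """Map an unconstrained value x to a positive value.

    method="exp":      y = exp(x); fast convergence but can overflow/NaN.
    method="softplus": y = log(1 + exp(x)); grows only linearly for large x.
    """
    if method == "exp":
        return np.exp(x)
    elif method == "softplus":
        # np.logaddexp(0, x) computes log(1 + exp(x)) without overflowing.
        return np.logaddexp(0.0, x)
    else:
        raise ValueError(f"Unknown method: {method}")

def positive_inverse(y, method="exp"):
    """Map a positive value y back to the unconstrained space (e.g. for initialisation)."""
    if method == "exp":
        return np.log(y)
    elif method == "softplus":
        # Inverse softplus: x = log(exp(y) - 1) = y + log(1 - exp(-y)).
        return y + np.log1p(-np.exp(-y))
    else:
        raise ValueError(f"Unknown method: {method}")
```

With something like this in place, the default could stay `"exp"` while users who hit NaNs opt into `"softplus"`, which matches the trade-off described above.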
I agree with Point 2. It is nice to know that `exp` converges faster than `softplus`. I will surely try it out in my future experiments.
Can I try adding Point 2 with a PR?
> Can I try adding Point 2 with a PR?
Definitely. :) That would be amazing!
Hi @wesselb,
If I am getting it right, Varz handles positivity constraints with the `exp(x)` transformation, which could be explosive for large values. How about adding more stable transformations for positivity, such as `log(1 + exp(x))` (softplus)?

For GP models, `log(1 + exp(x))` seems popular and more stable, as per the following references.

- `gpytorch.constraints.Positive`