frechette-alex opened this issue 6 years ago
I am also not familiar with how automatic differentiation works in JuMP, but a similar numerical issue may affect the calculation of the gradient, which contains normalized exponential weights.
My own scribbled calculations:
Note that a constant can be subtracted from the L_k terms (namely max_k L_k) to avoid underflow/overflow.
Somebody probably explaining this better than me: https://timvieira.github.io/blog/post/2014/02/11/exp-normalize-trick/.
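For concreteness, here is a minimal Julia sketch of the exp-normalize trick for weights of the form w_k = exp(L_k) / Σ_j exp(L_j). This is not code from this repo; the function name is just illustrative.

```julia
# Naive exp.(L) ./ sum(exp.(L)) overflows to Inf/Inf = NaN once any L_k is large.
function expnormalize(L::AbstractVector{<:Real})
    m = maximum(L)       # shifting by the max leaves the ratios unchanged...
    w = exp.(L .- m)     # ...but keeps every exponent <= 0, so exp cannot overflow
    return w ./ sum(w)
end

expnormalize([1000.0, 999.0, 998.0])  # ≈ [0.665, 0.245, 0.090]; the naive form gives NaN
```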
I can see how these could be issues; however, we have generally not observed them in practice on the datasets that we use. Our experience is that IPOPT is numerically robust by default, and compiling it with the HSL linear algebra library brings additional numerical benefits.
I would guess that in practice we hit a limit on the number of samples we can put into the algorithm before we start to hit floating-point limits.
If you would like to experiment with such a formulation, please feel free to fork this repo; we can test your fork internally and consider a PR if things work better on our side.
Can you elaborate on how you monitored this in practice? There is a strong chance that the underflow/overflow is not caught and silently affects the numerical computations.
Also, I am not sure the numerical robustness of IPOPT or other libraries will help here: the numerical issues I am describing come from chaining these specific operators together, and I'd be surprised if any library addresses that. (It would be hard for a library to detect this usage of the operators.)
I have reimplemented your method in Python (still using IPOPT), addressing these numerical issues manually (since I don't have access to autodiff). In my comparison tests between the two implementations, I've looked at highly degenerate problems (e.g. uniform [-1,1] Ising models, p=6, ~10000 samples) where the low temperature (or large scaling, beta) means there is effectively only a single unique sample (seen many times). In these cases, the Julia implementation seems to find a suboptimal reconstruction (in terms of the logRISE + l1 penalty objective). I have to admit these problems are very artificial and possibly not interesting in practice, but I am not convinced they are the only ones that trigger these numerical issues.
I'll try to make a fork of this repo and fix the issue (despite my unfamiliarity with Julia :-) ).
If IPOPT's algorithm and/or Julia's AD has under/overflow issues, I would expect it to fail entirely or at least report warnings.
In your specific example, did you try turning up the numerical tolerances for feasibility and optimality in Ipopt? These settings are quite important to ensure optimality in challenging numerical cases.
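For reference, a hedged sketch of how these Ipopt tolerances can be set through a recent JuMP (the exact syntax depends on the JuMP version; tol, acceptable_tol, and constr_viol_tol are standard Ipopt options):

```julia
using JuMP, Ipopt

model = Model(Ipopt.Optimizer)
# Tighten Ipopt's convergence and feasibility tolerances (the default tol is 1e-8).
set_attribute(model, "tol", 1e-12)
set_attribute(model, "constr_viol_tol", 1e-12)
set_attribute(model, "acceptable_tol", 1e-10)
```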
I think you are right. I went back over my logs for the degenerate problems, and for the Julia implementation I mostly get:
WARNING: Ipopt finished with status Restoration_Failed
WARNING: Not solved to optimality, status: Error
(which I don't get with my Python implementation). I'd have to dig deeper to see what causes that in IPOPT, but I'd guess it is the numerical issue.
Since IPOPT complains, I'd say it's less of a serious issue (although it requires that users monitor their logs).
I've played with the tolerance for the Julia implementation (from 1e-20 to 1e-1) and did not observe any different behavior.
Good point. In this case the software can respond more reasonably. We should be checking the solver status on each iteration (see #6).
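As a hedged sketch of what that check could look like with the current JuMP API (older JuMP versions instead return a status symbol from solve(), as in the warnings above):

```julia
using JuMP, Ipopt

model = Model(Ipopt.Optimizer)
@variable(model, x >= 0)
@objective(model, Min, (x - 1)^2)
optimize!(model)

# Flag anything other than a (locally) optimal result instead of silently continuing.
status = termination_status(model)
if status != MOI.OPTIMAL && status != MOI.LOCALLY_SOLVED
    @warn "Ipopt did not reach an optimal solution" status
end
```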
If adjusting the tolerance does not fix the issue, the next thing to try would be to compile Ipopt with the HSL linear algebra library; but if you already have an optimized Python implementation, I'd stick with that.
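(For the HSL route, once Ipopt is built against HSL the solver is chosen through Ipopt's standard linear_solver option; a hedged sketch:)

```julia
using JuMP, Ipopt

model = Model(Ipopt.Optimizer)
set_attribute(model, "linear_solver", "ma57")  # ma27/ma57/ma86/ma97 require an HSL-enabled Ipopt build
```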
In any case, improved formulations are always welcome!
https://github.com/lanl-ansi/inverse_ising/blob/42291444bd9ac935c958de7a3e6a356fa9886464/Inverse_Ising.jl#L84
There is a high risk of underflow/overflow from chaining logs, sums, and exponentials. I don't know the precise effects of underflow/overflow with Ipopt, but I suspect it could lead to premature termination, especially in highly degenerate problems.
I suggest using the traditional "subtract max" trick (see https://en.wikipedia.org/wiki/LogSumExp), or an already implemented logsumexp function, e.g. logsumexp in JuliaStats/StatsFuns (https://github.com/JuliaStats/StatsFuns.jl/blob/master/src/basicfuns.jl#L183).
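For illustration, a small sketch (not tied to the code at the link above) contrasting the naive form with StatsFuns.logsumexp, which applies the subtract-max trick internally:

```julia
using StatsFuns  # provides a numerically stable logsumexp

L = [1000.0, 999.0, 998.0]
log(sum(exp.(L)))   # Inf: exp(1000) overflows in Float64
logsumexp(L)        # ≈ 1000.4076, i.e. maximum(L) + log(sum(exp.(L .- maximum(L))))
```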