linkedin / photon-ml

A scalable machine learning library on Apache Spark

Enable variance computation #128

Open ashelkovnykov opened 8 years ago

ashelkovnykov commented 8 years ago

Currently the code for computing coefficient variances in addition to means exists but is statically disabled. We'll need to make variance computation a configurable option for GAME, as well as add unit and integration tests.

ashelkovnykov commented 8 years ago

PR #141 enabled variance computation and included basic unit tests. However, more comprehensive integration tests still need to be added.

fastier-li commented 7 years ago

This requires fixing the Hessian diagonal under normalization. Without that, the variances will not be correct if normalization is used.
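A minimal sketch of why normalization matters, assuming scale-only normalization x'_j = f_j * x_j with no shifts (shifts also mix in the intercept and are not handled here). The model is trained on the normalized features, so the original-space coefficient is w_j = f_j * w'_j. That gives Var(w_j) = f_j^2 * Var(w'_j), or equivalently an original-space Hessian diagonal of H'_jj / f_j^2. The object and method names are hypothetical; this is not photon-ml's normalization code.

```scala
// Hypothetical helper: maps per-coefficient variances estimated in the
// normalized feature space back to the original feature space.
// Assumes scale-only normalization x'(j) = factors(j) * x(j), with no shifts.
object NormalizedVariance {
  def toOriginalSpace(normalizedVariances: Array[Double], factors: Array[Double]): Array[Double] = {
    require(normalizedVariances.length == factors.length, "dimension mismatch")
    // w(j) = factors(j) * w'(j), so the variance scales by factors(j)^2
    normalizedVariances.zip(factors).map { case (v, f) => v * f * f }
  }
}
```

If the variances were instead computed from the normalized-space Hessian and reported without this rescaling, features with large normalization factors would have their variances understated by a factor of f_j^2.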

ashelkovnykov commented 6 years ago

PR #141 initially enabled a basic variance computation that used the Hessian diagonal as an approximation of the full Hessian matrix. This solution was fast, especially since the Hessian diagonal was in some cases (when using the TRON optimizer) already available. However, it was inaccurate, sometimes incredibly so.
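For illustration, here is a sketch of the diagonal ("SIMPLE") approximation, assuming an L2-regularized logistic regression objective. Plain arrays stand in for photon-ml's own data types, and the object and method names are hypothetical. Each coefficient variance is estimated as 1 / H_jj, where H_jj = sum_i p_i (1 - p_i) x_ij^2 + l2, so every off-diagonal (covariance) term is ignored.

```scala
// Hypothetical sketch of the SIMPLE variance approximation for
// L2-regularized logistic regression (not photon-ml's actual code).
object SimpleVariance {
  private def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  private def dot(a: Array[Double], b: Array[Double]): Double =
    a.indices.map(i => a(i) * b(i)).sum

  /** Diagonal of the Hessian: sum_i p_i (1 - p_i) x_ij^2 + l2. */
  def hessianDiagonal(x: Array[Array[Double]], w: Array[Double], l2: Double): Array[Double] = {
    val diag = Array.fill(w.length)(l2)
    for (row <- x) {
      val p = sigmoid(dot(row, w))
      val weight = p * (1.0 - p)
      for (j <- w.indices) diag(j) += weight * row(j) * row(j)
    }
    diag
  }

  /** SIMPLE variances: 1 / H_jj for each coefficient j. */
  def simpleVariances(x: Array[Array[Double]], w: Array[Double], l2: Double): Array[Double] =
    hessianDiagonal(x, w, l2).map(1.0 / _)
}
```

Only one pass over the data and O(d) memory are needed, which is why this approach is cheap. When a diagonal is already produced by the optimizer, even that pass can be skipped.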

PR #349 replaced this computation with one that computes the entire Hessian matrix and inverts it to obtain the variances. This method is much more accurate; however, it also limits how many features can be included in the variance computation, since the cost of inverting the matrix grows cubically with the number of features and quickly becomes prohibitive.
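The full ("FULL") approach can be sketched like this, under the same assumption of an L2-regularized logistic regression objective. It builds the d x d Hessian, inverts it with Gauss-Jordan elimination, and reads the variances off the diagonal of the inverse. The names are hypothetical, and a production version would more likely use a library routine (for example a Cholesky solve). The O(d^3) inversion is what caps the number of features.

```scala
// Hypothetical sketch of the FULL variance computation (not photon-ml's actual code).
object FullVariance {
  private def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  private def dot(a: Array[Double], b: Array[Double]): Double =
    a.indices.map(i => a(i) * b(i)).sum

  /** Full Hessian: sum_i p_i (1 - p_i) x_i x_i^T + l2 * I. O(n * d^2) time, O(d^2) memory. */
  def hessian(x: Array[Array[Double]], w: Array[Double], l2: Double): Array[Array[Double]] = {
    val d = w.length
    val h = Array.tabulate(d, d)((i, j) => if (i == j) l2 else 0.0)
    for (row <- x) {
      val p = sigmoid(dot(row, w))
      val weight = p * (1.0 - p)
      for (i <- 0 until d; j <- 0 until d) h(i)(j) += weight * row(i) * row(j)
    }
    h
  }

  /** Gauss-Jordan inversion with partial pivoting: O(d^3) time. */
  def invert(m: Array[Array[Double]]): Array[Array[Double]] = {
    val n = m.length
    val a = m.map(_.clone())
    val inv = Array.tabulate(n, n)((i, j) => if (i == j) 1.0 else 0.0)
    for (col <- 0 until n) {
      // Choose the row with the largest pivot magnitude for numerical stability
      val pivot = (col until n).maxBy(r => math.abs(a(r)(col)))
      require(math.abs(a(pivot)(col)) > 1e-12, "Hessian is singular")
      val tmpA = a(col); a(col) = a(pivot); a(pivot) = tmpA
      val tmpI = inv(col); inv(col) = inv(pivot); inv(pivot) = tmpI
      // Scale the pivot row so the pivot becomes 1
      val p = a(col)(col)
      for (j <- 0 until n) { a(col)(j) /= p; inv(col)(j) /= p }
      // Eliminate this column from every other row
      for (r <- 0 until n if r != col) {
        val f = a(r)(col)
        for (j <- 0 until n) {
          a(r)(j) -= f * a(col)(j)
          inv(r)(j) -= f * inv(col)(j)
        }
      }
    }
    inv
  }

  /** FULL variances: the diagonal of the inverted Hessian. */
  def fullVariances(x: Array[Array[Double]], w: Array[Double], l2: Double): Array[Double] = {
    val inv = invert(hessian(x, w, l2))
    Array.tabulate(w.length)(j => inv(j)(j))
  }
}
```

As a small worked case, take rows (1, 0), (0, 1), (1, 1), w = 0 and l2 = 1. Then H = [[1.5, 0.25], [0.25, 1.5]]. The diagonal approximation gives 1 / 1.5 ≈ 0.667, while the full inverse gives 24/35 ≈ 0.686. For a positive definite Hessian, 1 / H_jj never exceeds (H^-1)_jj, so the diagonal approximation underestimates variances whenever features are correlated.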

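The true variance computation would also involve computing the covariance between fixed and random effect features, which is entirely unreasonable, since it would require a joint Hessian over the fixed effect coefficients and the coefficients of every random effect entity.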

We should likely bring back the solution from PR #141, which was replaced by PR #349, and offer it as one option for variance computation (e.g. NONE, SIMPLE, FULL).
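One possible shape for such an option is sketched below. The type name `VarianceComputationType` and the `parse`/`computeVariances` helpers are hypothetical, chosen for illustration. The NONE, SIMPLE and FULL values come from the proposal above. Both Hessian quantities are passed as thunks, so the expensive full inversion only runs when FULL is selected, and NONE computes nothing.

```scala
// Hypothetical configuration option for variance computation.
object VarianceOptions {
  sealed trait VarianceComputationType
  case object NONE extends VarianceComputationType
  case object SIMPLE extends VarianceComputationType
  case object FULL extends VarianceComputationType

  /** Parses a user-supplied option string (case-insensitive). */
  def parse(name: String): VarianceComputationType = name.trim.toUpperCase match {
    case "NONE"   => NONE
    case "SIMPLE" => SIMPLE
    case "FULL"   => FULL
    case other    => throw new IllegalArgumentException(s"Unknown variance computation type: $other")
  }

  /**
   * Dispatches on the configured type. The thunks defer the Hessian work:
   * SIMPLE only evaluates the diagonal, FULL only evaluates the inverse diagonal.
   */
  def computeVariances(
      kind: VarianceComputationType,
      hessianDiagonal: () => Array[Double],
      hessianInverseDiagonal: () => Array[Double]): Option[Array[Double]] = kind match {
    case NONE   => None
    case SIMPLE => Some(hessianDiagonal().map(1.0 / _))
    case FULL   => Some(hessianInverseDiagonal())
  }
}
```

Returning an `Option` keeps "variances not computed" distinct from "variances computed", which matters if downstream model output treats missing variances differently from zero.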

ashelkovnykov commented 6 years ago

@joshvfleming