linkedin / photon-ml

A scalable machine learning library on Apache Spark

Remove mean and variance type consistency check #463

Closed yunboouyang closed 4 years ago

yunboouyang commented 4 years ago

The Coefficients case class requires that means and variances be of the same type (both SparseVector or both DenseVector). However, this requirement is unnecessary.
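To make the check under discussion concrete, here is a minimal, language-neutral sketch in Python. It is not Photon ML's code: `DenseVector`, `SparseVector`, and `validate` are hypothetical stand-ins, and keeping a length check after dropping the type check is an assumption, not something this PR states.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class DenseVector:
    """Hypothetical stand-in for a dense vector: every entry is stored."""
    values: List[float]

    def __len__(self) -> int:
        return len(self.values)


@dataclass
class SparseVector:
    """Hypothetical stand-in for a sparse vector: only non-zero entries are stored."""
    size: int
    entries: Dict[int, float] = field(default_factory=dict)  # index -> non-zero value

    def __len__(self) -> int:
        return self.size


Vector = Union[DenseVector, SparseVector]


def validate(means: Vector, variances: Optional[Vector], require_same_type: bool) -> None:
    """Raise ValueError if means and variances are incompatible."""
    if variances is None:
        return
    # Assumed check: both vectors describe the same number of coefficients.
    if len(means) != len(variances):
        raise ValueError("means and variances have different lengths")
    # The check this PR proposes to remove: identical vector representation.
    if require_same_type and type(means) is not type(variances):
        raise ValueError("means and variances have different vector types")


zero_means = SparseVector(size=3)          # all-zero coefficients, stored sparsely
variances = DenseVector([0.5, 0.2, 0.1])   # variances stored densely

try:
    validate(zero_means, variances, require_same_type=True)
    strict_check_passed = True
except ValueError:
    strict_check_passed = False

# Without the type check, the same pair is accepted.
validate(zero_means, variances, require_same_type=False)
```

In this sketch the two vectors agree in length and differ only in how they are stored, so the type check is the only thing that rejects them.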

cmjiang commented 4 years ago

LGTM.

cmjiang commented 4 years ago

Can you check why the mean and variance types were inconsistent?

yunboouyang commented 4 years ago

> Can you check why the mean and variance types were inconsistent?

In rare situations the optimizer (LBFGS / TRON) produces all-zero coefficients (represented as a SparseVector) while the variances are represented as a DenseVector. With this check in place, that combination throws an unexpected exception.

The underlying reason for the all-zero output is this: if the initial coefficients (all zeros) from which optimization starts are very poor, LBFGS will not be able to identify a good step size, or TRON will not be able to solve the trust-region sub-problem. If this occurs, Photon ML assumes that the problem is unsolvable and returns the initial coefficients.
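The fallback behaviour described above can be illustrated with a small sketch. It uses plain gradient descent with a backtracking line search rather than LBFGS or TRON, and it is not Photon ML's implementation; `minimize_with_fallback` and its parameters are hypothetical names chosen for illustration.

```python
from typing import Callable, List


def minimize_with_fallback(
    loss: Callable[[List[float]], float],
    grad: Callable[[List[float]], List[float]],
    initial: List[float],
    max_iters: int = 50,
    max_backtracks: int = 20,
) -> List[float]:
    """Gradient descent with backtracking; returns `initial` if no step size works."""
    current = list(initial)
    for _ in range(max_iters):
        g = grad(current)
        if all(abs(gi) < 1e-8 for gi in g):
            return current  # gradient is (numerically) zero: converged
        current_loss = loss(current)
        step = 1.0
        for _ in range(max_backtracks):
            candidate = [c - step * gi for c, gi in zip(current, g)]
            if loss(candidate) < current_loss:
                break  # found a step size that decreases the loss
            step *= 0.5
        else:
            # No acceptable step size: treat the problem as unsolvable
            # and hand back the starting point unchanged.
            return list(initial)
        current = candidate
    return current


# A well-behaved problem: minimize (x - 3)^2 starting from zero.
solved = minimize_with_fallback(
    loss=lambda x: (x[0] - 3.0) ** 2,
    grad=lambda x: [2.0 * (x[0] - 3.0)],
    initial=[0.0],
)

# A problem where no step from the start decreases the loss: minimize |x| from zero,
# using 1.0 as the subgradient at zero.
fallback = minimize_with_fallback(
    loss=lambda x: abs(x[0]),
    grad=lambda x: [1.0],
    initial=[0.0],
)
```

In the second call every trial step increases the loss, so the function returns the all-zero starting point. If such a result were then stored sparsely while the variances stayed dense, it would produce the type mismatch described in this thread.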