stanfordmlgroup / ngboost

Natural Gradient Boosting for Probabilistic Prediction
Apache License 2.0

Adding distributions and log scores for K-Normal-Mixture #265

Open soumyasahu opened 3 years ago

soumyasahu commented 3 years ago

I have coded the log score and its derivatives based on the attached derivations (Implementation_of_Mixture_Normal_Density_in_NGBoost.pdf).
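The attached PDF is not reproduced in the thread, so what follows is only a minimal sketch of a K-normal-mixture log score and its analytic gradient. It assumes a hypothetical parameter layout of K means, K log-scales, and K-1 logits, with the K-th component as the reference category. The names `unpack`, `log_score`, and `d_log_score` are illustrative and are not the PR's code. All three gradients are written in terms of the posterior responsibilities r_k = p_k phi_k(y) / sum_j p_j phi_j(y).

```python
import numpy as np


def unpack(params, K):
    """Hypothetical layout: K means, K log-scales, K-1 logits (last component = reference)."""
    mu = params[:K]
    sigma = np.exp(params[K:2 * K])
    logits = np.append(params[2 * K:], 0.0)   # reference component gets logit 0
    p = np.exp(logits - logits.max())
    return mu, sigma, p / p.sum()


def log_score(params, y, K):
    """Negative log-likelihood of one observation y under the K-normal mixture."""
    mu, sigma, p = unpack(params, K)
    z = (y - mu) / sigma
    log_comp = np.log(p) - np.log(sigma) - 0.5 * np.log(2 * np.pi) - 0.5 * z ** 2
    m = log_comp.max()                        # log-sum-exp for numerical stability
    return -(m + np.log(np.exp(log_comp - m).sum()))


def d_log_score(params, y, K):
    """Analytic gradient of log_score w.r.t. (means, log-scales, logits)."""
    mu, sigma, p = unpack(params, K)
    z = (y - mu) / sigma
    log_comp = np.log(p) - np.log(sigma) - 0.5 * z ** 2   # constant cancels in r
    r = np.exp(log_comp - log_comp.max())
    r /= r.sum()                              # posterior responsibilities r_k
    d_mu = -r * z / sigma
    d_log_sigma = r * (1.0 - z ** 2)
    d_logits = (p - r)[:-1]                   # no free logit for the reference component
    return np.concatenate([d_mu, d_log_sigma, d_logits])


params = np.array([-1.0, 2.0, 0.1, -0.3, 0.4])   # K = 2: 2 means, 2 log-scales, 1 logit
print(d_log_score(params, 0.7, K=2))
```

Comparing `d_log_score` against central finite differences of `log_score` is a cheap way to check hand-derived derivatives like these.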

To map the mixture proportions to unconstrained parameters, I have used the multivariate logit transformation. Computing 'd_score' requires the inverse of the Jacobian of this transformation, which can be calculated in closed form as shown in Inv_jaccobian.pdf.
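Assuming the multivariate logit here is the additive log-ratio map eta_j = log(p_j / p_K) for j = 1, ..., K-1 (the attached derivation may use a different reference category), the Jacobian dp/deta restricted to the first K-1 proportions is diag(p) - p p^T. The Sherman-Morrison formula, together with sum_{j<K} p_j = 1 - p_K, then gives its inverse in closed form as diag(1/p_j) + 1 1^T / p_K. The sketch below checks this numerically; the function names are hypothetical.

```python
import numpy as np


def mlogit(p):
    """Multivariate logit: K proportions -> K-1 unconstrained values (reference = last component)."""
    p = np.asarray(p, dtype=float)
    return np.log(p[:-1] / p[-1])


def mlogit_inverse(eta):
    """Inverse map: K-1 values -> K proportions summing to 1."""
    expo = np.exp(np.append(np.asarray(eta, dtype=float), 0.0))
    return expo / expo.sum()


def jacobian(p):
    """d p_j / d eta_l for j, l < K: diag(p) - p p^T on the first K-1 proportions."""
    q = np.asarray(p, dtype=float)[:-1]
    return np.diag(q) - np.outer(q, q)


def inverse_jacobian(p):
    """Closed-form inverse via Sherman-Morrison: diag(1/p_j) + 1 1^T / p_K."""
    p = np.asarray(p, dtype=float)
    q = p[:-1]
    return np.diag(1.0 / q) + np.ones((q.size, q.size)) / p[-1]


p = np.array([0.2, 0.3, 0.5])
print(np.allclose(mlogit_inverse(mlogit(p)), p))                    # True
print(np.allclose(jacobian(p) @ inverse_jacobian(p), np.eye(2)))    # True
```

Since the inverse of dp/deta is deta/dp, the same matrix also follows from differentiating eta_j = log p_j - log(1 - sum_{l<K} p_l) directly, which gives deta_j/dp_l = delta_jl / p_j + 1 / p_K.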

The exact Fisher information matrix can be calculated, but the expressions for the second derivatives will be messy. I will give it a try later.
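Until the exact Fisher information is worked out, one common fallback (not something proposed in this thread) is a Monte Carlo estimate. The Fisher information equals the expected outer product of the score gradient under the model, so it can be approximated by averaging g g^T over draws from the distribution itself. The sketch checks this on a single normal in the (mu, log sigma) parameterization, where the exact answer is diag(1/sigma^2, 2). For the mixture, the hypothetical `mc_fisher` helper would take the mixture's gradient and a mixture sampler instead.

```python
import numpy as np


def normal_score_grad(y, mu, log_sigma):
    """Gradient of -log N(y | mu, sigma^2) w.r.t. (mu, log sigma); one row per y."""
    sigma = np.exp(log_sigma)
    z = (y - mu) / sigma
    return np.column_stack([-z / sigma, 1.0 - z ** 2])


def mc_fisher(grad_fn, sampler, n_samples, rng):
    """Monte Carlo Fisher information: mean outer product of score gradients at model draws."""
    ys = sampler(n_samples, rng)
    g = grad_fn(ys)                      # shape (n_samples, n_params)
    return g.T @ g / n_samples


mu, log_sigma = 1.0, np.log(2.0)
rng = np.random.default_rng(0)
fi = mc_fisher(lambda y: normal_score_grad(y, mu, log_sigma),
               lambda n, r: r.normal(mu, np.exp(log_sigma), size=n),
               200_000, rng)
print(np.round(fi, 2))  # close to [[0.25, 0], [0, 2]]
```

The estimate is noisy and costs extra gradient evaluations per row, which is the usual trade-off against deriving the exact second derivatives.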

For initial values, I use K-means clustering: the sample proportion, mean, and variance of each cluster serve as the mixing proportion, mean, and variance of the corresponding normal component.
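A minimal sketch of this initialization, assuming one-dimensional targets. To stay dependency-free, it uses a small hand-written Lloyd's k-means in NumPy (started at evenly spaced quantiles) rather than whatever clustering routine the PR actually calls. `kmeans_1d` and `mixture_init` are hypothetical names.

```python
import numpy as np


def kmeans_1d(y, k, n_iter=100):
    """Plain Lloyd's k-means on 1-D data, started at evenly spaced interior quantiles."""
    centers = np.quantile(y, np.linspace(0.0, 1.0, k + 2)[1:-1])
    for _ in range(n_iter):
        labels = np.argmin(np.abs(y[:, None] - centers[None, :]), axis=1)
        new_centers = np.array([y[labels == j].mean() if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def mixture_init(y, k):
    """Cluster proportions, means, and variances as starting mixture parameters.
    Assumes every cluster ends up with at least two points."""
    y = np.asarray(y, dtype=float)
    labels = kmeans_1d(y, k)
    props = np.array([np.mean(labels == j) for j in range(k)])
    means = np.array([y[labels == j].mean() for j in range(k)])
    variances = np.array([y[labels == j].var() for j in range(k)])
    return props, means, variances


rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(-5.0, 1.0, 300), rng.normal(5.0, 1.0, 700)])
props, means, variances = mixture_init(y, 2)
print(props, means, variances)
```

Empty or single-point clusters would give undefined or zero variances, so a production version would need a guard (for example, a small variance floor) before taking log-scales.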

alejandroschuler commented 3 years ago

The math and implementation look good to me. @ryan-wolbeck do you know what's going on with the failing checks?

ryan-wolbeck commented 3 years ago

@soumyasahu, can you take a look at fixing the following:

***** Module ngboost.distns.mixture_normal
ngboost/distns/mixture_normal.py:3:0: C0414: Import alias does not rename original package (useless-import-alias)
ngboost/distns/mixture_normal.py:50:57: E0602: Undefined variable 'mixprop' (undefined-variable)
ngboost/distns/mixture_normal.py:55:78: E1130: bad operand type for unary -: NoneType (invalid-unary-operand-type)
ngboost/distns/mixture_normal.py:55:78: E1130: bad operand type for unary -: str (invalid-unary-operand-type)
ngboost/distns/mixture_normal.py:66:8: W0231: init method from base class 'RegressionDistn' is not called (super-init-not-called)
ngboost/distns/mixture_normal.py:111:12: W0612: Unused variable 'n' (unused-variable)
ngboost/distns/mixture_normal.py:3:0: W0611: Unused math imported as math (unused-import)
ngboost/distns/mixture_normal.py:6:0: W0611: Unused pandas imported as pd (unused-import)
ngboost/distns/mixture_normal.py:7:0: W0611: Unused import scipy (unused-import)
ngboost/distns/mixture_normal.py:8:0: W0611: Unused laplace imported from scipy.stats as dist (unused-import)

soumyasahu commented 3 years ago

I think I have fixed the other issues, apart from the following:

ngboost/distns/mixture_normal.py:3:0: C0414: Import alias does not rename original package (useless-import-alias)
ngboost/distns/mixture_normal.py:55:78: E1130: bad operand type for unary -: NoneType (invalid-unary-operand-type)
ngboost/distns/mixture_normal.py:55:78: E1130: bad operand type for unary -: str (invalid-unary-operand-type)
ngboost/distns/mixture_normal.py:66:8: W0231: init method from base class 'RegressionDistn' is not called (super-init-not-called)

Actually, I don't understand these issues. One possible cause is that the 'RegressionDistn' subclass is defined inside the function k_normal_mixture.
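The nested class is likely not the cause. Pylint's W0231 fires whenever a subclass defines `__init__` without calling its parent's `__init__`, regardless of where the class is defined. The sketch below shows that fix with a hypothetical `RegressionDistnStub` standing in for NGBoost's `RegressionDistn` (assumed here to take the parameter array) and a hypothetical parameter layout of K means, K log-scales, and K-1 logits. C0414 goes away by writing `import math` (or dropping the unused import entirely) instead of `import math as math`. The E1130 warnings depend on the code at line 55, which the thread does not show, so they are not covered here.

```python
import numpy as np


class RegressionDistnStub:
    """Hypothetical stand-in for ngboost's RegressionDistn; assumed to take the parameter array."""

    def __init__(self, params):
        self._params = params


def k_normal_mixture(K):
    """Class factory: defining the subclass inside a function is fine for pylint."""

    class NormalMixture(RegressionDistnStub):
        n_params = 3 * K - 1  # hypothetical layout: K means, K log-scales, K-1 logits

        def __init__(self, params):
            super().__init__(params)  # calling the parent __init__ resolves W0231
            self.loc = params[:K]
            self.scale = np.exp(params[K:2 * K])
            # Append a zero logit for the reference (last) component, then softmax over components.
            logits = np.vstack([params[2 * K:], np.zeros((1, params.shape[1]))])
            weights = np.exp(logits - logits.max(axis=0))
            self.mixprop = weights / weights.sum(axis=0)

    return NormalMixture


Mixture3 = k_normal_mixture(3)
dist = Mixture3(np.zeros((Mixture3.n_params, 4)))  # 8 parameter rows, 4 observations
print(dist.mixprop.shape)  # (3, 4)
```

Storing the proportions as an attribute such as `self.mixprop` inside `__init__` also avoids referring to a bare name that was never defined in scope.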