Open soumyasahu opened 3 years ago
The math and implementation look good to me. @ryan-wolbeck do you know what's going on with the failing checks?
@soumyasahu can you take a look at fixing the following
***** Module ngboost.distns.mixture_normal
ngboost/distns/mixture_normal.py:3:0: C0414: Import alias does not rename original package (useless-import-alias)
ngboost/distns/mixture_normal.py:50:57: E0602: Undefined variable 'mixprop' (undefined-variable)
ngboost/distns/mixture_normal.py:55:78: E1130: bad operand type for unary -: NoneType (invalid-unary-operand-type)
ngboost/distns/mixture_normal.py:55:78: E1130: bad operand type for unary -: str (invalid-unary-operand-type)
ngboost/distns/mixture_normal.py:66:8: W0231: __init__ method from base class 'RegressionDistn' is not called (super-init-not-called)
ngboost/distns/mixture_normal.py:111:12: W0612: Unused variable 'n' (unused-variable)
ngboost/distns/mixture_normal.py:3:0: W0611: Unused math imported as math (unused-import)
ngboost/distns/mixture_normal.py:6:0: W0611: Unused pandas imported as pd (unused-import)
ngboost/distns/mixture_normal.py:7:0: W0611: Unused import scipy (unused-import)
ngboost/distns/mixture_normal.py:8:0: W0611: Unused laplace imported from scipy.stats as dist (unused-import)
I think I have fixed all the issues apart from the following:
ngboost/distns/mixture_normal.py:3:0: C0414: Import alias does not rename original package (useless-import-alias)
ngboost/distns/mixture_normal.py:55:78: E1130: bad operand type for unary -: NoneType (invalid-unary-operand-type)
ngboost/distns/mixture_normal.py:55:78: E1130: bad operand type for unary -: str (invalid-unary-operand-type)
ngboost/distns/mixture_normal.py:66:8: W0231: __init__ method from base class 'RegressionDistn' is not called (super-init-not-called)
Actually, I don't understand these issues. One problem may be that 'RegressionDistn' is inside the function k_normal_mixture.
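For context, two of these messages have a generic meaning in pylint: C0414 fires on an alias that repeats the module name (such as `import math as math`), and W0231 fires when a subclass defines `__init__` without calling the base class initializer. A minimal sketch of the usual fixes follows. `BaseDistn`, `make_mixture`, and `Mixture` are hypothetical stand-ins, not names from the PR or from NGBoost, and the real base class may expect different arguments.

```python
import math  # plain import; `import math as math` is what triggers C0414


class BaseDistn:
    """Hypothetical stand-in for a base class that defines its own __init__."""

    def __init__(self, params):
        self.params = params


def make_mixture(k):
    """Hypothetical factory that defines a subclass inside a function."""

    class Mixture(BaseDistn):
        n_components = k  # value captured from the enclosing function

        def __init__(self, params):
            # Calling the base initializer is what W0231 asks for.
            super().__init__(params)
            self.log_two = math.log(2.0)

    return Mixture
```

Defining the class inside a function does not by itself prevent the `super().__init__(...)` call, so the nesting may not be the cause of the warning.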
I have coded the log score and its derivatives based on the attached derivations: Implementation_of_Mixture_Normal_Density_in_NGBoost.pdf
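As a point of reference, the log score of a K-component normal mixture is the negative log of the proportion-weighted sum of the component densities. The sketch below assumes NumPy and uses hypothetical names (`mixture_normal_log_score`, `props`, `means`, `sds`). It is not the code from the PR, and it omits the derivatives.

```python
import numpy as np


def mixture_normal_log_score(y, props, means, sds):
    """Negative log-likelihood of each observation under a normal mixture.

    Hypothetical helper: y has shape (n,); props, means, sds have shape (K,).
    """
    y = np.asarray(y, dtype=float)[:, None]          # shape (n, 1)
    props = np.asarray(props, dtype=float)
    means = np.asarray(means, dtype=float)
    sds = np.asarray(sds, dtype=float)

    # Log of (proportion * normal density) for every observation/component pair.
    log_comp = (
        np.log(props)
        - np.log(sds)
        - 0.5 * np.log(2.0 * np.pi)
        - 0.5 * ((y - means) / sds) ** 2
    )                                                # shape (n, K)

    # Log-sum-exp over components, shifted by the row maximum for stability.
    row_max = log_comp.max(axis=1, keepdims=True)
    log_density = row_max[:, 0] + np.log(np.exp(log_comp - row_max).sum(axis=1))
    return -log_density
```

Working on the log scale avoids underflow when an observation lies far from every component mean.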
To map the mixture proportions to unconstrained parameters, I have used the multivariate logit transformation. The inverse of the Jacobian of this transformation is required to find 'd_score'. It can be calculated in closed form, as shown in Inv_jaccobian.pdf
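One way to see why a closed form exists, assuming the usual parameterization theta_k = log(p_k / p_K) for k = 1..K-1 with the last proportion p_K as the reference: the Jacobian of theta with respect to p is diag(1/p) + 11^T / p_K, and its inverse is diag(p) - p p^T. The attached PDF may use a different but equivalent derivation. The sketch below assumes NumPy, and all function names are hypothetical.

```python
import numpy as np


def inverse_multilogit(theta):
    """Map K-1 unconstrained values to the first K-1 mixture proportions."""
    theta = np.asarray(theta, dtype=float)
    shift = max(theta.max(), 0.0)                 # guards against overflow
    e = np.exp(theta - shift)
    return e / (np.exp(-shift) + e.sum())         # exp(theta_k) / (1 + sum_j exp(theta_j))


def jacobian_theta_wrt_p(p):
    """d theta / d p for theta_k = log(p_k / p_K), with p_K = 1 - sum(p)."""
    p_last = 1.0 - p.sum()
    return np.diag(1.0 / p) + np.ones((p.size, p.size)) / p_last


def jacobian_p_wrt_theta(p):
    """Closed-form inverse of the matrix above: diag(p) - p p^T."""
    return np.diag(p) - np.outer(p, p)


theta = np.array([0.3, -1.2])
p = inverse_multilogit(theta)
# The product of the two Jacobians should be the identity matrix.
product = jacobian_p_wrt_theta(p) @ jacobian_theta_wrt_p(p)
```

Using the closed form avoids a numerical matrix inversion for every observation.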
The exact Fisher information matrix can be calculated, but the expressions for the second derivatives will be ugly. I shall give it a try later.
For initial values, K-means clustering is used: the sample proportion, mean, and variance of each cluster serve as the initial mixture proportion, mean, and variance of the corresponding normal component.
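The initialization described above can be sketched as follows. To keep the example dependent on NumPy only, it uses a small one-dimensional Lloyd iteration with quantile-based starting centers. That is an assumption made for illustration: the PR may rely on a library K-means instead, and the names `kmeans_1d` and `init_mixture_params` are hypothetical.

```python
import numpy as np


def kmeans_1d(y, k, n_iter=100):
    """Plain Lloyd iterations on 1-D data; returns a cluster label per point."""
    # Deterministic start: evenly spaced quantiles of the data.
    centers = np.quantile(y, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(y[:, None] - centers[None, :]), axis=1)
        new_centers = np.array([
            y[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def init_mixture_params(y, k):
    """Per-cluster sample proportion, mean, and variance as starting values."""
    y = np.asarray(y, dtype=float)
    labels = kmeans_1d(y, k)
    props = np.array([(labels == j).mean() for j in range(k)])
    means = np.array([y[labels == j].mean() for j in range(k)])
    variances = np.array([y[labels == j].var() for j in range(k)])
    return props, means, variances


rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(-5.0, 1.0, 300), rng.normal(5.0, 1.0, 700)])
props, means, variances = init_mixture_params(y, 2)
```

This sketch does not handle empty clusters, which would yield undefined means and variances, so a real implementation would need a fallback for that case.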