stanfordmlgroup / ngboost

Natural Gradient Boosting for Probabilistic Prediction
Apache License 2.0

Multivariate Normal Implementation #235

Closed MikeOMa closed 3 years ago

MikeOMa commented 3 years ago

I'm currently working on adding a multivariate normal distribution with LogScore, following the parametrization in https://ieeexplore.ieee.org/document/6797083, where the lower-triangular factor of the inverse covariance matrix is predicted. This way the only constraint is that the diagonal entries are positive.
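To make the parametrization concrete, here is a minimal sketch, not the ngboost implementation. It assumes the precision matrix is written as L L^T with L lower triangular, and that the diagonal is kept positive by exponentiating the raw parameters. The function names and the flat parameter layout are hypothetical, chosen only for illustration.

```python
import math


def unpack_params(params, k):
    """Split k + k(k+1)/2 unconstrained values into (mu, L).

    Assumed layout (hypothetical): first k entries are the mean, the rest
    fill the lower triangle of L row by row.
    """
    mu = list(params[:k])
    L = [[0.0] * k for _ in range(k)]
    idx = k
    for i in range(k):
        for j in range(i + 1):
            raw = params[idx]
            # exp keeps the diagonal strictly positive; off-diagonals are free.
            L[i][j] = math.exp(raw) if i == j else raw
            idx += 1
    return mu, L


def mvn_neg_log_likelihood(y, params):
    """Negative log density of y, assuming precision = L @ L.T."""
    k = len(y)
    mu, L = unpack_params(params, k)
    r = [y[i] - mu[i] for i in range(k)]
    # z = L^T r, so the quadratic form r^T (L L^T) r equals sum(z_j^2).
    z = [sum(L[i][j] * r[i] for i in range(j, k)) for j in range(k)]
    # log det(precision) = 2 * sum(log L_ii), so half of it is the plain sum.
    half_log_det = sum(math.log(L[i][i]) for i in range(k))
    return 0.5 * k * math.log(2 * math.pi) - half_log_det + 0.5 * sum(v * v for v in z)
```

With all raw parameters set to zero, L is the identity, so the function reduces to the standard normal negative log density.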

I think the only change I will need to make to the internals of ngboost.py is to add a multi_output argument to check_X_y.

The only question I have for now is: how useful is the current MultivariateNormal implementation? I could not get it to work, or salvage much from it, when I started working on this. It also seems that the current implementation does not force the diagonals of L to be positive, which might result in a matrix that is not positive semi-definite (I think?).

Should I a) make another distn class called MultivariateNormal_Regression, or b) override the current MultivariateNormal?

alejandroschuler commented 3 years ago

Feel free to override the current class. I'm not aware of anyone doing any experimental work with it, but we can always retrieve it from the commit history if we need to.

MikeOMa commented 3 years ago

https://github.com/stanfordmlgroup/ngboost/issues/228#issuecomment-782196338

I'm suffering from the same issue I commented about in the link above: if I make a "class factory" like the k_categorical function, the model using the distribution (or the distribution itself) won't pickle.

With a class factory, the multivariate normal works for k>2, but the resulting fitted model can no longer be pickled.
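The pickling problem can be reproduced without ngboost. The sketch below uses hypothetical names (`make_dist`, `FixedDist2`, `can_pickle`) and only illustrates the general Python behavior: pickle stores classes by qualified name, so a class defined inside a function cannot be looked up again at load time.

```python
import pickle


def make_dist(k):
    """Hypothetical class factory: builds a distribution class for dimension k."""

    class Dist:
        n_params = k * (k + 3) // 2

    return Dist


class FixedDist2:
    """Module-level class with k fixed to 2, so pickle can find it by name."""

    n_params = 5


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        # Local classes fail here because 'make_dist.<locals>.Dist'
        # is not importable from the module namespace.
        return False
```

An instance created through `make_dist(3)` fails `can_pickle`, while an instance of a class defined at module level in an importable module, such as `FixedDist2`, is pickled by reference to its name.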

For the moment I am just going to fix k=2 and not use a class factory. In practice, using anything higher results in many parameters [k(k+3)/2]. Maybe k=3 would work, but I doubt it; that would be 9 parameters to fit.
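The k(k+3)/2 count comes from k mean entries plus the k(k+1)/2 entries of a lower-triangular matrix. A small check, with a hypothetical helper name:

```python
def n_mvn_params(k):
    """Number of free parameters: k means plus k(k+1)/2 lower-triangular entries."""
    n_mean = k
    n_tril = k * (k + 1) // 2
    return n_mean + n_tril  # equals k * (k + 3) // 2


for k in (1, 2, 3, 4):
    print(k, n_mvn_params(k))
```

For k=2 this gives 5 parameters and for k=3 it gives 9, matching the figure quoted above.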

alejandroschuler commented 3 years ago

@tonyduan @avati hey, any chance either of you knows of a closed-form CRPS for the multivariate normal?

avati commented 3 years ago

I'm not aware of a closed-form CRPS expression for the multivariate Gaussian. AFAIK the common approach in meteorology is to start with the "energy score" representation of CRPS (see eqn 22 of https://viterbi-web.usc.edu/~shaddin/cs699fa17/docs/GR07.pdf with Beta=1). This form generalizes to multiple dimensions because it is written as an expectation, which can be approximated by samples or MC, bypassing hairy double/triple/quadruple integrals. There might be a similar trick to get a Monte Carlo gradient estimate, though I haven't thought much about it.
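As a rough illustration of the sample-based approach, here is a sketch of a Monte Carlo energy score with Beta=1. It assumes the negatively oriented form (lower is better), E||X - y|| - 0.5 E||X - X'||, and estimates both expectations from one set of samples. The function name is hypothetical, and the pairwise term includes the zero i == j pairs, so it is a simple biased estimate rather than a tuned one.

```python
import math
import random


def energy_score(samples, y):
    """Monte Carlo energy score (Beta=1) of forecast samples against observation y.

    samples: list of equal-length tuples drawn from the predictive distribution.
    y: the observed point, same dimension as each sample.
    """
    m = len(samples)
    # First term: mean distance between forecast samples and the observation.
    to_obs = sum(math.dist(x, y) for x in samples) / m
    # Second term: mean pairwise distance between forecast samples.
    pairwise = sum(math.dist(a, b) for a in samples for b in samples) / (m * m)
    return to_obs - 0.5 * pairwise


# Usage: score a 2D standard normal forecast against an observation.
rng = random.Random(0)
draws = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
score = energy_score(draws, (0.5, -0.5))
```

In one dimension this reduces to the usual sample-based CRPS estimate, which is why it is a natural multivariate stand-in when no closed form is available.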