Probelms with the output for regression with LogNormal

maximilianpfau commented 4 years ago

Dear @alejandroschuler !

Thank you for the ngboost and you time earlier today.

Unfortunately, the output of regression with a LogNormal distribution appears to be incorrect.

https://colab.research.google.com/drive/1qr1l6h0NrD5PGYfF-FtiTnojb_FKEFAZ?usp=sharing

Specifically: Error 1: Retrival of the shape parameter "s" does not produce any output (cf. below) Error 2: Retrival of the scale parameter "scale" produces the wrong values (i.e., the "s" value from Y_dists.params) Error 3: The expression "np.exp(Y_dists.loc)" produces the "scale" values from Y_dists.params

Thank you for your input in advance, many thanks, Maximilian

alejandroschuler commented 4 years ago

Thanks for the question and the clear demonstration of the issue!

I looked into this and it turns out this is actually the intended behavior! The problem is that you are accessing the parameter values directly from the predicted distributions instead of from the provided dictionary. In other words, if you want the scale parameter s for the first 10 observations you should do Y_dists[0:10].params['s'], not Y_dists[1:10].s. The class properties that encode the parameters are internal to those classes and should not be accessed by the user.

This is clearly a little confusing, so I do think this points towards doing one or two things:

rename all of the internal parameters in all distributions with a leading underscore (e.g. you would have to type Y_dists._scale`) to make it clear (by python convention) that these are supposed to be internal and not accessed by the user
consider using a sigma, mu parametrization of the lognormal, instead of the s = sigma, scale = exp(mu) parametrization that scipy.stats uses. This seems to be more commonly understood?

@ryan-wolbeck @avati @tonyduan Thoughts?

ryan-wolbeck commented 4 years ago

@alejandroschuler I really like the idea of doing your first suggestion, it would then follow PEP 8. I don't really have an opinion on the second piece.

maximilianpfau commented 4 years ago

Thank you for the help!

alejandroschuler commented 4 years ago

@ryan-wolbeck are you interested in taking this on?

ryan-wolbeck commented 4 years ago

@alejandroschuler yeah I can do this, not sure when I'll get to it but I'll put it on the backlog

tonyduan commented 4 years ago

Thank you @ryan-wolbeck!

ggrrll commented 3 years ago

I also think that there is right now some confusion ... in particular, to clarifiy maybe what is was said by @maximilianpfau

.params['scale']!= .scale ,

which is indeed a bit annoying (one has to figure that out by printing them )

Other than that, I also tested that

.dist.median ( = exp(.loc) )
.dist.mean ( = exp( .loc + .params['s'] **2 / 2 )
(similarly for the variance)

are indeed correct

stanfordmlgroup / ngboost

Probelms with the output for regression with LogNormal #168