ihmeuw / risk_distributions

BSD 3-Clause "New" or "Revised" License

`x_min` and `x_max` can be far too conservative #61

Open zmbc opened 1 month ago

zmbc commented 1 month ago

My understanding is that x_min and x_max, for the non-mirrored distributions, act only as "guardrails" against trying to compute something that we don't have precision to compute. They are approximated using a lognormal distribution, but this approximation can be quite bad. Then the "guardrails" become far too restrictive and prevent the user from computing something that they can totally compute with their precision.

MCVE:

>>> import risk_distributions
>>> g = risk_distributions.risk_distributions.Gamma(risk_distributions.risk_distributions.Gamma.get_parameters(mean=100_017, sd=100_000.7))
>>> g.cdf(g.parameters.x_min)
0    0.052517
dtype: float64
>>> g = risk_distributions.risk_distributions.Gamma(risk_distributions.risk_distributions.Gamma.get_parameters(mean=100_017, sd=500_000.7))
>>> g.cdf(g.parameters.x_min)
0    0.673451
dtype: float64
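
The MCVE can be approximated without the package, which helps show where the mismatch comes from. This is only a sketch under stated assumptions: that the bounds are the 0.1% and 99.9% quantiles of a lognormal moment-matched to `mean` and `sd`, and that the gamma is parameterized as shape `(mean / sd) ** 2` and scale `sd ** 2 / mean`. The helper names `lognormal_bounds` and `gamma_from_moments` are hypothetical and are not part of `risk_distributions`. If those assumptions hold, the printed CDF values should come out close to the ones above.

```python
import numpy as np
from scipy import stats


def lognormal_bounds(mean, sd, q=0.001):
    """Quantile-based bounds from a lognormal with the given mean and sd.

    Assumption: the quantile level q=0.001 is a guess at what the package uses.
    """
    alpha = 1 + sd**2 / mean**2
    scale = mean / np.sqrt(alpha)   # exp(mu) of the lognormal
    s = np.sqrt(np.log(alpha))      # sigma of the underlying normal
    approx = stats.lognorm(s=s, scale=scale)
    return approx.ppf(q), approx.ppf(1 - q)


def gamma_from_moments(mean, sd):
    """Gamma distribution matched to the given mean and sd (assumed parameterization)."""
    shape = (mean / sd) ** 2
    scale = sd**2 / mean
    return stats.gamma(a=shape, scale=scale)


mean = 100_017
for sd in (100_000.7, 500_000.7):
    x_min, _ = lognormal_bounds(mean, sd)
    dist = gamma_from_moments(mean, sd)
    # Mass of the gamma that lies below the lognormal-derived lower bound.
    print(f"sd={sd}: x_min={x_min:.1f}, gamma CDF at x_min={dist.cdf(x_min):.4f}")
```

The lognormal and the gamma share a mean and sd here, but their lower tails differ more and more as `sd / mean` grows, so a lower bound taken from the lognormal can cut off a large share of the gamma's mass.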

NathanielBlairStahn commented 1 month ago

@zmbc I've never understood what x_min and x_max were for -- could you explain further? Also, I don't understand what exactly your example is showing, except that it seems like the CDF at a value we're calling "x min" shouldn't be as high as 67% or even 5%. But as I said, I don't know what x_min and x_max are actually for, so I don't know what this means.

NathanielBlairStahn commented 1 month ago

Seems like this note from @alibow in our documentation of ensemble distributions is relevant, as is the linked notebook.

zmbc commented 1 month ago

> it seems like the CDF at a value we're calling "x min" shouldn't be as high as 67% or even 5%

Yep that's it!

x_min and x_max are supposed to represent the support of the distribution in some numerical, computable sense. They have additional effects on mirrored distributions, which are what those notes and notebooks are about. In the case of mirrored distributions, it is important to match GBD, so the auto-generated x_min and x_max from this package should be used with caution!
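
For comparison, one possible reading of "support in a numerical sense" is to take the bounds from the distribution's own quantiles rather than from a lognormal stand-in. The sketch below does that for the first gamma in the MCVE. It is an illustration only, not the package's method or a proposed patch: the helper names and the quantile level `q=0.001` are assumptions, as is the moment-matched gamma parameterization.

```python
from scipy import stats


def gamma_from_moments(mean, sd):
    """Gamma distribution matched to the given mean and sd (assumed parameterization)."""
    shape = (mean / sd) ** 2
    scale = sd**2 / mean
    return stats.gamma(a=shape, scale=scale)


def own_quantile_bounds(dist, q=0.001):
    """Bounds taken from the distribution's own q and 1 - q quantiles (hypothetical helper)."""
    return dist.ppf(q), dist.ppf(1 - q)


dist = gamma_from_moments(mean=100_017, sd=100_000.7)
lower, upper = own_quantile_bounds(dist)

# By construction, only about q of the mass lies below `lower`.
print(f"lower={lower:.2f}, upper={upper:.2f}")
print(f"CDF at lower={dist.cdf(lower):.4f}, CDF at upper={dist.cdf(upper):.4f}")
```

Bounds built this way exclude only the chosen tail mass on each side by construction. As noted above, mirrored distributions need to match GBD, so this would not be a drop-in replacement there.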

zmbc commented 1 month ago

https://jira.ihme.washington.edu/browse/MIC-5167

NathanielBlairStahn commented 1 month ago

> They are approximated using a normal distribution

Do you mean lognormal?

zmbc commented 1 month ago

Oops, yes!