statsmodels / statsmodels

Statsmodels: statistical modeling and econometrics in Python
http://www.statsmodels.org/devel/
BSD 3-Clause "New" or "Revised" License

ENH: distribution derivatives #7351

Open josef-pkt opened 3 years ago

josef-pkt commented 3 years ago

We have the distribution F(x), the density f(x), and two derivatives f'(x) and f''(x). We need those to compute bias, variance, mean squared error and optimal bandwidth in kernel distribution or density estimation, e.g. #7346.
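To illustrate where f'' enters: a minimal sketch of the AMISE-optimal bandwidth for a Gaussian kernel, h = (R(K) / (mu2(K)**2 R(f'') n))**(1/5), with R(f'') the integral of f''(x)**2. The function names are hypothetical (not statsmodels API), and the target density is assumed to be normal so that the result can be checked against the usual normal reference rule.

```python
import numpy as np
from scipy import integrate, stats


def normal_pdf_deriv2(x, m=0.0, s=1.0):
    """Second derivative f''(x) of the normal density with loc m, scale s."""
    z = (x - m) / s
    return (z**2 - 1) * stats.norm.pdf(z) / s**3


def amise_bandwidth_gaussian_kernel(pdf_deriv2, nobs):
    """AMISE-optimal bandwidth for a Gaussian kernel, given f'' of the target.

    Hypothetical helper for illustration only.
    """
    # roughness of f'': R(f'') = integral of f''(x)**2 dx
    roughness, _ = integrate.quad(lambda x: pdf_deriv2(x) ** 2,
                                  -np.inf, np.inf)
    r_kernel = 1 / (2 * np.sqrt(np.pi))  # R(K) for the Gaussian kernel
    mu2_kernel = 1.0                     # second moment of the Gaussian kernel
    return (r_kernel / (mu2_kernel**2 * roughness * nobs)) ** 0.2


h = amise_bandwidth_gaussian_kernel(normal_pdf_deriv2, nobs=500)
```

For a standard normal target this should agree with (4 / 3)**0.2 * nobs**(-0.2), i.e. the familiar "1.06 sigma n**(-1/5)" rule.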

Some of this computation should be reusable components,

e.g. loc scale family z = (x - m) / s

F(x|m, s) = F(z | 0, 1)

f(x|m, s) = 1 / s f(z | 0, 1)

f'(x|m, s) = 1 / s**2 f'(z | 0, 1)

but also d f(z) / d m = df/dz * dz/dm = -1 / s f'(z | 0, 1) (reuse f')
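A minimal sketch of what such reusable loc-scale components could look like, assuming the standard normal as base distribution. All function names here are hypothetical, not existing statsmodels API. Only the three standardized functions are distribution specific; the wrappers just apply the chain rule in z.

```python
import numpy as np
from scipy import stats


# Standardized (loc=0, scale=1) normal density and its derivatives in z.
def f0(z):
    return stats.norm.pdf(z)


def f0_deriv(z):
    return -z * stats.norm.pdf(z)


def f0_deriv2(z):
    return (z**2 - 1) * stats.norm.pdf(z)


# Generic loc-scale wrappers, z = (x - m) / s.
def pdf(x, m, s):
    return f0((x - m) / s) / s


def pdf_deriv(x, m, s):
    # each derivative with respect to x adds one factor 1 / s
    return f0_deriv((x - m) / s) / s**2


def pdf_deriv2(x, m, s):
    return f0_deriv2((x - m) / s) / s**3


def pdf_deriv_loc(x, m, s):
    # d f(x|m, s) / d m = (1 / s) * f0'(z) * dz/dm, with dz/dm = -1 / s
    return -f0_deriv((x - m) / s) / s**2
```

Note that the derivative with respect to m is just the negative of the derivative with respect to x, so f0_deriv is the only extra piece needed for both.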

similarly for distributions on R+ that have only a scale parameter, with loc fixed at zero

shape parameters are extra and cannot be handled like m, s using chain rule in z

other usage: in MLE we need d logf / d m and d logf / d s as score functions, which can also reuse f' = df / dz, i.e. f' can be used as part of the score factor
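A sketch of how the score could reuse f', again assuming a normal base distribution and hypothetical function names. With z = (x - m) / s and logf(x|m, s) = log f(z|0, 1) - log s, the chain rule gives d logf / d m = -(1 / s) f'(z) / f(z) and d logf / d s = -(1 + z f'(z) / f(z)) / s, so only the ratio f'(z) / f(z) is distribution specific.

```python
import numpy as np
from scipy import stats


def score_factor(z):
    """f'(z) / f(z) for the standard normal; other families would swap this."""
    return -z


def score_loc_scale(x, m, s):
    """Per-observation score (d logf / d m, d logf / d s), loc-scale family."""
    z = (np.asarray(x, dtype=float) - m) / s
    r = score_factor(z)
    score_m = -r / s
    score_s = -(1 + z * r) / s
    return np.column_stack([score_m, score_s])


x = np.array([-1.0, 0.5, 2.0, 3.5])
score_obs = score_loc_scale(x, m=1.0, s=2.0)
```

For the normal case this reduces to the familiar (x - m) / s**2 and -1 / s + (x - m)**2 / s**3.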

Note: f' is inverse_deriv2 in CDFLink.

(I don't remember if I also needed it for influence measures.)

for asymmetric kernels: Those are themselves distributions on R+. If we have the derivatives of the distributions, then we should also be able to get the derivatives of the kernels and kernel densities.

extension: distributions based on a transformation, e.g. Birnbaum-Saunders. f' should be composed of the derivative of the transformation and the derivative of the base distribution, see #7246
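A sketch of the composition for the transformation case, using the Birnbaum-Saunders distribution with scale fixed at 1: if t(X) is standard normal with t(x) = (sqrt(x) - 1 / sqrt(x)) / a, then f(x) = f0(t(x)) t'(x) and f'(x) = f0'(t(x)) t'(x)**2 + f0(t(x)) t''(x). The helper names are hypothetical; scipy.stats.fatiguelife is used only as a reference for checking the density.

```python
import numpy as np
from scipy import stats


def transform(x, a):
    """t(x) = (sqrt(x) - 1 / sqrt(x)) / a and its first two derivatives."""
    x = np.asarray(x, dtype=float)
    t = (np.sqrt(x) - 1 / np.sqrt(x)) / a
    t1 = (x**-0.5 + x**-1.5) / (2 * a)
    t2 = -(x**-1.5 + 3 * x**-2.5) / (4 * a)
    return t, t1, t2


def bs_pdf(x, a):
    """Density by change of variables: f0(t(x)) * t'(x)."""
    t, t1, _ = transform(x, a)
    return stats.norm.pdf(t) * t1


def bs_pdf_deriv(x, a):
    """f'(x) from the base derivative f0' and the transformation derivatives."""
    t, t1, t2 = transform(x, a)
    base_pdf = stats.norm.pdf(t)
    base_deriv = -t * base_pdf  # f0'(t) for the standard normal
    return base_deriv * t1**2 + base_pdf * t2


x = np.array([0.3, 0.8, 1.5, 3.0])
dens = bs_pdf(x, a=0.7)
dens_deriv = bs_pdf_deriv(x, a=0.7)
```

The base distribution contributes only f0 and f0', so the same composition would apply to other base distributions and other smooth monotone transformations.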

I guess there are not many reusable components in discrete distributions because they are not scale or loc-scale families (maybe in exponential families with canonical link).

josef-pkt commented 3 years ago

aside: genmod families do not have any derivatives. Estimation depends on the families only through the variance function and links; the derivatives, score and hessian are computed in the GLM class.