josef-pkt opened 3 years ago
Found an R package, DELTD, with Birnbaum-Saunders (BS), gamma, Erlang and log-normal (LN) kernels while looking for articles on the Birnbaum-Saunders kernel: https://cran.r-project.org/web/packages/DELTD/index.html
I haven't done a systematic search for functions in R yet.
browsing current nonparametric kde code
MultivariateKDE does not assume symmetric distance kernels K(x - xi); e.g. aitchison_aitken for categorical data does not use a simple distance.
The univariate kde and the kernels defined in the sandbox use the distance measure |x - xi|, and so will not work for asymmetric kernels.
That means it should be possible to add asymmetric kernels and additional data types to MultivariateKDE (e.g. "u" for the unit interval and "p" for R+).
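A minimal illustration of the difference, using scipy.stats only: a Gaussian kernel depends on x and xi only through (x - xi) / bw, while a gamma kernel gives different values for pairs with the same distance. For the gamma kernel I assume the first gamma kernel of Chen (2000), i.e. the gamma pdf with shape x / bw + 1 and scale bw evaluated at the data point xi; both function names are hypothetical, not statsmodels API.

```python
import numpy as np
from scipy import stats

def gauss_kernel(x, xi, bw):
    # symmetric kernel: depends only on the scaled distance (x - xi) / bw
    return stats.norm.pdf((x - xi) / bw) / bw

def gamma1_kernel(x, xi, bw):
    # asymmetric kernel (assumed Chen-type gamma kernel): gamma pdf with
    # shape x / bw + 1 and scale bw, evaluated at the data point xi;
    # it is not a function of x - xi alone
    return stats.gamma.pdf(xi, x / bw + 1, scale=bw)

bw = 0.5
# two pairs with the same distance |x - xi| = 1 at different locations
print(gauss_kernel(1.0, 2.0, bw), gauss_kernel(3.0, 4.0, bw))    # equal
print(gamma1_kernel(1.0, 2.0, bw), gamma1_kernel(3.0, 4.0, bw))  # differ
```

Because the evaluation point enters as a shape parameter, code that only passes |x - xi| to the kernel cannot express this.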
binning again
I ran my notebook with all kernels using histogram binning. With 50 bins several kernels show spikes; gamma and beta look fine in the example. With 100 bins all kde and kernel-cdf estimates look good; weibull has some wiggles and might need a larger bw than in my original example.
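The examples use nobs=1000 and the following binning function, where `rvs_` is the original random sample:

```python
import numpy as np
# `kern` is the module with the asymmetric kernel functions
# (assumed here: statsmodels.sandbox.nonparametric.kernels_asymmetric)
from statsmodels.sandbox.nonparametric import kernels_asymmetric as kern

def get_bins(rvs, bins=100):
    """Return bin centers and relative bin frequencies of the sample."""
    count, edges = np.histogram(rvs, bins=bins)
    center = edges[:-1] + np.diff(edges) / 2
    probs = count / count.sum()
    return center, probs

# rvs_ (sample), x_plot (evaluation points) and bw (bandwidth) as in the notebook
rvs, weights = get_bins(rvs_)
kde = kern.pdf_kernel_asym(x_plot, rvs, bw, "gamma2", weights=weights)
kce = kern.cdf_kernel_asym(x_plot, rvs, bw, "gamma2", weights=weights)
```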
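A self-contained sketch of the same binning idea without the sandbox module. It assumes Chen's first gamma kernel as a stand-in for the "gamma2" kernel used above; `kde_gamma1`, the gamma(3) sample and bw = 0.1 are hypothetical choices for illustration. With weights, each bin center enters with its relative frequency instead of 1/nobs.

```python
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

def get_bins(rvs, bins=100):
    # bin centers and relative frequencies of the sample
    count, edges = np.histogram(rvs, bins=bins)
    center = edges[:-1] + np.diff(edges) / 2
    return center, count / count.sum()

def kde_gamma1(x, sample, bw, weights=None):
    """Weighted gamma kernel density estimate (assumed Chen-type kernel).

    The kernel at evaluation point x is the gamma pdf with shape x / bw + 1
    and scale bw, evaluated at the sample points.
    """
    x = np.asarray(x, dtype=float)[:, None]
    sample = np.asarray(sample, dtype=float)[None, :]
    if weights is None:
        # unbinned data: every observation has weight 1 / nobs
        weights = np.full(sample.shape[1], 1.0 / sample.shape[1])
    kern_vals = stats.gamma.pdf(sample, x / bw + 1, scale=bw)
    return kern_vals @ weights

rng = np.random.default_rng(0)
rvs_ = rng.gamma(3.0, size=1000)     # hypothetical sample, nobs=1000
x_plot = np.linspace(0, 25, 2501)
bw = 0.1                             # hypothetical bandwidth

centers, probs = get_bins(rvs_, bins=100)
kde_raw = kde_gamma1(x_plot, rvs_, bw)
kde_binned = kde_gamma1(x_plot, centers, bw, weights=probs)
print(trapezoid(kde_raw, x_plot), trapezoid(kde_binned, x_plot))
```

Rerunning with bins=50 and plotting kde_binned against kde_raw is one way to look for spikes; I haven't checked which settings reproduce them with this particular kernel.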
(this comment was supposed to be in the PR, but ok here)
Some kernels require that the density at zero is zero, f(0) = 0; those kernels cannot estimate f(0) > 0. I didn't keep track of which kernels impose that restriction (gamma, or at least gamma2, does not). I'm starting to add references as I see them again.
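A quick check for the gamma kernel, assuming Chen's first gamma kernel (gamma1 in evmix): at x = 0 the kernel is an exponential density with scale bw, so the boundary estimate is mean(exp(-xi / bw)) / bw, which is positive for any sample. The exponential test sample and the bandwidth are hypothetical.

```python
import numpy as np
from scipy import stats

def kde_gamma1(x, sample, bw):
    # assumed Chen-type gamma kernel: shape x / bw + 1, scale bw,
    # evaluated at the sample points and averaged
    x = np.atleast_1d(np.asarray(x, dtype=float))[:, None]
    return stats.gamma.pdf(sample[None, :], x / bw + 1, scale=bw).mean(axis=1)

rng = np.random.default_rng(12345)
sample = rng.exponential(size=2000)   # true density has f(0) = 1
bw = 0.05                             # hypothetical bandwidth

f0 = kde_gamma1(0.0, sample, bw)[0]
# at x = 0 the shape is 1, so the kernel is exp(-t / bw) / bw
f0_closed = np.mean(np.exp(-sample / bw)) / bw
print(f0, f0_closed)
```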
log-normal: Igarashi 2016, with changes to the kernel to allow f(0) > 0 (generalized)
bs: mentioned in Igarashi 2016
Gaku Igarashi (2016): Weighted log-normal kernel density estimation, Communications in Statistics - Theory and Methods, DOI: 10.1080/03610926.2014.963623
followup:
R package evmix: beta1, beta2, gamma1 and gamma2 kernel estimators, and several traditional boundary correction methods for symmetric kernels
Hu, Yang, and Carl Scarrott. 2018. “Evmix: An R Package for Extreme Value Mixture Modeling, Threshold Estimation and Boundary Corrected Kernel Density Estimation.” Journal of Statistical Software 84 (1): 1–27. https://doi.org/10.18637/jss.v084.i05.
The R package kdensity also has gamma and beta kernels, and a kernel based on the Gaussian copula by Jones and Henderson (I didn't read that article): https://cran.r-project.org/web/packages/kdensity/readme/README.html
part of #7338
beta, gamma, invgauss and recipinvgauss kernels can be obtained through scipy's distributions with an appropriate parameterization. Birnbaum-Saunders (fatiguelife) should also be possible, but I haven't tried it yet.
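For example, a beta kernel on the unit interval can be written with scipy.stats.beta. This sketch assumes Chen's (1999) first beta kernel (beta1 in evmix), with shapes x / bw + 1 and (1 - x) / bw + 1; `kde_beta1`, the uniform test sample and the bandwidth are hypothetical.

```python
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

def kde_beta1(x, sample, bw):
    """Beta kernel density estimate on [0, 1] (assumed Chen-type kernel).

    The kernel at evaluation point x is the beta pdf with shapes
    x / bw + 1 and (1 - x) / bw + 1, evaluated at the sample points.
    """
    x = np.asarray(x, dtype=float)[:, None]
    a_shape = x / bw + 1
    b_shape = (1 - x) / bw + 1
    return stats.beta.pdf(sample[None, :], a_shape, b_shape).mean(axis=1)

rng = np.random.default_rng(0)
sample = rng.uniform(size=500)   # hypothetical data on the unit interval
bw = 0.02                        # hypothetical bandwidth
x_grid = np.linspace(0, 1, 1001)

est = kde_beta1(x_grid, sample, bw)
print(est.min(), trapezoid(est, x_grid))
```

The evaluation point only changes the shape parameters, so no boundary correction is needed to keep the estimate inside [0, 1].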
I don't know if MultivariateKDE is set up for asymmetric kernels. In contrast to symmetric kernels, the kernel does not depend on the distance t - x; instead it is a nonlinear function K(t, x) in which x becomes a shape parameter and the bandwidth is a scale parameter.
easy: sf of the distribution averaged over sample points
maybe easy: extras
The last part needs kernel-specific formulas, i.e. we need kernel classes with extras. Currently, I'm working with just simple functions that compute the kernel-pdf or kernel-cdf.
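A sketch of the "sf averaged over sample points" idea for the gamma kernel, as I read it: the kernel distribution at x (assumed Chen-type, shape x / bw + 1, scale bw) is evaluated with its survival function at each data point, and the values are averaged. A gamma distribution with fixed scale is stochastically increasing in its shape, so the result is nondecreasing in x and lies in [0, 1]. `kcdf_gamma1`, the sample and the bandwidth are hypothetical.

```python
import numpy as np
from scipy import stats

def kcdf_gamma1(x, sample, bw):
    """Kernel cdf estimate: sf of the gamma kernel distribution at the
    sample points, averaged over the sample.

    The sf at a fixed data point increases with the shape x / bw + 1,
    so the estimate is nondecreasing in x.
    """
    x = np.asarray(x, dtype=float)[:, None]
    return stats.gamma.sf(sample[None, :], x / bw + 1, scale=bw).mean(axis=1)

rng = np.random.default_rng(0)
sample = rng.gamma(3.0, size=1000)   # hypothetical sample
bw = 0.1                             # hypothetical bandwidth
x_grid = np.linspace(0, 30, 601)

cdf_est = kcdf_gamma1(x_grid, sample, bw)
print(cdf_est[0], cdf_est[-1])
```

Only scipy's sf is needed here, which is why this part looks easy; moments or other extras would need kernel-specific formulas.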
other targets:
multivariate
performance, speedups (not clear to me yet)
kernels
tails
I haven't seen any references on this yet, but the kernels should imply different tail behaviors; e.g. can we get heavy tails? I guess we might want to choose kernels depending on their behavior near the x = 0 boundary and in the tails.
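One quick probe of the tail question, for the gamma kernel only (assumed Chen-type, shape x / bw + 1, scale bw): beyond the largest observation every kernel value contains a factor 1 / Gamma(x / bw + 1), so the estimate decays faster than exponentially however heavy the tail of the data is. The sketch compares it with the true density of a heavy-tailed lomax sample; the sample, bandwidth and evaluation points are hypothetical.

```python
import numpy as np
from scipy import stats

def kde_gamma1(x, sample, bw):
    # assumed Chen-type gamma kernel: shape x / bw + 1, scale bw
    x = np.asarray(x, dtype=float)[:, None]
    return stats.gamma.pdf(sample[None, :], x / bw + 1, scale=bw).mean(axis=1)

rng = np.random.default_rng(0)
sample = stats.lomax.rvs(1.0, size=1000, random_state=rng)  # Pareto-type tail
bw = 0.1                                                   # hypothetical bandwidth
x_far = sample.max() * np.array([2.0, 3.0])                # beyond the data

est_far = kde_gamma1(x_far, sample, bw)
true_far = stats.lomax.pdf(x_far, 1.0)
print(est_far, true_far)
```

This only says something about the region beyond the data; inside the data range the tail fit still depends on the bandwidth.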
Status: currently I have mainly the scipy distribution parameterization of the kernels, which gives pdf and cdf (and maybe rvs). They work if I choose the bandwidth by visual inspection.
I would like to park those functions and leave the rest for another year.
(list of references coming later)