panlanfeng / KernelEstimator.jl


KernelEstimator


A Julia package for nonparametric kernel density estimation and regression. It currently includes univariate kernel density estimation, local constant regression (Nadaraya-Watson regression) and local linear regression. It can also compute bootstrap confidence bands [4].

This package provides Gamma and Beta kernels for density estimation and regression on bounded data. The Gamma kernel is free of boundary bias for data bounded on one side, and the Beta kernel for data bounded on both sides; see [2, 3]. In particular, this package provides least squares cross validation (LSCV) bandwidth selection functions for the Gamma and Beta kernels.

Bandwidth selection is critical in kernel estimation. LSCV is the recommended method. Likelihood cross validation is also provided but should be avoided because of its known drawbacks. For regression problems, the bandwidth for local constant regression is selected by LSCV, while the bandwidth for local linear regression is chosen by AIC [6].
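To make the LSCV idea concrete, here is a minimal, self-contained sketch for a Gaussian kernel. It is not this package's implementation: the names `normpdf`, `lscv_objective` and `lscv_grid` are hypothetical, and the sketch uses the closed form of the integrated squared estimate that is available for Gaussian kernels, together with a plain grid search, rather than numeric integration.

```julia
using Random

# Gaussian density with mean 0 and standard deviation s
normpdf(u, s) = exp(-0.5 * (u / s)^2) / (s * sqrt(2π))

# LSCV criterion for a Gaussian-kernel density estimate with bandwidth h:
#   integral of fhat^2  minus  (2/n) * sum of leave-one-out estimates fhat_{-i}(x_i)
function lscv_objective(x::AbstractVector{<:Real}, h::Real)
    n = length(x)
    int_f2 = 0.0   # accumulates the closed-form integral of fhat^2
    loo = 0.0      # accumulates kernel values over all pairs with i != j
    for i in 1:n, j in 1:n
        d = x[i] - x[j]
        int_f2 += normpdf(d, sqrt(2) * h)
        if i != j
            loo += normpdf(d, h)
        end
    end
    return int_f2 / n^2 - 2 * loo / (n * (n - 1))
end

# Return the bandwidth on the grid that minimizes the criterion
function lscv_grid(x, grid)
    scores = [lscv_objective(x, h) for h in grid]
    return grid[argmin(scores)]
end

Random.seed!(1)
x = randn(200)
grid = range(0.05, stop=1.5, length=60)
h_hat = lscv_grid(x, grid)
```

A grid search keeps the sketch short; a one-dimensional optimizer over `h` would serve the same purpose.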

To install and use this package in Julia,

using Pkg
Pkg.add("KernelEstimator")
using KernelEstimator

See the examples/ directory for usage.

This package calculates densities by the direct approach, i.e. by adding kernel functions together. To define a new kernel, write a function that takes the same arguments as gaussiankernel and returns the kernel weights at a given point. If no bandwidth selection function is provided, lscv with numeric integration is used by default.
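The direct approach can be illustrated with a short standalone sketch. The function `direct_kde` and the kernels `stdnormpdf` and `epan` below are hypothetical names for illustration only; in particular, their one-argument form is an assumption and does not reflect the argument list of the package's gaussiankernel.

```julia
# Standard normal density, used as the default kernel
stdnormpdf(u) = exp(-0.5 * u^2) / sqrt(2π)

# Epanechnikov kernel, as an example of swapping in a different kernel
epan(u) = abs(u) <= 1 ? 0.75 * (1 - u^2) : 0.0

# Direct kernel density estimate: at each evaluation point, sum the scaled
# kernels centered at the data points and divide by n * h
function direct_kde(xdata::AbstractVector{<:Real}, xeval::AbstractVector{<:Real},
                    h::Real; kernel = stdnormpdf)
    n = length(xdata)
    return [sum(kernel((x0 - xi) / h) for xi in xdata) / (n * h) for x0 in xeval]
end

xdata = [-1.2, -0.4, 0.0, 0.3, 0.9, 1.5]
xeval = range(-6, stop=6, length=601)
dens = direct_kde(xdata, xeval, 0.5)
dens_epan = direct_kde(xdata, xeval, 0.5; kernel = epan)

# Riemann-sum check that the Gaussian estimate integrates to roughly one
area = sum(dens) * step(xeval)
```

The cost is proportional to the number of data points times the number of evaluation points, which is the price of the direct approach compared with binned or FFT-based methods.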

Functions

This package provides two major functions, kerneldensity for kernel density estimation and npr for nonparametric regression. For kernel density, you can simply use

xdata = randn(1000)
kerneldensity(xdata)

or specify some options

# linspace was removed in Julia 1.0; on older versions use linspace(-3, 3, 100)
xeval = range(-3, stop=3, length=100)
bw = bwlscv(xdata, gaussiankernel)
kerneldensity(xdata, xeval=xeval, lb=-Inf, ub=Inf, kernel=gaussiankernel, h=bw)

xeval specifies the points at which the density is evaluated; it defaults to xdata. lb and ub are the lower and upper bounds of the data. If exactly one of them is finite, the user's choice of kernel is overridden and gammakernel is used, with a warning. If both are finite, betakernel is used, with a warning if the user's choice was different.

For kernel regression, you can use

using Distributions  # provides Beta and Normal
x = rand(Beta(4, 2), 500) * 10
y = 2 .* x.^2 + x .* rand(Normal(0, 5), 500)
npr(x, y)

or change the default by

npr(x, y, xeval=x, reg=locallinear, kernel=betakernel, lb=0.0, ub=10.0)

reg specifies the order of the local polynomial regression. You can choose localconstant (local constant regression) or locallinear (local linear regression). locallinear has better theoretical properties for predicting y and is used by default, but it is more computationally intensive.
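The difference between the two estimators can be seen in a small standalone sketch. The functions `local_constant` and `local_linear` are hypothetical helpers, not the package's localconstant and locallinear, and a Gaussian kernel with a fixed bandwidth is assumed. One well-known advantage of the local linear fit, reduced bias near the boundary, shows up when the data are exactly linear.

```julia
stdnormpdf(u) = exp(-0.5 * u^2) / sqrt(2π)

# Local constant (Nadaraya-Watson) fit at x0: kernel-weighted average of y
function local_constant(x, y, x0, h)
    w = stdnormpdf.((x .- x0) ./ h)
    return sum(w .* y) / sum(w)
end

# Local linear fit at x0: intercept of the kernel-weighted least squares line
# through the points (x - x0, y), obtained from the 2x2 normal equations
function local_linear(x, y, x0, h)
    w = stdnormpdf.((x .- x0) ./ h)
    d = x .- x0
    s0, s1, s2 = sum(w), sum(w .* d), sum(w .* d .^ 2)
    t0, t1 = sum(w .* y), sum(w .* d .* y)
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1^2)
end

x = collect(range(0, stop=10, length=101))
y = 3 .+ 2 .* x                          # exactly linear, no noise
fit_ll = local_linear(x, y, 0.0, 1.0)    # evaluated at the left boundary
fit_lc = local_constant(x, y, 0.0, 1.0)
```

At the boundary point the local linear fit recovers the true value 3 up to rounding, while the local constant fit is pulled upward because all neighboring observations lie to the right.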

There is also a function for computing a bootstrap confidence band for the regression.

bootstrapCB(x, y; xeval=x, B=500, reg=locallinear, lb=-Inf, ub=Inf, kernel=gaussiankernel)

B specifies the number of bootstrap resamples.
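As a rough illustration of what a bootstrap band involves, the sketch below resamples (x, y) pairs, refits a Nadaraya-Watson estimator on each resample, and takes pointwise percentiles. This is an assumption-laden sketch: the helper names `nw` and `bootstrap_band` are hypothetical, the bandwidth is fixed rather than selected, and the resampling scheme used by bootstrapCB may differ.

```julia
using Random, Statistics

stdnormpdf(u) = exp(-0.5 * u^2) / sqrt(2π)

# Nadaraya-Watson fit at x0 with a Gaussian kernel and bandwidth h
nw(x, y, x0, h) = (w = stdnormpdf.((x .- x0) ./ h); sum(w .* y) / sum(w))

# Pointwise percentile band from B pairs-bootstrap refits
function bootstrap_band(x, y, xeval, h; B = 200, level = 0.95)
    n = length(x)
    fits = zeros(B, length(xeval))
    for b in 1:B
        idx = rand(1:n, n)              # resample (x, y) pairs with replacement
        xb, yb = x[idx], y[idx]
        fits[b, :] = [nw(xb, yb, x0, h) for x0 in xeval]
    end
    alpha = 1 - level
    lower = [quantile(fits[:, k], alpha / 2) for k in 1:length(xeval)]
    upper = [quantile(fits[:, k], 1 - alpha / 2) for k in 1:length(xeval)]
    return lower, upper
end

Random.seed!(2)
x = 10 .* rand(300)
y = sin.(x) .+ 0.3 .* randn(300)
xeval = collect(range(1, stop=9, length=9))
lower, upper = bootstrap_band(x, y, xeval, 0.5)
```

A larger B gives smoother percentile estimates at a proportionally higher cost, since every resample requires a full refit at each evaluation point.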

The following functions are also provided:

and bandwidth selection functions:

The meaning of arguments:

Demos

Reference