Issue status: Open. orlandothoeny opened this issue 2 years ago.
Hi @orlandothoeny,
Thanks for your interest in MathPHP.
Thanks for suggesting a new kernel function as a feature improvement. We'll look into it and see whether it is something we can add.
In the meantime, you can add your own custom kernel function by supplying a PHP callable to the setKernelFunction method of a KernelDensityEstimation object.
Mark
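As a rough illustration of the custom-kernel route Mark describes, here is a biweight (quartic) kernel written as a PHP closure. The function kdeEvaluate() is hypothetical and not part of MathPHP. It only shows the standard estimator f(x) = 1/(n*h) * sum K((x - x_i)/h) that such a callable plugs into. We assume, without having verified it against the MathPHP source, that setKernelFunction() calls the callable with the scaled distance (x - x_i)/h.

```php
<?php

// Biweight (quartic) kernel: K(u) = 15/16 * (1 - u^2)^2 for |u| <= 1, otherwise 0.
$biweight = function (float $u): float {
    return abs($u) <= 1.0 ? (15 / 16) * (1 - $u ** 2) ** 2 : 0.0;
};

/**
 * Hypothetical standalone evaluator (not MathPHP code):
 * f(x) = 1 / (n * h) * sum_i K((x - x_i) / h)
 *
 * @param float[] $data
 */
function kdeEvaluate(array $data, float $h, callable $kernel, float $x): float
{
    $sum = 0.0;
    foreach ($data as $xi) {
        // Apply the kernel to the distance from x to each sample, scaled by the bandwidth.
        $sum += $kernel(($x - $xi) / $h);
    }
    return $sum / (count($data) * $h);
}

echo kdeEvaluate([1.0, 2.0, 3.0], 1.0, $biweight, 2.0), PHP_EOL; // prints 0.3125

// Assumed MathPHP usage, per the reply above: $kde->setKernelFunction($biweight);
```

Only the center sample falls strictly inside the kernel's support here, which gives 15/16 divided by 3.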
@markrogoyski I believe this request is referring to the bandwidth, not the kernel. Currently the object accepts a float or null for the bandwidth. If it is null, a default bandwidth is calculated and used.
@orlandothoeny if you are able to implement the calculation, we could easily add it as a static method, so that a user could call something like:
$bandwidth = KernelDensityEstimation::ISJBandwidth($data);
$kde->setBandwidth($bandwidth);
This would be the most backward-compatible strategy.
@Beakerboy Yes, that's correct. This would be an additional method for calculating the bandwidth.
That would be one option regarding backward compatibility; another would be to allow callables as an additional type for the $bandwith parameter. But the option you described is probably simpler.
I'd have to brush up on my math a bit to implement it myself; a few years have passed since I last used that stuff :) I'm not sure I have the time to do that, though.
I understand that it's an open-source project, so no pressure on you guys. It's your free time. But if someone wants to implement it, I'd be grateful.
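A minimal sketch of the callable-bandwidth alternative discussed above. It assumes a hypothetical helper resolveBandwidth() (not part of MathPHP) that the constructor or setBandwidth() could delegate to. A number is used as-is, null falls back to a default rule, and a callable computes the bandwidth from the data. In that design, a future ISJ estimator could be passed in directly.

```php
<?php

/**
 * Hypothetical helper (not MathPHP API): resolve a bandwidth argument that may be
 * a fixed number, a callable computing it from the data, or null for the default rule.
 *
 * @param float[]             $data
 * @param float|callable|null $bandwidth
 * @param callable            $default   function (array $data): float
 */
function resolveBandwidth(array $data, $bandwidth, callable $default): float
{
    if ($bandwidth === null) {
        $h = (float) $default($data);          // existing behaviour: default estimator
    } elseif (is_int($bandwidth) || is_float($bandwidth)) {
        $h = (float) $bandwidth;               // existing behaviour: fixed bandwidth
    } elseif (is_callable($bandwidth)) {
        $h = (float) $bandwidth($data);        // new: bandwidth function, e.g. an ISJ estimator
    } else {
        throw new InvalidArgumentException('Bandwidth must be a number, a callable, or null.');
    }

    if ($h <= 0.0) {
        throw new DomainException('Bandwidth must be positive.');
    }
    return $h;
}
```

Because floats and null keep their current meaning, existing callers are unaffected; only the callable branch is new.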
@orlandothoeny,
What could help speed up an implementation is providing test data to write unit tests against.
For example:
Having data to write unit tests against allows us to be confident we are building the right calculation.
Another option is to research and provide instructions on how to produce test data using a trustworthy tool such as R or NumPy.
The KernelDensityEstimation class currently uses the normal distribution approximation bandwidth estimator (see KernelDensityEstimation::getDefaultBandwith()) when no bandwidth is passed to the constructor. It would be useful to be able to choose the Improved Sheather-Jones algorithm as the bandwidth function, especially when working with datasets that are not normally distributed.
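For reference, the usual normal distribution approximation ("rule of thumb") is h = (4 * sigma^5 / (3 * n))^(1/5), or roughly 1.06 * sigma * n^(-1/5). The sketch below uses a hypothetical function name and assumes sigma is the sample standard deviation. MathPHP's getDefaultBandwith() may differ in such details, so treat this as an illustration of the formula rather than the library's code.

```php
<?php

/**
 * Normal-reference bandwidth: h = (4 * sigma^5 / (3 * n))^(1/5).
 * Assumption: sigma is the sample standard deviation (n - 1 denominator).
 *
 * @param float[] $data
 */
function normalReferenceBandwidth(array $data): float
{
    $n = count($data);
    if ($n < 2) {
        throw new InvalidArgumentException('Need at least two data points.');
    }

    // Sample standard deviation.
    $mean = array_sum($data) / $n;
    $squares = 0.0;
    foreach ($data as $x) {
        $squares += ($x - $mean) ** 2;
    }
    $sigma = sqrt($squares / ($n - 1));

    return ((4 * $sigma ** 5) / (3 * $n)) ** (1 / 5);
}

echo normalReferenceBandwidth([1.0, 2.0, 3.0, 4.0, 5.0]), PHP_EOL; // approximately 1.2139
```

This rule is optimal only when the underlying density is normal. For multimodal or heavily skewed data it tends to oversmooth, which is the gap an ISJ option would address.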
Some resources about Sheather-Jones: