haifengl / smile

Statistical Machine Intelligence & Learning Engine
https://haifengl.github.io

No warning when inferred bandwidth is zero #661

Closed sjenkins20 closed 3 years ago

sjenkins20 commented 3 years ago

Describe the bug: We passed our data to the KernelDensity constructor without specifying a bandwidth. We knew the data would produce a very narrow distribution, close to a Dirac delta function. The resulting density evaluated to zero at the location of the supplied samples. On investigation, we found that the inferred bandwidth was zero, which caused the problem.

Expected behavior: An error message telling the user either that the data points are too similar (this library can't represent Dirac delta functions) or that the calculated bandwidth is zero. As an extra check, the KDE could verify that it is a valid pdf, with an area under the curve of 1, or at the very least a non-zero area under the curve.
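The area check suggested above could look roughly like the sketch below. It is not the library's code: `PdfAreaCheck`, `density`, `area`, and `looksLikePdf` are hypothetical names, the kernel is assumed to be Gaussian, and the integration range, step count, and tolerance are arbitrary choices made for illustration.

```java
// Hypothetical, self-contained sketch; it does not use or mirror the library's classes.
class PdfAreaCheck {
    /** Gaussian kernel density estimate at point t with bandwidth h. */
    public static double density(double[] data, double h, double t) {
        double sum = 0.0;
        for (double xi : data) {
            double z = (t - xi) / h;
            sum += Math.exp(-0.5 * z * z);
        }
        return sum / (data.length * h * Math.sqrt(2.0 * Math.PI));
    }

    /** Trapezoidal approximation of the area under the estimate on [lo, hi]. */
    public static double area(double[] data, double h, double lo, double hi, int steps) {
        double dx = (hi - lo) / steps;
        double total = 0.5 * (density(data, h, lo) + density(data, h, hi));
        for (int i = 1; i < steps; i++) {
            total += density(data, h, lo + i * dx);
        }
        return total * dx;
    }

    /** True only if the area is within 1e-3 of 1; a NaN area fails the comparison. */
    public static boolean looksLikePdf(double[] data, double h, double lo, double hi) {
        double a = area(data, h, lo, hi, 20_000);
        return Math.abs(a - 1.0) < 1e-3;
    }

    public static void main(String[] args) {
        // Small illustrative sample, not the reported input.
        double[] data = {962.0, 967.0, 967.0, 968.0, 968.0, 968.0};
        System.out.println(looksLikePdf(data, 0.5, 950.0, 980.0)); // true: positive bandwidth
        System.out.println(looksLikePdf(data, 0.0, 950.0, 980.0)); // false: zero bandwidth
    }
}
```

In this sketch a zero bandwidth makes every density value NaN (a division of zero by zero) rather than the zero reported here, but the check fails in both cases, which is the point of having it.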

Actual behavior: The inferred pdf had mean 967.8 (1dp), sd 0.8 (1dp) and a non-zero variance, but evaluating the pdf at 968 produced zero.

Code snippet:
final KernelDensity kd = new KernelDensity(rawData.stream().mapToDouble(d -> d).toArray());

Input data final List<Double> rawData contained [968.0, 968.0, 968.0, 968.0, 968.0, 968.0, 968.0, 968.0, 968.0, 968.0, 968.0, 968.0, 968.0, 968.0, 968.0, 968.0, 968.0, 968.0, 968.0, 968.0, 968.0, 968.0, 968.0, 967.0, 968.0, 968.0, 962.0, 968.0, 968.0, 968.0, 968.0, 968.0, 968.0, 968.0, 968.0, 968.0, 968.0, 968.0, 968.0, 968.0, 968.0, 968.0, 968.0, 968.0, 967.0, 968.0, 968.0, 968.0, 968.0, 968.0]
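One plausible way a sample like this yields a zero bandwidth is sketched below. The thread does not say which rule the library uses to infer the bandwidth. The sketch assumes Silverman's rule of thumb, h = 1.06 * min(sd, IQR / 1.34) * n^(-1/5), with crude index-based quartiles. `BandwidthCheck` and its methods are hypothetical names, and the sample is only shaped like the reported one (mostly 968.0, with two 967.0 values and one 962.0; the count of 50 is an assumption).

```java
import java.util.Arrays;

// Hypothetical sketch of a rule-of-thumb bandwidth with a zero guard.
// It assumes Silverman's rule; the library's actual rule is not stated in the thread.
class BandwidthCheck {
    /** Sample standard deviation (n - 1 denominator); needs at least two points. */
    public static double sampleSd(double[] x) {
        double mean = Arrays.stream(x).average().orElse(Double.NaN);
        double ss = 0.0;
        for (double v : x) {
            ss += (v - mean) * (v - mean);
        }
        return Math.sqrt(ss / (x.length - 1));
    }

    /** Rule of thumb: 1.06 * min(sd, IQR / 1.34) * n^(-1/5). */
    public static double ruleOfThumbBandwidth(double[] x) {
        double[] sorted = x.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        // Crude quartiles by index; the IQR is 0 when the middle half is constant.
        double iqr = sorted[n * 3 / 4] - sorted[n / 4];
        return 1.06 * Math.min(sampleSd(sorted), iqr / 1.34) / Math.pow(n, 0.2);
    }

    /** Same rule, but rejects a bandwidth that is zero, negative, or NaN. */
    public static double checkedBandwidth(double[] x) {
        double h = ruleOfThumbBandwidth(x);
        if (!(h > 0.0)) {
            throw new IllegalArgumentException(
                "Inferred bandwidth is " + h + "; data points may be too similar");
        }
        return h;
    }

    public static void main(String[] args) {
        double[] data = new double[50];          // assumed sample size
        Arrays.fill(data, 968.0);
        data[0] = 962.0;
        data[1] = 967.0;
        data[2] = 967.0;

        System.out.println(ruleOfThumbBandwidth(data)); // 0.0, because the IQR is 0
        try {
            checkedBandwidth(data);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this rule the standard deviation is non-zero, but more than three quarters of the values are identical, so the interquartile range is zero and the minimum pulls the bandwidth to zero. That matches the combination reported here: a non-zero sd alongside a zero bandwidth.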


haifengl commented 3 years ago

Fixed. Please try the master branch. Thanks.

sjenkins20 commented 3 years ago

Will you be publishing an updated Maven release with this change? We pull in the library through Maven.

haifengl commented 3 years ago

We have our own release schedule. Generally, we don't publish a new version for a small fix.