jasondavies / science.js

Scientific and statistical computing in JavaScript.
http://www.jasondavies.com/

.stats.kde() doesn't like multiple repeated values #29

Open jamtholee opened 5 years ago

jamtholee commented 5 years ago

I'm not sure whether this is a limitation of JS, or whether it's a bug in science.js, but when I give kde a dataset with multiple repeating values, such as:

[394, 0, 393, 1271, 0, 640, 20, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1159, 969, 2891, 9, 1425, 0, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1071, 0, 592, 998, 1384, 0, 21, 1711, 341, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 141, 692, 0, 0, 0, 0, 0, 0, 0, 0, 651, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 901, 0, 0, 0, 0, 7, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 584, 0, 818, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 41, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

or

[68.93749237060547, 84.54015350341797, 127.1878890991211, 68.9375, 174.15017700195312, 68.9375, 68.9375, 68.9375, 68.9375, 68.9375, 68.9375, 68.9375, 68.9375, 68.9375, 68.9375, 68.9375, 68.9375, 68.9375, 147.9225311279297, 52.56502151489258, 203.442626953125, 83.44722747802734, 139.95753479003906, 68.9375, 68.9375, 68.9375, 68.9375, 68.9375, 68.9375, 68.9375, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.9375, 68.9375, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.9375, 68.9375, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.9375, 68.93749237060547, 68.93749237060547, 68.9375, 68.9375, 68.9375, 68.9375, 68.9375, 68.93749237060547, 68.9375, 68.9375, 68.9375, 68.93749237060547, 68.9375, 68.9375, 68.9375, 68.9375, 68.9375, 68.9375, 68.9375, 160.83853149414062, 68.9375, 101.62258911132812, 112.3798828125, 124.74468231201172, 68.9375, 68.9375, 186.23104858398438, 129.73406982421875, 68.9375, 68.93749237060547, 68.9375, 68.9375, 68.9375, 68.9375, 68.9375, 68.9375, 68.9375, 68.9375, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 74.37715911865234, 109.83863830566406, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.9375, 68.9375, 68.9375, 91.8897476196289, 68.9375, 68.9375, 68.9375, 68.9375, 68.9375, 68.9375, 68.9375, 68.9375, 68.9375, 68.9375, 68.9375, 68.9375, 68.9375, 119.34972381591797, 68.9375, 68.9375, 68.9375, 68.9375, 68.9375, 68.9375, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 
68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 68.93749237060547, 57.34675216674805, 68.9375]

or

[1, 2, 3, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5]

When I evaluate the probability at a point, I get either NaNs, 0s, or a probability greater than 1.
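One possible explanation, not confirmed anywhere in this thread, is that a rule-of-thumb bandwidth built on the interquartile range collapses to zero when most values are identical, and the later division by that bandwidth produces NaN. The sketch below is self-contained and does not call science.js; every function name in it is hypothetical, and whether science.js computes its default bandwidth this way is an assumption. Separately, a kernel density estimate returns a density rather than a probability, so values above 1 are not necessarily an error.

```javascript
// Hypothetical helpers for illustration; none of these names come from science.js.

// Quantile by linear interpolation between closest ranks.
function quantile(sorted, p) {
  const pos = (sorted.length - 1) * p;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Sample standard deviation (divides by n - 1).
function sampleStdDev(xs) {
  const mean = xs.reduce((a, b) => a + b, 0) / xs.length;
  const ss = xs.reduce((a, b) => a + (b - mean) * (b - mean), 0);
  return Math.sqrt(ss / (xs.length - 1));
}

// Rule-of-thumb bandwidth: 1.06 * min(sd, IQR / 1.34) * n^(-1/5).
// Assumption: a selector of this shape is in play; it has no zero-IQR fallback.
function ruleOfThumbBandwidth(xs) {
  const sorted = xs.slice().sort((a, b) => a - b);
  const iqr = quantile(sorted, 0.75) - quantile(sorted, 0.25);
  return 1.06 * Math.min(sampleStdDev(xs), iqr / 1.34) * Math.pow(xs.length, -0.2);
}

// Gaussian kernel density estimate at a single point x with bandwidth h.
function gaussianKde(xs, h, x) {
  let sum = 0;
  for (const xi of xs) {
    const u = (x - xi) / h;
    sum += Math.exp(-0.5 * u * u) / Math.sqrt(2 * Math.PI);
  }
  return sum / (xs.length * h);
}

const data = [1, 2, 3, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5];
const h = ruleOfThumbBandwidth(data);      // both quartiles are 5, so IQR is 0
const atMode = gaussianKde(data, h, 5);    // (5 - 5) / 0 is NaN
const offMode = gaussianKde(data, h, 4.5); // kernel sum is 0, then 0 / 0 is NaN

console.log(h, atMode, offMode);
```

If this is the cause, a bandwidth selector that falls back to the standard deviation when the IQR is zero, or an explicitly supplied positive bandwidth, would avoid the division by zero.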

sseira commented 4 years ago

I'm experiencing the same issue.

I wonder whether fixing this bug could also lead to extending the implementation to accept non-numeric data values such as ["a", "a", "b", "b", "c"].

sseira commented 4 years ago

@jamtholee have you found a workaround?

jamtholee commented 4 years ago

Unfortunately not. In my use case I pre-screened for datasets with large numbers of repeated values and simply skipped KDE for those.
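A pre-screen of this kind could look roughly like the sketch below. It is not the code used in that project: the function names are hypothetical and the 0.5 threshold is an arbitrary value chosen for illustration. It measures what share of the dataset is taken up by the single most common value and skips KDE when that share is too high.

```javascript
// Hypothetical pre-screen; names and the default threshold are illustrative only.

// Fraction of the dataset occupied by its most frequent value.
function shareOfMostCommonValue(xs) {
  const counts = new Map();
  let best = 0;
  for (const x of xs) {
    const c = (counts.get(x) || 0) + 1;
    counts.set(x, c);
    if (c > best) best = c;
  }
  return xs.length ? best / xs.length : 0;
}

// True when one value dominates the data enough that KDE should be skipped.
function shouldSkipKde(xs, maxShare = 0.5) {
  return shareOfMostCommonValue(xs) > maxShare;
}

const repeated = [1, 2, 3, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5];
const spread = [1, 2, 3, 4, 5, 6, 7, 8];

console.log(shouldSkipKde(repeated)); // true: 13 of 17 values are 5
console.log(shouldSkipKde(spread));   // false: every value appears once
```

Exact equality is used when counting, so nearly identical floats such as 68.9375 and 68.93749237060547 count as different values; rounding before counting would be one way to group them.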