vega / datalib

JavaScript data utility library.
http://vega.github.io/datalib/
BSD 3-Clause "New" or "Revised" License
731 stars 133 forks source link

dl.histogram returning repeated bins #107

Open mathisonian opened 4 years ago

mathisonian commented 4 years ago

Hey all,

This library has been extremely useful, but I'm hitting an issue with the histogram method. A simple reproduction is available here: https://observablehq.com/d/cfefd9c0c388f478

The issue is that the values that are returned seem to repeat when a high number of bins is used. For example, I'd expect to get bins with values of ..., .27, .28, .29, ... but instead get two bins at .28 and no bin at .29.

Screen Shot 2020-06-25 at 4 15 54 PM

Its possible that I'm doing something wrong with the options (or just generally unreasonable) in which case please let me know.

jheer commented 4 years ago

Hmm, that looks like a rounding bug introduced by the value call on this line: https://github.com/vega/datalib/blob/master/src/bins/histogram.js#L60

The corresponding function (bins.js#L72) is:

function value(v) {
  return this.step * Math.floor(v / this.step + EPSILON);
}

Meanwhile, the equivalent call in Vega's Bin transform is a bit different. IIRC, I rewrote it to handle these floating point issues:

this.start + this.step * Math.floor(EPSILON + (v - this.start) / this.step)

I'm not sure when I will have the time to patch and test, but in the meantime would welcome a PR if you try the change yourself.

mathisonian commented 4 years ago

Thanks @jheer! I figured it was some rounding issue. These pointers are helpful, I can take a stab at a patch & test in the next couple of days.