observablehq / plot

A concise API for exploratory data visualization implementing a layered grammar of graphics
https://observablehq.com/plot/
ISC License
4.36k stars, 176 forks

Automatic binning when a facet dimension is quantitative? #14

Status: open. mbostock opened this issue 3 years ago.

mbostock commented 3 years ago

It’d be neat if you could use a quantitative dimension for faceting, and we automatically binned it (say using d3.bin) into a reasonable number of facets.
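One reading of "a reasonable number of facets", sketched in plain JavaScript without d3: choose the bin count with Sturges' rule, ceil(log2 n) + 1, which is the count d3.thresholdSturges (d3.bin's default threshold generator) also computes, then cut the range into equal-width bins. Unlike d3.bin, this sketch does not round the thresholds to nice tick values. `sturgesCount`, `equalWidthThresholds` and `autoFacet` are hypothetical helpers, not Plot or d3 APIs.

```javascript
// Sturges' rule: the bin count d3.thresholdSturges also computes
// (d3.bin then rounds the thresholds to nice ticks; this sketch does not).
function sturgesCount(n) {
  return Math.max(1, Math.ceil(Math.log2(n)) + 1);
}

// Cut [min, max] into `count` equal-width bins; returns the count - 1 interior thresholds.
function equalWidthThresholds(values, count) {
  let min = Infinity;
  let max = -Infinity;
  for (const v of values) {
    if (v < min) min = v;
    if (v > max) max = v;
  }
  const width = (max - min) / count;
  return Array.from({length: count - 1}, (_, i) => min + width * (i + 1));
}

// Hypothetical auto-faceting of non-empty `data`: group each datum by the
// equal-width bin its value falls in. Each bin includes its lower threshold, as in d3.bin.
function autoFacet(data, value) {
  const values = data.map(value);
  const thresholds = equalWidthThresholds(values, sturgesCount(values.length));
  const facets = new Map(); // bin index -> data in that bin (empty bins are absent)
  for (const d of data) {
    const v = value(d);
    let i = 0;
    while (i < thresholds.length && v >= thresholds[i]) i++;
    if (!facets.has(i)) facets.set(i, []);
    facets.get(i).push(d);
  }
  return {thresholds, facets};
}
```

For example, ten values from 0 to 9 give five facets of two values each.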

Fil commented 3 years ago

Borrowing from cartography, we would want to use Jenks natural breaks or k-means, not only quantize (equal-width bins). This seems particularly relevant for faceting, to avoid creating spurious, almost empty facets. For example, if the dimension has three modes, we want those modes as the facets.

I guess this would be done by passing the thresholds (or a threshold generator) to d3.bin.
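A sketch of such a threshold generator, assuming the goal is optimal one-dimensional k-means breaks (the within-cluster sum-of-squares objective that simple-statistics' ckmeans optimizes). This is the straightforward O(k·n²) dynamic program, not ckmeans' faster algorithm, and `optimalBreaks` is a hypothetical name. It returns the minimum of every cluster after the first, the form d3.bin expects, since each bin includes its lower threshold.

```javascript
// Hypothetical threshold generator: optimal 1-D k-means breaks (minimum total
// within-cluster sum of squares) via a simple O(k * n^2) dynamic program.
function optimalBreaks(values, k) {
  const x = Float64Array.from(values).sort(); // typed-array sort is numeric
  const n = x.length;
  if (!Number.isInteger(k) || k < 1 || k > n) throw new RangeError("need 1 <= k <= values.length");

  // Prefix sums give each candidate cluster's sum of squared deviations in O(1).
  const s1 = new Float64Array(n + 1);
  const s2 = new Float64Array(n + 1);
  for (let i = 0; i < n; i++) {
    s1[i + 1] = s1[i] + x[i];
    s2[i + 1] = s2[i] + x[i] * x[i];
  }
  const sse = (i, j) => { // cluster x[i..j], inclusive
    const sum = s1[j + 1] - s1[i];
    return s2[j + 1] - s2[i] - (sum * sum) / (j - i + 1);
  };

  // cost[c][j]: best cost of splitting x[0..j] into c + 1 clusters;
  // start[c][j]: index where the last of those clusters begins.
  const cost = [Array.from({length: n}, (_, j) => sse(0, j))];
  const start = [new Array(n).fill(0)];
  for (let c = 1; c < k; c++) {
    cost.push(new Array(n).fill(Infinity));
    start.push(new Array(n).fill(0));
    for (let j = c; j < n; j++) {
      for (let i = c; i <= j; i++) {
        const v = cost[c - 1][i - 1] + sse(i, j);
        if (v < cost[c][j]) {
          cost[c][j] = v;
          start[c][j] = i;
        }
      }
    }
  }

  // Walk back through the table; each later cluster's minimum becomes a threshold.
  const thresholds = [];
  for (let c = k - 1, j = n - 1; c > 0; c--) {
    const i = start[c][j];
    thresholds.unshift(x[i]);
    j = i - 1;
  }
  return thresholds;
}
```

With d3 loaded, something like `d3.bin().thresholds((values) => optimalBreaks(values, 4))` should then produce one bin per cluster.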

For a relevant example, I combined https://github.com/observablehq/plot/commit/ac93f582aaba5e577f121a955e0fc6e1d2012ca6 with simple-statistics' ckmeans method to cluster countries by GDP per capita:

[Screenshot, 2020-11-24 at 10:01:51]

These 4 clusters would be my facets.

mbostock commented 3 years ago

The default thresholds, computed with d3.ticks, have the nice property that the axis documents the threshold values. If you specify alternative thresholds, I wonder whether there's a convenient way to use those threshold values as ticks too; it's hard to tell from the screenshot above exactly where the thresholds are. Though I suppose exactness isn't essential, and they're probably not nice round values anyway.
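Given explicit thresholds, assigning a value to a facet is a binary search, and one way to keep non-round break values visible is to spell them out in the facet labels. A sketch with hypothetical `facetIndex` and `facetLabel` helpers; the half-open [a, b) intervals follow d3.bin's convention that a value equal to a threshold goes into the upper bin.

```javascript
// Facet index for `value`: the number of thresholds <= value, via binary search.
// A value equal to a threshold lands in the upper facet, as with d3.bin.
function facetIndex(value, thresholds) {
  let lo = 0;
  let hi = thresholds.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (thresholds[mid] <= value) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Label for facet i that documents its threshold values as a half-open interval.
function facetLabel(i, thresholds) {
  if (thresholds.length === 0) return "all";
  if (i === 0) return `< ${thresholds[0]}`;
  if (i === thresholds.length) return `≥ ${thresholds[i - 1]}`;
  return `[${thresholds[i - 1]}, ${thresholds[i]})`;
}
```

With thresholds [14500, 38000, 80000], for instance, `facetIndex(50000, thresholds)` is 2 and that facet's label is "[38000, 80000)".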

Fil commented 3 years ago

The ckMeansNiceThresholds function in https://observablehq.com/d/e87ba37a7b86bb94#ckMeansNiceThresholds returns "not so ugly" thresholds, so I suppose we could use them as ticks, for example [14500, 38000, 80000].
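The notebook's ckMeansNiceThresholds is not reproduced here. As a hypothetical illustration of the idea, each raw break (the minimum of a cluster) can be replaced by the roundest number that still separates that cluster from the one below it, trying steps of 5, 2 and 1 times a power of ten from coarse to fine. `niceThreshold` and `niceThresholds` are illustrative names, and this rounding rule is an assumption, not necessarily what the notebook does.

```javascript
// Hypothetical "roundest" threshold t with lo < t <= hi: try steps of 5, 2 and 1
// times a power of ten, coarse to fine, rounding hi down to a multiple of the step,
// and keep the first result that still lies above lo.
function niceThreshold(lo, hi) {
  if (!Number.isFinite(lo) || !(lo < hi)) return hi;
  const top = Math.ceil(Math.log10(Math.max(Math.abs(lo), Math.abs(hi))));
  for (let e = top; e >= top - 15; e--) {
    for (const f of [5, 2, 1]) {
      const step = f * 10 ** e;
      const t = Math.floor(hi / step) * step;
      if (t > lo) return t;
    }
  }
  return hi; // nothing rounder found: keep the raw break
}

// Replace each raw break with a round number that still separates it from the
// largest value below it (the previous cluster's maximum).
function niceThresholds(values, breaks) {
  const sorted = Float64Array.from(values).sort(); // numeric sort
  return breaks.map((b) => {
    let lo = -Infinity;
    for (const v of sorted) {
      if (v >= b) break;
      lo = v;
    }
    return niceThreshold(lo, b);
  });
}
```

For instance, a gap between a cluster ending at 14230 and one starting at 14890 would get the threshold 14500.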

mbostock commented 3 years ago

Adding `ticks: breaks` to the x-axis definition works well if you're passing in explicit thresholds.

mbostock commented 3 years ago

https://observablehq.com/@observablehq/quanti-facet

mbostock commented 1 year ago

The scale interval option is a great workaround for this issue. It isn't automatic, since you still have to choose the interval yourself, but it makes it very easy to bin while faceting. For example:

[Screenshot, 2023-04-23 at 1:50:27 PM]
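```js
import * as Plot from "@observablehq/plot";

// Assumes `olympians` is an array of athlete records with numeric `height` (meters)
// and `weight` (kg) fields, such as the olympians sample dataset.
Plot.plot({
  fy: {
    grid: true,
    tickFormat: ".1f",
    interval: 0.1, // bin heights into 0.1 m bands, one facet per band
    reverse: true  // tallest band at the top
  },
  marks: [
    // one horizontal box plot of weight per height band; skip athletes without a height
    Plot.boxX(olympians.filter((d) => d.height), {x: "weight", fy: "height"})
  ]
})
```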
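In that example, `interval: 0.1` is what does the binning. Assuming the scale floors each height to a multiple of 0.1, every distinct floored value becomes one fy facet. The plain-JavaScript sketch that follows illustrates that assumption; `snap` and `facetByInterval` are hypothetical helpers, not Plot's implementation. When 1/step is an integer it multiplies by 1/step rather than dividing by step, because decimal steps such as 0.1 are not exact in binary floating point (in JavaScript, 3 * 0.1 is 0.30000000000000004).

```javascript
// Floor x to a multiple of `step` (assumed flooring semantics for a numeric interval).
function snap(x, step) {
  const k = 1 / step;
  // For steps like 0.1, scale by the integer 1/step to limit floating-point error.
  return Number.isInteger(k) ? Math.floor(x * k) / k : Math.floor(x / step) * step;
}

// Group data into facets keyed by the snapped value, one facet per interval.
function facetByInterval(data, value, step) {
  const facets = new Map(); // snapped value -> data in that interval
  for (const d of data) {
    const key = snap(value(d), step);
    if (!facets.has(key)) facets.set(key, []);
    facets.get(key).push(d);
  }
  return facets;
}
```

With a step of 0.1, heights 1.87 and 1.81 share the 1.8 facet while 1.75 falls in the 1.7 facet.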