observablehq / plot

A concise API for exploratory data visualization implementing a layered grammar of graphics
https://observablehq.com/plot/
ISC License

Added density reducer to bin/group/hexbin #2047

Open wirhabenzeit opened 3 months ago

wirhabenzeit commented 3 months ago

This pull request addresses https://github.com/observablehq/plot/issues/1940; see also the discussion at https://talk.observablehq.com/t/normalised-histogram-in-observable-plot/8576

Basically, I added a new reduceDensity reducer, which computes the density per series/facet. This is slightly different from the reduceProportion reducer (with or without the facet scope) when it is not supplied with a value.
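To make the distinction concrete, here is a minimal sketch in plain JavaScript. It assumes that "density" means the proportion within one series divided by the bin width, so that the bars of each series integrate to 1. The helper `binSeries` and the sample values are hypothetical and are not taken from the pull request or from Plot.

```javascript
// Hypothetical helper, not part of Plot: bins a single series and returns,
// for each bin, the count, the proportion and the (assumed) density.
function binSeries(values, thresholds) {
  const n = values.length;
  const bins = [];
  for (let k = 0; k < thresholds.length - 1; ++k) {
    const x1 = thresholds[k];
    const x2 = thresholds[k + 1];
    const last = k === thresholds.length - 2;
    // Bins are [x1, x2), except the last one, which also includes x2.
    const count = values.filter(
      (v) => v >= x1 && (v < x2 || (last && v === x2))
    ).length;
    const proportion = count / n; // share of this series falling in the bin
    // Assumption: density = proportion / bin width.
    bins.push({x1, x2, count, proportion, density: proportion / (x2 - x1)});
  }
  return bins;
}

// Made-up series with unequal bin widths.
const seriesA = binSeries([0.1, 0.2, 0.3, 1.5], [0, 0.5, 1, 2]);

// Proportions sum to 1; densities sum to 1 only after multiplying by bin width.
const areaA = seriesA.reduce((s, b) => s + b.density * (b.x2 - b.x1), 0);
```

With unequal bin widths the two measures differ: the proportion only says how much of the series falls in a bin, while the density also accounts for how wide that bin is.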

I forked the notebook https://observablehq.com/@fil/plot-normalized-histograms at https://observablehq.com/d/fb0d876105777d59 to illustrate the functionality. I also added a test plot in test/plots/density-reducer.ts.

The main change to existing code is that src/transforms/group.js, src/transforms/bin.js and src/transforms/hexbin.js now need to call the reducer scope at the per-group level, as in

for (const o of outputs) o.scope("group", I);
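As a rough illustration of why a per-group scope call matters, the sketch below uses a mock reducer that remembers the size of the current group and divides each bin count by it. Everything here (`makeShareReducer`, the `reduce` method, the sample index arrays) is a hypothetical stand-in and not Plot's internal reducer interface; only the shape of the `o.scope("group", I)` call is taken from the line above. Bin width is left out to keep the focus on the scope.

```javascript
// Hypothetical mock reducer: normalizes each bin by the size of the group
// (series) that was most recently declared through scope("group", I).
function makeShareReducer() {
  let basis = 0; // number of points in the current group
  return {
    scope(name, I) {
      if (name === "group") basis = I.length;
    },
    reduce(I) {
      return basis === 0 ? 0 : I.length / basis;
    }
  };
}

const outputs = [makeShareReducer()];

// Made-up index arrays: two series, each already split into bins.
const groups = [
  {I: [0, 1, 2, 3], bins: [[0, 1, 2], [3]]}, // series A: 4 points
  {I: [4, 5], bins: [[4], [5]]} // series B: 2 points
];

const shares = [];
for (const {I, bins} of groups) {
  // Same shape as the call quoted above: declare the group before reducing.
  for (const o of outputs) o.scope("group", I);
  for (const bin of bins) shares.push(outputs[0].reduce(bin));
}
// shares is [0.75, 0.25, 0.5, 0.5]: each series is normalized on its own.
```

Without the per-group call, the mock reducer would have no way to know which series total to divide by.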

I noticed that this new density reducer can also be used as a replacement for proportion-facet in

where (at least to me) it also makes sense semantically.

Among the tests this would leave

as the only place where proportion-facet is used. Here density does not make sense at the group level, because of the grouping by fill.

One could also consider a more customisable reducer allowing to specify the normalisation scope, but I could not come up with a satisfying syntax. Something like

..., y: {reducer: "density", scope: "group"}, ...

feels a bit too verbose given that most reducers do not have/need any scope.

wirhabenzeit commented 3 months ago

@Fil thanks for the review, will look into it! One question regarding

I think we can extend the concept to weighted density when the original channel is a value?

What do you mean by this? Basically like proportion-facet, but per series? What is the expected output of something like

Plot.binX({y: "density"}, {x: "xVar", y: "yVar", "stroke": "strokeVar"})

in this case? Is it the sum of the values in the bin for a given series, divided by the sum over the whole series, i.e.

sum( d.y | d.x in bin, d.z = z ) / sum( d.y | d.z = z )

or something else? I am not sure that normalising by bin area makes sense in this case.

Fil commented 3 months ago

Normalizing with weights applies, for example, when a data point is a city and represents anywhere from 1,000 to 1 million inhabitants. You want the same chart as you would have if you had one point per inhabitant.
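The equivalence described here can be checked with a small sketch. It assumes a weighted density defined as the sum of weights in a bin, divided by the total weight of the series and by the bin width; that definition, the helper `weightedDensity`, and the tiny made-up "cities" are all hypothetical and not taken from Plot or from the pull request.

```javascript
// Hypothetical helper: weighted density of one series over fixed thresholds.
// Assumption: density = (sum of weights in bin) / (total weight) / (bin width).
function weightedDensity(points, thresholds) {
  const total = points.reduce((s, p) => s + p.weight, 0);
  const out = [];
  for (let k = 0; k < thresholds.length - 1; ++k) {
    const x1 = thresholds[k];
    const x2 = thresholds[k + 1];
    const sum = points
      .filter((p) => p.x >= x1 && p.x < x2) // bins are [x1, x2)
      .reduce((s, p) => s + p.weight, 0);
    out.push(sum / total / (x2 - x1));
  }
  return out;
}

// Made-up data: each "city" sits at position x and stands for `weight` people.
const cities = [
  {x: 1, weight: 3},
  {x: 2, weight: 1},
  {x: 6, weight: 4}
];

// The same data expanded to one unit-weight point per inhabitant.
const inhabitants = cities.flatMap((c) =>
  Array.from({length: c.weight}, () => ({x: c.x, weight: 1}))
);

const byCity = weightedDensity(cities, [0, 4, 8]);
const byInhabitant = weightedDensity(inhabitants, [0, 4, 8]);
// Both are [0.125, 0.125]: weighting a city matches repeating its point.
```

Under this assumed definition, the weighted chart over cities and the unweighted chart over inhabitants coincide, which is the property asked for above.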