Closed mkfreeman closed 3 years ago
Good find. I think we should consider not applying an inset ({inset: 0}) if there are a lot of bins:
We could also cap the number of bins returned by the default d3.thresholdScott based on the width of the chart:
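As a rough sketch of what a width-based cap might look like: `capBinCount` and its `minBinWidth` parameter are hypothetical names for illustration, not part of Plot or D3. The idea is only that a suggested bin count (for example, one returned by d3.thresholdScott) is clamped so each bin keeps at least a couple of pixels.

```javascript
// Hypothetical helper (not a Plot or D3 API): clamp a suggested bin count
// so that every bin gets at least `minBinWidth` pixels of horizontal space.
function capBinCount(suggested, chartWidth, minBinWidth = 2) {
  // Largest number of bins that still leaves minBinWidth pixels per bin;
  // never go below a single bin.
  const maxBins = Math.max(1, Math.floor(chartWidth / minBinWidth));
  return Math.min(suggested, maxBins);
}

// A 640px-wide chart with a 2px minimum allows at most 320 bins.
console.log(capBinCount(5000, 640)); // capped
console.log(capBinCount(20, 640));   // unchanged
```

The pixel minimum is a judgment call; a larger value leaves room for insets, a smaller one preserves more detail.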
Doesn't this mean we have two issues?
a) in some cases the default binning strategy creates too many bins
this might be addressed at the level of d3.thresholdScott, not based on the chart's width; or on the level of Plot.bin, based on the chart's width?
b) insets can "reverse" a rect and make it invisible
the formula is (in rect.js):
.attr("width", i => Math.max(0, Math.abs(X2[i] - X1[i]) - this.insetLeft - this.insetRight))
here maybe the minimum width should be more than 0, perhaps 0.5? It would not eliminate all cases, in particular if you have a white stroke and the default fill-then-stroke paint order, but it would mean that a mark however small is never "zero-width".
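To make the suggestion concrete, here is a minimal standalone sketch of the same width computation with a non-zero floor. The function name and the `minWidth` parameter are assumptions for illustration; they are not existing Plot options, and this is not the code in rect.js.

```javascript
// Sketch of the width formula above with a minimum visible width.
// `minWidth` is a hypothetical parameter, not an existing Plot option.
function insetWidth(x1, x2, insetLeft = 0, insetRight = 0, minWidth = 0.5) {
  // Subtracting the insets can go negative for very narrow bins;
  // clamp to minWidth instead of 0 so the rect never fully vanishes.
  return Math.max(minWidth, Math.abs(x2 - x1) - insetLeft - insetRight);
}

// A 0.4px-wide bin with 0.5px insets on each side:
console.log(insetWidth(10, 10.4, 0.5, 0.5));    // floor applies
console.log(insetWidth(10, 10.4, 0.5, 0.5, 0)); // current behavior: zero width
console.log(insetWidth(0, 20, 0.5, 0.5));       // wide bins are unaffected
```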
This is a more general problem: for example, when stacking values, the values that generate a rect that is smaller than 1px can disappear from view if they have a white stroke. Should we decide we want all marks' geometries to be visible as described above, the rendering might still sometimes make them invisible, sometimes deliberately (fillOpacity: 0), sometimes unwittingly (stroke: white). I'm not sure it's possible to fix that, tbh, since we can't make a pixel be at the same time white and black.
It's not only rects: in practice we often have to add a half-pixel to a point's radius so that the colored surface area (after the inner part of the 1px white stroke has been deducted) is proportional to the value. The default r = sqrt(value) is not 100% correct and should be r = sqrt(value) + 1/2 strokeWidth if the stroke color is "subtracting matter"; e.g., a white stroke on a white background will make points smaller than .5px in radius invisible. This is fixable by the user by setting r: {range: [.5, max]}, so that a value of 0 shows 0 color, but a value immediately > 0 shows a bit of color.
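The arithmetic behind the half-stroke correction can be sketched as follows. Both function names are hypothetical, and the sketch assumes a stroke centered on the circle's outline, so its inner half covers strokeWidth / 2 of the fill.

```javascript
// Hypothetical helpers illustrating r = sqrt(value) + strokeWidth / 2.
// Assumes a centered stroke whose inner half hides part of the fill.
function radiusWithStroke(value, strokeWidth = 1) {
  return Math.sqrt(value) + strokeWidth / 2;
}

// Area of fill left visible once the inner half of the stroke is deducted.
function visibleFillArea(r, strokeWidth = 1) {
  const inner = Math.max(0, r - strokeWidth / 2);
  return Math.PI * inner * inner;
}

// With the correction, visible area is proportional to the value (pi * value).
console.log(visibleFillArea(radiusWithStroke(4)));
// Without it, a small value yields a radius under 0.5px and no visible fill.
console.log(visibleFillArea(Math.sqrt(0.2)));
```

With the correction, a value of 0 gives a radius of 0.5 and no visible fill, while any value above 0 leaves some color showing.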
@severo has found a different avatar of this issue, when you bin on a single value: https://talk.observablehq.com/t/histogram-with-plot-does-not-show-a-bar-for-a-single-value-array/5111/2
Plot.rectY([{ weight: 3 }], Plot.binX({ y: "count" }, { x: "weight" })).plot()
Another difference is that Plot's (and D3's) bins conflate nulls and zeros, resulting in about 35k members in the first bin, whereas ggplot2 counts about 20k members (there are 15061 nulls in the data).
(For this extremely skewed distribution, d3.thresholdFreedmanDiaconis returns almost 3 times as many bins as there are values!)
Filed https://github.com/d3/d3-array/issues/203 for the null conflation issue.
That was a productive issue with 3 [Edit: 4!] pull-requests :)
I don't know how to address @severo's example though:
Plot.rectY([{ weight: 3 }], Plot.binX({ y: "count" }, { x: "weight" })).plot()
creates one bin on the [3, 3] range, and it's hard to give it a width.
We might look at how other libraries handle this case
In RStudio it creates a single bar taking the whole width, and with a total extent of 1 (from 2.5 to 3.5).
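The behavior described for RStudio could be approximated by widening a zero-width extent before binning. This is only a sketch: `widenDegenerateExtent` and its `pad` parameter are hypothetical, and it is not necessarily how Plot handles this case.

```javascript
// Hypothetical helper: if all values are identical, the extent [min, max]
// has zero width; pad it symmetrically so the single bin has a total
// extent of 2 * pad (1 by default, e.g. [3, 3] becomes [2.5, 3.5]).
function widenDegenerateExtent([min, max], pad = 0.5) {
  return min === max ? [min - pad, max + pad] : [min, max];
}

console.log(widenDegenerateExtent([3, 3]));  // widened around the single value
console.log(widenDegenerateExtent([1, 10])); // non-degenerate extents pass through
```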
A fix to @severo's issue is implemented in https://github.com/observablehq/plot/pull/438
Calling this fixed by #421 (and #470), though see https://github.com/observablehq/plot/pull/422#issuecomment-896278326.
The default binning in Plot can create rects with a width that is too small to see (making a user -- namely, me -- believe the code is broken). As an example (also see this notebook):
Compared to the same plot made using ggplot2: