observablehq / plot

A concise API for exploratory data visualization implementing a layered grammar of graphics
https://observablehq.com/plot/
ISC License

"title" channel function is under certain circumstances given an array of length 1 #360

Closed. knutwannheden closed this issue 3 years ago.

knutwannheden commented 3 years ago

I haven't tried to find out exactly which combination of elements causes this, but as you can see in the example I published at https://observablehq.com/@knutwannheden/plot-stacked-bin-dots, I had to specify the title channel mapping as title: d => d[0].x rather than simply title: "x". I suspect this is a bug, because fill: "x" works (it is also part of the example).

For completeness I also include the example code here:

Plot.plot({
  marks: [
    Plot.ruleX([10]),
    Plot.ruleY([0]),
    Plot.dot(
      data,
      Plot.stackY(
        Plot.binX(
          {},
          {
            x: "x",
            z: data,
            thresholds: 30,
            fill: "x",
            // following line doesn't work
            // title: "x",
            title: d => d[0].x,
            r: 10
          }
        )
      )
    )
  ],
  width: 600,
  height: 150
})

Fil commented 3 years ago

I answered on the forum at https://talk.observablehq.com/t/stacking-binned-dots-using-plot/5017/5; there's probably a way to add an example to the README, since this is going to be a common task.

knutwannheden commented 3 years ago

Hmm... I still don't understand why this is different for the fill channel. I must be misunderstanding something here.

mbostock commented 3 years ago

It’s a question of whether the title channel is computed before or after binning.

If you don’t declare the title channel in the outputs argument (the first argument) to the bin transform, then the title channel in the options argument (the second argument) will be passed through to the Plot.dot mark and computed after binning. In that case the channel is evaluated against the binned data: one array per bin and z group. Because z here is the data array itself, every datum is its own group, so each array simply wraps a single datum.

If you declare the title channel in the outputs argument, then it will be computed before binning and aggregated by the bin transform. This means the title channel will be computed for each element in the dataset, and you can then decide how to aggregate the titles for the data into a single title for the bin. In this case the "first" reducer is the most appropriate, since the titles for all the data points in a given bin are the same.

This behavior is described in the API reference:

While it is possible to compute channel values on the binned data by defining channel values as a function, more commonly channel values are computed directly by the bin transform, either implicitly or explicitly.

I can add this additional explanation though…
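To make the before/after distinction concrete, here is a deliberately simplified model in plain JavaScript. It is only a sketch: binGroups, valueof, sketchBinX, and the reducers table are hypothetical names invented for illustration, not Plot's implementation (which also handles thresholds, z grouping, and bin extents). It reproduces the three variants from this thread: title: "x" passed through (evaluated on group arrays, so it yields undefined), the d => d[0].x workaround, and title declared in the outputs with a "first" reducer. In Plot itself, the last variant would presumably be written as Plot.binX({title: "first"}, {x: "x", title: "x", ...}).

```javascript
// HYPOTHETICAL, simplified model of a bin transform; NOT Plot's implementation.
// It only mimics the before/after-binning distinction discussed in this thread.

// Bucket data into fixed-width bins on field x; returns one array (group) per bin.
function binGroups(data, x, width) {
  const bins = new Map(); // bin index -> data in that bin
  for (const d of data) {
    const index = Math.floor(d[x] / width);
    if (!bins.has(index)) bins.set(index, []);
    bins.get(index).push(d);
  }
  // Order groups by bin index so the output is deterministic.
  return [...bins.entries()].sort((a, b) => a[0] - b[0]).map(([, group]) => group);
}

// Hypothetical reducer table, named after Plot's "first" reducer.
const reducers = {
  first: values => values[0]
};

// Evaluate a channel spec (field name or accessor function) on each element.
function valueof(elements, spec) {
  return elements.map(typeof spec === "function" ? spec : e => e[spec]);
}

function sketchBinX(data, outputs, options, width) {
  const groups = binGroups(data, options.x, width);
  const channels = {};
  // Declared in outputs: evaluated per datum BEFORE binning, then reduced per bin.
  for (const [name, reducer] of Object.entries(outputs)) {
    channels[name] = groups.map(group => reducers[reducer](valueof(group, options[name])));
  }
  // Only in options: passed through and evaluated AFTER binning, so the spec is
  // applied to each bin's group (an array), not to an individual datum.
  // Simplification: every remaining option except x is treated as a channel spec.
  for (const [name, spec] of Object.entries(options)) {
    if (name === "x" || name in outputs) continue;
    channels[name] = valueof(groups, spec);
  }
  return channels;
}

const points = [{x: 1}, {x: 2}, {x: 7}];

// title: "x" only in options: "x" is looked up on each group array, which has no x.
const after = sketchBinX(points, {}, {x: "x", title: "x"}, 5);
// after.title → [undefined, undefined]

// The workaround from the issue: index into the group.
const workaround = sketchBinX(points, {}, {x: "x", title: g => g[0].x}, 5);
// workaround.title → [1, 7]

// title declared in outputs with "first": computed per datum, then reduced per bin.
const before = sketchBinX(points, {title: "first"}, {x: "x", title: "x"}, 5);
// before.title → [1, 7]
```

Note that between the first and last calls the definition title: "x" is identical; only whether it is also named in the outputs changes what the spec is applied to.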

knutwannheden commented 3 years ago

Thank you very much for the detailed explanation. What got me was that title behaved differently from fill here, even though I was under the impression that they should have behaved the same.

mbostock commented 3 years ago

The difference with the fill channel is that fill, along with z and stroke, is a group-eligible channel, and hence is automatically aggregated by the bin transform. The title channel is not eligible for this treatment.

If any of z, fill, or stroke is a channel, the first of these channels will be used to subdivide bins.
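The group-eligible rule can be sketched the same way. Again this is a hypothetical model (groupingChannel, keyAccessor, and sketchGroupedBinX are invented names, not Plot's source), and it assumes any defined z, fill, or stroke option is a channel, whereas Plot also recognizes constants such as a literal fill color. Because every member of a subdivided bin shares the grouping value, that value can be carried through to the output without an explicit reducer, which is why fill "just works" in the original example while title does not.

```javascript
// HYPOTHETICAL sketch (not Plot's source) of the quoted rule: the first of z, fill,
// stroke that is defined subdivides each bin. Simplification: any defined option is
// treated as a channel; real Plot also distinguishes constants such as fill: "red".
function groupingChannel(options) {
  for (const name of ["z", "fill", "stroke"]) {
    if (options[name] !== undefined) return options[name];
  }
  return undefined; // no grouping channel: one group per bin
}

// Turn a channel spec into a per-datum key function.
function keyAccessor(spec) {
  if (spec === undefined) return () => undefined; // single group per bin
  if (typeof spec === "function") return spec;
  return d => d[spec];
}

// Bin on options.x with a fixed width, then subdivide each bin by the grouping channel.
function sketchGroupedBinX(data, options, width) {
  const keyOf = keyAccessor(groupingChannel(options));
  const bins = new Map(); // bin index -> Map(group key -> data)
  for (const d of data) {
    const bin = Math.floor(d[options.x] / width);
    const key = keyOf(d);
    if (!bins.has(bin)) bins.set(bin, new Map());
    const groups = bins.get(bin);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(d);
  }
  const out = [];
  for (const [bin, groups] of [...bins.entries()].sort((a, b) => a[0] - b[0])) {
    for (const [key, members] of groups) {
      // All members share key, so it can be emitted as the group's fill/stroke/z
      // value without an explicit reducer: the "automatic" aggregation.
      out.push({bin, key, count: members.length});
    }
  }
  return out;
}

const rows = [
  {x: 1, kind: "a"}, {x: 2, kind: "b"}, {x: 3, kind: "a"}, {x: 8, kind: "a"}
];
const grouped = sketchGroupedBinX(rows, {x: "x", fill: "kind"}, 5);
// → [{bin: 0, key: "a", count: 2}, {bin: 0, key: "b", count: 1}, {bin: 1, key: "a", count: 1}]

// z takes precedence over fill when both are present.
const precedence = groupingChannel({z: "kind", fill: "x"}); // → "kind"
```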

knutwannheden commented 3 years ago

Thanks for bearing with me and thanks for a great library. It is a pleasure to work with!

yurivish commented 3 years ago

If a channel is declared only in options, it will be computed after binning

IIUC, the "computed after binning" clause in this documentation means that the semantics of the options change depending on whether they are passed directly to a mark or to a bin transform. In the first case the channel function is passed individual data points, and in the second case it is passed groups.

The current behavior seems useful since you never want more than one title value to be computed per bin group, but maybe there's a way to keep performance and composability, e.g. only calling channel functions specified in the input options for the first value of each group when used with a bin transform?

I'd been thinking of transforms as functions that don't change the interpretation of their input options, which seems like a property worth having if it's not too annoying to deal with in practice. Is this a good way to think of options transforms?
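The proposal above, calling a channel function only on the first value of each group, can be approximated in user code with a small adapter. onFirst is a hypothetical helper, not part of Plot. Since passed-through channel functions receive bin groups, writing title: onFirst("x") in the options argument would presumably behave like the d => d[0].x workaround from the original report, while still letting the accessor be written per datum.

```javascript
// HYPOTHETICAL helper, not part of Plot: turn a per-datum channel spec (a field name
// or accessor) into a per-group accessor that looks only at the group's first element.
function onFirst(spec) {
  const accessor = typeof spec === "function" ? spec : d => d[spec];
  return group => (group.length > 0 ? accessor(group[0]) : undefined);
}

// Groups shaped like those a bin transform hands to a passed-through channel function.
const binned = [[{x: 1}, {x: 2}], [{x: 7}]];

let calls = 0;
const countingAccessor = d => {
  calls += 1;
  return d.x;
};

const byName = binned.map(onFirst("x")); // → [1, 7]
const byFunction = binned.map(onFirst(countingAccessor)); // → [1, 7]
// calls → 2: the accessor runs once per group, not once per datum.
```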

mbostock commented 3 years ago

I'd been thinking of transforms as functions that don't change the interpretation of their input options

I’m not sure I follow — isn’t the entire purpose of transforms to transform options (i.e., to derive new data, new index, and new channels)? That’s why they’re called “options transforms”.

yurivish commented 3 years ago

Maybe that's the root of my misunderstanding. I was expecting that the "input options" passed to the transform would mean the same as if they were passed directly to a plot, e.g.

Plot.dot(data, plotOptions)

vs.

Plot.dot(data, Plot.someOptionTransform(output, plotOptions))

The transform would return transformed options, but would not change the interpretation of the original plot options: they would "mean the same thing" regardless of whether they were composed with the options transform or not.

mbostock commented 3 years ago

When you pass options to Plot.dot, they are mark options (dot options) rather than top-level plot options. And the purpose of the transform is exactly to transform those options, not to do anything else. So if the transform didn’t transform those options, it wouldn’t be doing anything. But it is true that only some of the options are typically transformed, while others are passed through; which ones are transformed or not depends on the exact transform.

mbostock commented 3 years ago

I think maybe the confusion is that you expect the transform to automatically apply the first reducer to all channels by default, like it does for the z, fill, and stroke channels? That might be nice, but it’s not currently possible, because the transform can’t tell from the options alone which ones are channels and which are not; that’s up to the downstream mark.
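The obstacle described above can be seen in a small sketch: an options transform receives a flat object in which a channel (title: "x") looks no different in kind from a non-channel option (thresholds: 30). The hypothetical helper firstByDefault below, which is not part of Plot, can only add "first" reducers if the caller tells it the downstream mark's channel names. Its result could then presumably be passed as the outputs argument, as in Plot.binX(outputs, markOptions).

```javascript
// HYPOTHETICAL helper, not part of Plot. An options transform only sees a flat object
// such as {x: "x", title: "x", thresholds: 30}; nothing in it marks title as a channel
// and thresholds as a plain option, so the caller must supply the mark's channel names.
function firstByDefault(outputs, options, channelNames) {
  const defaults = {};
  for (const name of channelNames) {
    // Default to the "first" reducer for every listed channel the caller actually set.
    if (options[name] !== undefined) defaults[name] = "first";
  }
  // Spread explicit outputs last so they override the defaults.
  return {...defaults, ...outputs};
}

const markOptions = {x: "x", title: "x", thresholds: 30};

// Suppose we know (from the downstream mark) that title and r are channels.
const outputs = firstByDefault({r: "count"}, markOptions, ["title", "r"]);
// → {title: "first", r: "count"}
```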

yurivish commented 3 years ago

Yeah, that's a great way to put it. I didn't realize at first that that's what I was asking for, but I think it is.

I was imagining that channel functions are like functions, options transforms are like higher-order functions, and transform composition is like function composition. It just seemed weird that channel functions in the original options object, which used to operate on individual data points, would be passed through the transform unchanged but would now have to operate on groups instead of data points.