observablehq / plot

A concise API for exploratory data visualization implementing a layered grammar of graphics
https://observablehq.com/plot/
ISC License

The scale’s transform should apply to the scale’s explicit domain, if any #1565

Open mbostock opened 1 year ago

mbostock commented 1 year ago

In this example, it’s confusing that the domain is specified in the transformed space [-100, 100] instead of the pre-transformed space [0, 2]; everything else, including the ruleY, is specified in the pre-transformed space. Ideally the domain would be specified in the same space, so that you can add and remove the transform without needing to change anything else in the plot.

Plot.plot({
  y: {
    domain: [-100, 100],
    label: "↑ Close (%)",
    transform: (y) => (y - 1) * 100,
    tickFormat: "+d"
  },
  color: {
    legend: true
  },
  marks: [
    Plot.ruleX([frcb[i].Date]),
    Plot.ruleY([1]),
    Plot.lineY(stocks, Plot.normalizeY((Y) => Y[i], {x: "Date", y: "Close", stroke: "Symbol", tip: true})),
  ]
})

I think the reason we don’t do this now is somewhat inadvertent: when the domain is not specified explicitly, it is derived from the channels, which already have the transform applied. So, we don’t want to apply the transform to the domain if the domain was derived from already-transformed channels; we only want to apply it when the domain was specified explicitly.
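
The distinction between an explicit and a derived domain can be sketched in plain JavaScript. This only illustrates the proposed behavior and is not Plot's implementation; `extent` and `resolveDomain` are hypothetical helpers, and the sample values are made up.

```javascript
// Hypothetical sketch of the proposed behavior; not Plot's internals.
// `values` are channel values in the pre-transformed (data) space.
function extent(values) {
  return [Math.min(...values), Math.max(...values)];
}

function resolveDomain({domain, transform = (d) => d}, values) {
  // Explicit domain: given in data space, so the transform is applied to it.
  if (domain !== undefined) return domain.map(transform);
  // Implicit domain: derived from channel values that are already transformed,
  // so the transform must not be applied a second time.
  return extent(values.map(transform));
}

const toPercent = (y) => (y - 1) * 100;
const explicit = resolveDomain({domain: [0, 2], transform: toPercent}, [0.5, 1, 1.5]);
const implicit = resolveDomain({transform: toPercent}, [0.5, 1, 1.5]);
console.log(explicit, implicit);
```

With this logic, the explicit domain [0, 2] and the derived domain both end up in the transformed space, and removing the transform leaves the explicit domain valid as written.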

I don’t see an obvious way of making this change backwards compatible, since it changes the meaning of the domain option when the transform option is present. (We could introduce another option to control whether the transform applies to the domain, but it’ll still be confusing unless we change the default behavior.) So maybe this isn’t fixable, but I figure I would at least write this down in case others run into the same confusion.

Fil commented 1 year ago

The current situation is useful for example when grouping/binning as a scale transform:

Plot.dotX(penguins, { x: "body_mass_g", y: "sex", fill: "species" }).plot({
  y: {
    domain: ["F", "M"], // ignores N/A
    transform: (d) => d?.[0]
  }
})

or a facet scale transform (in this case, it's not that we specify the domain, but the transformed domain has only five continents):

  Plot.plot({
    width: 600,
    height: 350,
    facet: {data: athletes, x: "nationality"},
    y: {domain: [1.45, 2.1]},
    fx: {transform: (code) => continents.get(code), label: "continent"},
    marks: [
      Plot.frame(),
      Plot.dot(athletes, Plot.dodgeX({y: "height", title: "nationality", fill: "currentColor", anchor: "middle"}))
    ]
  });

http://localhost:8008/?test=athletesBoxingHeight

But I agree that there is a bit of confusion, and it would be nice to have a possibility to specify the domain from the pre-transform space.

mbostock commented 1 year ago

In the first case, you could just say domain: ["FEMALE", "MALE"], though, right? And if you want to put a ruleY on there, you’d need to say ruleY(["FEMALE"]), so it seems reasonable for the scale domain option to be specified consistently. (Edit: Well, I guess this transform is idempotent so you could say ruleY(["F"]), but that isn’t true in general, and I think it would be misleading to rely on an idempotent transform.)
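
To make the idempotence point concrete, here is a small check in plain JavaScript. The names `firstLetter` and `toPercent` are introduced for illustration, and the sample values (including `null` for the N/A case) are assumptions based on the examples in this thread.

```javascript
// The transform from the penguins example: keep the first character.
const firstLetter = (d) => d?.[0];

// Applying it once or twice gives the same result, so "F" and "FEMALE"
// both land on "F". That is what makes ruleY(["F"]) happen to work.
const once = ["FEMALE", "MALE", null].map(firstLetter);
const twice = once.map(firstLetter);

// The percent transform from the first example is not idempotent:
// a value already in transformed space gets shifted and scaled again.
const toPercent = (y) => (y - 1) * 100;
const appliedOnce = toPercent(1);
const appliedTwice = toPercent(appliedOnce);

console.log(once, twice, appliedOnce, appliedTwice);
```

The first transform is stable under repeated application; the second is not, which is why relying on idempotence does not generalize.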

I don’t follow the second case, since this issue only applies when the domain is specified explicitly; if you don’t specify the domain explicitly, the proposed change has no effect.

Fil commented 1 year ago

I'm trying to think of the usage pattern of having both domain and transform. For the second case (country codes to continents), imagine that you set fx.domain to ["Africa", "Oceania"], to show only these two "transformed" facets (maybe this is coming from a checkbox input, or from another plot's exported scale).
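
A rough sketch of that usage pattern, in plain JavaScript: the domain is expressed in the transformed space (continents), and only the codes whose transformed value falls in it are kept. The lookup table and country codes below are made-up stand-ins for the `continents` map in the example; this mimics the intent only and does not call Plot.

```javascript
// Hypothetical stand-in for the `continents` Map used in the facet example.
const continents = new Map([
  ["KEN", "Africa"],
  ["AUS", "Oceania"],
  ["FRA", "Europe"],
  ["NZL", "Oceania"]
]);

const fxTransform = (code) => continents.get(code);

// Domain given in the transformed space, e.g. from a checkbox input.
const fxDomain = ["Africa", "Oceania"];

// Keep only the nationalities whose continent is in the domain.
const codes = ["KEN", "AUS", "FRA", "NZL"];
const shown = codes.filter((code) => fxDomain.includes(fxTransform(code)));

console.log(shown);
```

Expressing the same selection in the pre-transformed space would require listing every country code that belongs to the two continents.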

Fil commented 1 year ago

I'd prefer not to change this.

I find it more confusing in some way to have to specify the domain in the original space, since that's not what you have on your screen. I agree that it's already the case with the values in ruleY, but in that case it's obviously data and lives in data space. But in your example, having to work out “backwards” that -100 is 0 and +100 is 2 takes a bit of mental effort.

Playing with a slightly different example, where I want the scale to cover the (transformed) domain [-50%, +50%], it feels easier to reason with [-50, 50] than with [0.5, 1.5]. And if that's what I want, then I can do:

domain: [0.5, 1.5].map((y) => (y - 1) * 100),

(This also avoids the issue of a scale transform that might not be ascending or monotonic, e.g. transform: Math.abs, where the transformed data domain is not the data domain transformed 🤓, so specifying the domain in the original space might end up slightly more difficult.)
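
A quick numeric illustration of the Math.abs case, using made-up sample data: the extent of the transformed data differs from the transformed extent of the data.

```javascript
// Made-up sample values in the original (data) space.
const data = [-3, -1, 2];

// Extent of the transformed data: transform first, then take min and max.
const transformedData = data.map(Math.abs);
const extentOfTransformed = [Math.min(...transformedData), Math.max(...transformedData)];

// Transformed extent of the data: take min and max first, then transform.
const dataExtent = [Math.min(...data), Math.max(...data)];
const transformedExtent = dataExtent.map(Math.abs);

console.log(extentOfTransformed); // [1, 3]
console.log(transformedExtent); // [3, 2]
```

Because the transform is not monotonic, mapping the endpoints of a data-space domain does not give the bounds of the transformed values, and here it even reverses their order.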

The zero option also has the same behavior: it adds 0 in the transformed domain.