Don't add impute for violin plot

vega / vega-lite

A concise grammar of interactive graphics, built on Vega.

https://vega.github.io/vega-lite/

BSD 3-Clause "New" or "Revised" License

4.68k stars 611 forks source link

Don't add impute for violin plot #5611

Closed domoritz closed 4 years ago

domoritz commented 4 years ago

{
  "data": {"url": "data/iris.json"},
  "transform": [
    {
      "fold": ["sepalLength", "sepalWidth",
      "petalLength", "petalWidth"],
      "as": ["feature", "value"]
    },
    {
      "density": "value",
      "extent": [0, 8],
      "groupby": ["feature"]
    }
  ],
  "mark": {"type": "area", "orient": "horizontal"},
  "encoding": {
    "column": {"type": "nominal", "field": "feature"},
    "x": {"type": "quantitative", "field": "density", "stack": "center"},
    "y": {"type": "quantitative", "field": "value"}
  },
  "width": 60
}

@jheer said: I took a closer look and the culprit is not a sorting issue, but rather the auto-magical inclusion of an impute transform that has no business being there. By default, Vega performs adaptive sampling to determine which points along the density curve to include. As this can result in different sample points for the different areas, their domain values should not be used together to perform imputation. @domoritz, @kanitw I think this needs to be fixed prior to a v4 release.

domoritz commented 4 years ago

In the future, we will provide alignment to create violin plots. For now, we won't change anything about the spec above (I also don't know how what we would change) but we should allow disabling of imputation like this:

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "data": {"url": "data/iris.json"},
  "transform": [
    {
      "fold": ["sepalLength", "sepalWidth", "petalLength", "petalWidth"],
      "as": ["feature", "value"]
    },
    {"density": "value", "extent": [0, 8], "groupby": ["feature"]}
  ],
  "mark": {"type": "area", "orient": "horizontal"},
  "encoding": {
    "column": {"type": "quantitative", "field": "feature"},
    "x": {
      "type": "quantitative",
      "field": "density",
      "stack": "center",
      "impute": null
    },
    "y": {"type": "quantitative", "field": "value"}
  },
  "width": 60
}

jheer commented 4 years ago

Not clear to me why imputation would be opt out rather than opt in. Also, if the same logic also applies to line marks we have the same problem for regression lines. In general it is simply not correct to assume that different groups within a groupby should have identical x/y domain values.

domoritz commented 4 years ago

I'm looking into why we initially decided to add imputation by default. Maybe we can disable it.

domoritz commented 4 years ago

We are currently only adding imputation when stacking path marks.

domoritz commented 4 years ago

Here is what happens without imputation by default for stacked path marks:

And with imputation

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "data": { "url": "data/population.json"},
  "transform": [
    {"filter": "datum.year == 2000"},
    {"filter": "datum.age>50 || datum.sex == 2"},
    {"calculate": "datum.sex == 2 ? 'Female' : 'Male'", "as": "gender"}
  ],
  "mark": {"type": "area", "line": true},
  "encoding": {
    "y": {
      "aggregate": "sum", "field": "people", "type": "quantitative"
    },
    "x": {"field": "age", "type": "ordinal"},
    "color": {
      "field": "gender", "type": "nominal",
      "scale": {"range": ["#675193", "#ca8861"]}
    },
    "opacity": {"value": 0.7}
  }
}

Can you say more about why imputation should be opt-in for stacked path marks?

jheer commented 4 years ago

Ah that makes sense. Do we have anyway of knowing at compile time that a stack only has one entry, as in this violin case? Or, can we somehow use Vega’s xc channel instead of a stack transform?

domoritz commented 4 years ago

Or, can we somehow use Vega’s xc channel instead of a stack transform?

I think that's the right thing to do in general but I'd like to defer this feature to 4.1. For now, we should support disabling imputation.

domoritz commented 4 years ago

I have a fix in https://github.com/vega/vega-lite/pull/5617 for now (which I think we should have either way).

We could know that there is only a single mark (if we don't encode color, opacity, detail, etc) but this would require more modifications that I want to do right now and the right solution is to center marks without stacking. We need to think a bit more about a design for that.

domoritz commented 10 months ago

New spec with penguins: Open the Chart in the Vega Editor