domoritz closed this 4 years ago
In the future, we will provide alignment to create violin plots. For now, we won't change anything about the spec above (I also don't know what we would change), but we should allow disabling imputation like this:
{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "data": {"url": "data/iris.json"},
  "transform": [
    {
      "fold": ["sepalLength", "sepalWidth", "petalLength", "petalWidth"],
      "as": ["feature", "value"]
    },
    {"density": "value", "extent": [0, 8], "groupby": ["feature"]}
  ],
  "mark": {"type": "area", "orient": "horizontal"},
  "encoding": {
    "column": {"type": "nominal", "field": "feature"},
    "x": {
      "type": "quantitative",
      "field": "density",
      "stack": "center",
      "impute": null
    },
    "y": {"type": "quantitative", "field": "value"}
  },
  "width": 60
}
It's not clear to me why imputation would be opt-out rather than opt-in. Also, if the same logic applies to line marks, we have the same problem for regression lines. In general, it is simply not correct to assume that different groups within a groupby should have identical x/y domain values.
I'm looking into why we initially decided to add imputation by default. Maybe we can disable it.
We are currently only adding imputation when stacking path marks.
Here is what happens without imputation by default for stacked path marks:
And with imputation:
{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "data": {"url": "data/population.json"},
  "transform": [
    {"filter": "datum.year == 2000"},
    {"filter": "datum.age > 50 || datum.sex == 2"},
    {"calculate": "datum.sex == 2 ? 'Female' : 'Male'", "as": "gender"}
  ],
  "mark": {"type": "area", "line": true},
  "encoding": {
    "y": {
      "aggregate": "sum", "field": "people", "type": "quantitative"
    },
    "x": {"field": "age", "type": "ordinal"},
    "color": {
      "field": "gender", "type": "nominal",
      "scale": {"range": ["#675193", "#ca8861"]}
    },
    "opacity": {"value": 0.7}
  }
}
Can you say more about why imputation should be opt-in for stacked path marks?
Ah, that makes sense. Do we have any way of knowing at compile time that a stack only has one entry, as in this violin case? Or, can we somehow use Vega’s xc channel instead of a stack transform?
> Or, can we somehow use Vega’s xc channel instead of a stack transform?
I think that's the right thing to do in general but I'd like to defer this feature to 4.1. For now, we should support disabling imputation.
I have a fix in https://github.com/vega/vega-lite/pull/5617 for now (which I think we should have either way).
We could know that there is only a single mark (if we don't encode color, opacity, detail, etc.), but this would require more modifications than I want to make right now. The right solution is to center marks without stacking. We need to think a bit more about a design for that.
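To make the "center without stacking" idea concrete, here is a minimal Python sketch of the arithmetic only. It is not a Vega-Lite design or API: the function name `center_without_stack` and the sample values are hypothetical, and it assumes a single series per facet, as in the violin case.

```python
def center_without_stack(samples):
    """Turn (value, density) samples into (value, lower, upper) bounds.

    Each density d becomes the symmetric interval [-d/2, d/2] around a
    center line at 0, so no stack transform (and no imputation) is needed.
    Hypothetical helper for illustration only.
    """
    return [(value, -density / 2, density / 2) for value, density in samples]


# Hypothetical density samples for one feature: (value, density)
samples = [(0.0, 0.0), (2.0, 0.4), (4.0, 0.2)]
bounds = center_without_stack(samples)
```

Each output row keeps the original sample position, so groups with different sample points never need to be aligned against each other.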
New spec with penguins: Open the Chart in the Vega Editor
@jheer said: I took a closer look and the culprit is not a sorting issue, but rather the auto-magical inclusion of an impute transform that has no business being there. By default, Vega performs adaptive sampling to determine which points along the density curve to include. As this can result in different sample points for the different areas, their domain values should not be used together to perform imputation. @domoritz, @kanitw I think this needs to be fixed prior to a v4 release.
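The problem described above can be illustrated with a small Python sketch. It assumes that imputation fills a zero at every domain value seen in any group; the helper `impute_zero` and the sample numbers are hypothetical and do not reproduce Vega-Lite's actual implementation.

```python
def impute_zero(groups):
    """Fill each group with 0.0 at every key that appears in any group.

    Hypothetical stand-in for imputation over the union of domain values.
    """
    all_keys = sorted({key for series in groups.values() for key, _ in series})
    imputed = {}
    for name, series in groups.items():
        known = dict(series)
        imputed[name] = [(key, known.get(key, 0.0)) for key in all_keys]
    return imputed


# Hypothetical density curves sampled at different points per group,
# as adaptive sampling can produce.
groups = {
    "a": [(0.0, 0.1), (1.0, 0.4), (2.0, 0.1)],
    "b": [(0.5, 0.2), (1.5, 0.3), (2.5, 0.2)],
}
imputed = impute_zero(groups)
```

Because the two groups share no sample points, every second point in each imputed series is a spurious zero, and the resulting area zigzags down to the baseline instead of following the density curve.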