Closed: iliatimofeev closed this 5 years ago.
I know this is likely overkill, but just to note: the same problem exists for row.
The issue does not seem to be in the boxplot logic.
Here is a small, normalized spec that shows the issue:
{
"data": {
"values": [
{
"homework_done": false,
"session_time_m": 2,
"session_hour": 1
},
{
"homework_done": false,
"session_time_m": 0,
"session_hour": 2
}
]
},
"$schema": "https://vega.github.io/schema/vega-lite/v3.0.0.json",
"facet": {
"column": {
"type": "nominal",
"field": "session_hour"
}
},
"spec": {
"layer": [
{
"transform": [
{
"aggregate": [
{
"op": "median",
"field": "session_time_m",
"as": "mid_box_session_time_m"
}
],
"groupby": [
"homework_done"
]
}
],
"layer": [
{
"mark": {
"type": "tick"
},
"encoding": {
"y": {
"field": "mid_box_session_time_m",
"type": "quantitative"
},
"x": {
"field": "homework_done",
"type": "nominal"
}
}
}
]
},
{
"transform": [
{
"window": [],
"groupby": [
"homework_done"
]
}
],
"mark": {
"type": "point"
},
"encoding": {
"y": {
"field": "session_time_m",
"type": "quantitative"
},
"x": {
"field": "homework_done",
"type": "nominal"
}
}
}
]
}
}
Hmm, this doesn't look fun.
Hmm, weird. We have a data_1 after the facet, but somehow Vega doesn't find it. I thought that worked.
Ahh, the problem is the scales. We have a scale in the top-level spec, but it reads data from data_1, which is defined in an inner scope. What's weird is that I thought we were making a copy of the dataflow for this reason. When I change the domain to use data_3, it works.
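To illustrate the scoping problem described above, here is a small TypeScript model of nested data scopes: a group (such as a facet cell) can see datasets from the scopes that enclose it, but an enclosing scope cannot see datasets defined inside a group. This is a hypothetical sketch, not Vega's parser; the class DataScope and its methods are invented for illustration, and placing data_3 at the top level is an assumption.

```typescript
// Hypothetical, simplified model of Vega's nested data scopes (not Vega's
// actual implementation): a child scope sees its own and its ancestors'
// datasets, but a parent scope cannot see datasets defined in a child.
class DataScope {
  private readonly names = new Set<string>();

  constructor(private readonly parent?: DataScope) {}

  define(name: string): void {
    this.names.add(name);
  }

  // Look for the dataset here first, then in each enclosing scope.
  resolve(name: string): string {
    for (let s: DataScope | undefined = this; s !== undefined; s = s.parent) {
      if (s.names.has(name)) {
        return name;
      }
    }
    throw new Error(`Undefined data set name: "${name}"`);
  }
}

// Usage mirroring the bug: data_1 lives inside the facet group, while the
// y scale is defined at the top level.
const rootScope = new DataScope();
rootScope.define("data_3");
const facetGroup = new DataScope(rootScope);
facetGroup.define("data_1");

console.log(facetGroup.resolve("data_3")); // data_3
try {
  rootScope.resolve("data_1");
} catch (e) {
  console.log((e as Error).message); // Undefined data set name: "data_1"
}
```

In this model, a top-level scale domain can only safely reference datasets defined at the top level, which matches the observation that pointing the domain at data_3 makes the spec work.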
I'll keep looking later.
🎉
So, the issue seems to be that we didn't correctly treat window aggregates as aggregates. Now the chart just needs a bit more data.
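As a rough illustration of "treating window aggregates as aggregates", here is a TypeScript sketch of a predicate that classifies a layer's transforms. The types and function names are hypothetical simplifications, not the real Vega-Lite definitions or the actual fix; the point is only that a window, like an aggregate, derives fields from group-level statistics even though it keeps every row.

```typescript
// Hypothetical, simplified transform shapes for illustration; these are
// not the real Vega-Lite type definitions.
interface AggregateTransform {
  aggregate: { op: string; field: string; as: string }[];
  groupby: string[];
}

interface WindowTransform {
  window: { op: string; field?: string; as: string }[];
  groupby?: string[];
}

interface FilterTransform {
  filter: string;
}

type Transform = AggregateTransform | WindowTransform | FilterTransform;

// An aggregate collapses rows; a window keeps every row but still computes
// per-group statistics. This predicate treats both the same way.
function computesGroupStatistics(t: Transform): boolean {
  return "aggregate" in t || "window" in t;
}

// True if any transform in a layer computes group statistics.
function layerComputesGroupStatistics(transforms: Transform[]): boolean {
  return transforms.some(computesGroupStatistics);
}

// The point layer in the small example uses an (empty) window transform.
const outlierLayer: Transform[] = [{ window: [], groupby: [] }];
console.log(layerComputesGroupStatistics(outlierLayer)); // true
```

A check that only looked for the aggregate key would return false for that point layer, so its data would not get the same treatment as the aggregated tick layer.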
Here is another example that doesn't work:
{
"$schema": "https://vega.github.io/schema/vega-lite/v2.json",
"description": "A vertical 1D box plot showing median, min, and max in the US population distribution of age groups in 2000.",
"data": {"url": "data/population.json"},
"mark": "boxplot",
"encoding": {
"y": {
"field": "people",
"type": "quantitative",
"axis": {"title": "population"}
},
"column": {
"field": "sex",
"type": "ordinal"
}
}
}
Hmm, why is people in the domain here?
"scales": [
{
"name": "y",
"type": "linear",
"domain": {
"fields": [
{"data": "data_1", "field": "lower_whisker_people"},
{"data": "data_1", "field": "lower_box_people"},
{"data": "data_1", "field": "upper_box_people"},
{"data": "data_1", "field": "upper_whisker_people"},
{"data": "data_1", "field": "mid_box_people"},
{"data": "data_3", "field": "people"}
]
},
"range": [{"signal": "child_height"}, 0],
"nice": true,
"zero": true
}
],
Ahh, people is for outliers. We need to use a window to calculate an aggregate and then filter with it. The right thing is for the scale to be derived from a clone of the dataflow that is hoisted to the top. We do this for normal aggregates, so let's see why this isn't happening for window aggregates.
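To make the outlier reasoning concrete, here is a plain TypeScript sketch of roughly what the window-then-filter transforms compute, run over an in-memory array rather than the Vega dataflow. The function names and sample data are hypothetical; the field names lower_box_people and upper_box_people follow the compiled spec above. Two assumptions: quartiles use linear interpolation (like d3.quantile), which may differ from Vega's q1/q3 in edge cases, and whiskers extend 1.5 × IQR past the box. The sketch also shows why people ends up in the y domain: the surviving outlier rows keep their raw people values.

```typescript
// Illustrative sketch of boxplot outliers as "window, then filter".
// Assumptions: linear-interpolation quartiles and a 1.5 x IQR whisker rule.
interface Row {
  people: number;
}

// Linear-interpolation quantile over an ascending, non-empty array.
function quantile(sorted: number[], p: number): number {
  const h = (sorted.length - 1) * p;
  const lo = Math.floor(h);
  const hi = Math.ceil(h);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (h - lo);
}

function outliers(rows: Row[], k = 1.5): Row[] {
  if (rows.length === 0) {
    return [];
  }
  const sorted = rows.map((r) => r.people).sort((a, b) => a - b);
  const q1 = quantile(sorted, 0.25);
  const q3 = quantile(sorted, 0.75);
  const iqr = q3 - q1;

  // Window step (groupby: []): every row keeps its raw `people` value and
  // gains the box bounds, instead of being collapsed like an aggregate.
  const withBounds = rows.map((r) => ({
    ...r,
    lower_box_people: q1,
    upper_box_people: q3,
  }));

  // Filter step: keep rows beyond the whiskers. The survivors still carry
  // the raw `people` field, which the point mark encodes on y.
  return withBounds
    .filter(
      (r) =>
        r.people < r.lower_box_people - k * iqr ||
        r.people > r.upper_box_people + k * iqr,
    )
    .map((r) => ({ people: r.people }));
}

const sample: Row[] = [1, 2, 3, 4, 100].map((people) => ({ people }));
console.log(outliers(sample)); // [ { people: 100 } ]
```

Here q1 = 2 and q3 = 4, so the whiskers span -1 to 7 and only 100 survives the filter.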
Here is a small spec that shows the error even when I fix the push down logic.
{
"$schema": "https://vega.github.io/schema/vega-lite/v2.json",
"data": {
"url": "data/population.json"
},
"facet": {
"column": {
"field": "sex",
"type": "ordinal"
}
},
"spec": {
"layer": [
{
"transform": [
{
"aggregate": [
{
"op": "min",
"field": "people",
"as": "min_people"
}
],
"groupby": []
}
],
"mark": {
"type": "tick",
"style": "boxplot-rule"
},
"encoding": {
"y": {
"field": "min_people",
"type": "quantitative"
}
}
},
{
"transform": [
{
"window": [
{
"op": "q1",
"field": "people",
"as": "lower_box_people"
}
],
"groupby": []
}
],
"mark": {
"type": "point",
"style": "boxplot-outliers"
},
"encoding": {
"y": {
"field": "people",
"type": "quantitative"
}
}
}
]
}
}
@invokesus had a hunch that the bug may be caused by https://github.com/vega/vega-lite/pull/4029. However, going back to dad69556d doesn't seem to fix the issue with https://github.com/vega/vega-lite/issues/4156#issuecomment-424560258, but it does fix https://github.com/vega/vega-lite/issues/4156#issuecomment-423722449. So maybe https://github.com/vega/vega-lite/pull/4175 at least partially resolves the issue.
This example works before the transform merging but not after:
{
"data": {
"values": [
{
"homework_done": false,
"session_time_m": 2,
"session_hour": 1
},
{
"homework_done": false,
"session_time_m": 0,
"session_hour": 2
}
]
},
"$schema": "https://vega.github.io/schema/vega-lite/v3.0.0.json",
"facet": {
"column": {
"type": "nominal",
"field": "session_hour"
}
},
"spec": {
"layer": [
{
"transform": [
{
"aggregate": [
{
"op": "median",
"field": "session_time_m",
"as": "mid_box_session_time_m"
}
],
"groupby": []
}
],
"mark": {
"type": "tick"
},
"encoding": {
"y": {
"field": "mid_box_session_time_m",
"type": "quantitative"
}
}
},
{
"transform": [
{
"window": [],
"groupby": []
}
],
"mark": {
"type": "point"
},
"encoding": {
"y": {
"field": "session_time_m",
"type": "quantitative"
}
}
}
]
}
}
For some reason, this spec doesn't work in either case:
{
"$schema": "https://vega.github.io/schema/vega-lite/v2.json",
"data": {
"url": "data/population.json"
},
"facet": {
"column": {
"field": "sex",
"type": "ordinal"
}
},
"spec": {
"layer": [
{
"transform": [
{
"aggregate": [
{
"op": "min",
"field": "people",
"as": "min_people"
}
],
"groupby": []
}
],
"mark": {
"type": "tick",
"style": "boxplot-rule"
},
"encoding": {
"y": {
"field": "min_people",
"type": "quantitative"
}
}
},
{
"transform": [
{
"window": [],
"groupby": []
}
],
"mark": {
"type": "point"
},
"encoding": {
"y": {
"field": "people",
"type": "quantitative"
}
}
}
]
}
}
Wow, so with dad69556d the dataflow looks like
and with the latest dom/window-dataflow
So something is very wrong here. I'm going to wait for @invokesus to fix https://github.com/vega/vega-lite/pull/4175 and see whether that resolves this problem.
https://github.com/vega/vega-lite/pull/4177 still seems like a good idea so I'll leave it open.
https://github.com/vega/vega-lite/pull/4177 and https://github.com/vega/vega-lite/pull/4175 will fix this.
Phew, this was one of the hardest debugging sessions I've done. It took me three days, with some really weird behavior in between. However, it exposed a few separate bugs that are all fixed now, and we have tests and helper tools that should make this class of bugs much easier to catch.
boxplot doesn't work with column encoding and facet.
Result: Error: Undefined data set name: "data_1" (see editor)