jakevdp opened 4 years ago
It's probably reasonable to include fields referenced in a condition's test in the groupby.
If anything, I think we should make the following spec, which uses a field predicate, also group its count by Cylinders:
{
"data": {"url": "data/cars.json"},
"mark": "bar",
"encoding": {
"x": {"field": "Miles_per_Gallon", "bin": true, "type": "quantitative"},
"y": {"aggregate": "count", "type": "quantitative"},
"color": {
"condition": {"test": {"field": "Cylinders", "lt": 5}, "value": "steelblue"},
"value": "darkorange"
}
}
}
However, implementing the same parsing for expression strings would probably make the code too complicated.
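A minimal Python sketch of this idea, assuming encodings are plain dicts shaped like the spec above. It collects the non-aggregated field channels (the current implicit groupby) and, optionally, any field named in a field-predicate test inside a condition. The function groupby_fields and its include_condition_fields flag are hypothetical, not part of Vega-Lite or Altair. Expression-string tests are deliberately skipped (that is the parsing problem), and logical compositions of predicates (and/or/not) are ignored for brevity.

```python
from typing import Any


def groupby_fields(encoding: dict[str, Any], include_condition_fields: bool = True) -> list[str]:
    """Return the fields an encoding-level aggregate would be grouped by."""
    fields: list[str] = []

    def add(name: str) -> None:
        if name not in fields:
            fields.append(name)

    for channel_def in encoding.values():
        # Current behavior: every non-aggregated field channel is a grouping key
        if "field" in channel_def and "aggregate" not in channel_def:
            add(channel_def["field"])
        if not include_condition_fields:
            continue
        # Proposed behavior: also group by fields named in field-predicate tests
        condition = channel_def.get("condition")
        conditions = condition if isinstance(condition, list) else [condition] if condition else []
        for cond in conditions:
            test = cond.get("test")
            # Only field predicates (dicts) are handled; expression strings are skipped
            if isinstance(test, dict) and "field" in test:
                add(test["field"])
    return fields


encoding = {
    "x": {"field": "Miles_per_Gallon", "bin": True, "type": "quantitative"},
    "y": {"aggregate": "count", "type": "quantitative"},
    "color": {
        "condition": {"test": {"field": "Cylinders", "lt": 5}, "value": "steelblue"},
        "value": "darkorange",
    },
}

print(groupby_fields(encoding, include_condition_fields=False))  # ['Miles_per_Gallon']
print(groupby_fields(encoding))  # ['Miles_per_Gallon', 'Cylinders']
```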
This is a follow-up to an Altair user question about a potentially confusing aspect of the grammar.
TL;DR: In the VL grammar, aggregates specified in encodings are implicitly grouped by the fields of the other (non-aggregated) encodings. Should they also be grouped by fields that appear in conditions within those encodings?
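To make the implicit grouping concrete, here is a small pure-Python emulation; it is an illustration, not how Vega-Lite actually compiles specs. The rows are hypothetical, with Miles_per_Gallon already binned into labels, and color_for mirrors a condition test of Cylinders < 5. When Cylinders is a grouping key (e.g. color encodes the Cylinders field), each aggregated datum still carries it. When Cylinders is only referenced in the condition, the aggregated data has no Cylinders value, so in this emulation the test can never pass and every bar gets the default color.

```python
from collections import Counter

# Hypothetical rows standing in for data/cars.json; Miles_per_Gallon is pre-binned
rows = [
    {"mpg_bin": "10-20", "Cylinders": 8},
    {"mpg_bin": "10-20", "Cylinders": 8},
    {"mpg_bin": "20-30", "Cylinders": 4},
    {"mpg_bin": "20-30", "Cylinders": 6},
    {"mpg_bin": "30-40", "Cylinders": 4},
]


def aggregate_count(rows, groupby):
    """Count rows per unique combination of the groupby fields."""
    counts = Counter(tuple(row[k] for k in groupby) for row in rows)
    # Each aggregated datum keeps only the groupby fields plus the count
    return [dict(zip(groupby, key), count=n) for key, n in counts.items()]


def color_for(datum):
    """Mirror the condition `Cylinders < 5`; a missing field never passes."""
    cyl = datum.get("Cylinders")
    return "steelblue" if cyl is not None and cyl < 5 else "darkorange"


# color encodes the Cylinders field: it becomes a groupby key
grouped = aggregate_count(rows, ["mpg_bin", "Cylinders"])
# color is only a condition: the groupby comes from the x encoding alone
collapsed = aggregate_count(rows, ["mpg_bin"])

print([color_for(d) for d in grouped])    # ['darkorange', 'steelblue', 'darkorange', 'steelblue']
print([color_for(d) for d in collapsed])  # ['darkorange', 'darkorange', 'darkorange']
```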
Consider this chart (Vega editor):
Now suppose the user wants to highlight rows with fewer than 5 cylinders. They might look at the docs and try replacing the color encoding with a condition (editor):
This clearly does not have the desired effect, because the count aggregate is no longer grouped by Cylinders. For users unfamiliar with the details of how aggregates are computed in VL, it's quite difficult to debug why this is happening.
One easy remedy is to explicitly add a detail encoding so that the counts are grouped appropriately (editor):
A more complete approach would probably be to apply a calculate transform and encode the color by the resulting field (editor):
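For reference, a sketch of the two remedies written as plain Python dicts (they serialize to the equivalent JSON specs). The specs behind the editor links are not reproduced here, so these are reconstructions: the field predicate is taken from the spec earlier in the thread, and the calculated field name cylinder_group with its two labels is a hypothetical choice.

```python
import json

data = {"url": "data/cars.json"}
x = {"field": "Miles_per_Gallon", "bin": True, "type": "quantitative"}
y = {"aggregate": "count", "type": "quantitative"}

# Remedy 1: keep the condition, but add Cylinders as a detail channel so the
# count is grouped by it and each aggregated datum still carries Cylinders
detail_spec = {
    "data": data,
    "mark": "bar",
    "encoding": {
        "x": x,
        "y": y,
        "color": {
            "condition": {"test": {"field": "Cylinders", "lt": 5}, "value": "steelblue"},
            "value": "darkorange",
        },
        "detail": {"field": "Cylinders", "type": "ordinal"},
    },
}

# Remedy 2: derive a category with a calculate transform and encode color by it;
# "cylinder_group" and its labels are hypothetical names
calculate_spec = {
    "data": data,
    "transform": [
        {
            "calculate": "datum.Cylinders < 5 ? 'fewer than 5' : '5 or more'",
            "as": "cylinder_group",
        }
    ],
    "mark": "bar",
    "encoding": {
        "x": x,
        "y": y,
        "color": {
            "field": "cylinder_group",
            "type": "nominal",
            "scale": {"domain": ["fewer than 5", "5 or more"], "range": ["steelblue", "darkorange"]},
        },
    },
}

print(json.dumps(detail_spec, indent=2))
```

In Altair, either dict could be loaded with alt.Chart.from_dict. The calculate version groups by the derived category rather than by every Cylinders value, so each bin gets at most two segments.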
But this is probably better suited to a polished, final chart than to quick-and-dirty data exploration.
Would it make sense to change the grammar so that aggregates specified in encodings also group by fields that appear in conditional statements? In other words, should we treat fields referenced in conditional expressions as if they were included in the detail encoding? Or, if that is too invasive, perhaps log a warning when an aggregate elides a field that's referenced in an expression?
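A rough sketch of the warning option, assuming expression tests reference fields as datum.Field or datum['Field']. The regex is a simplification rather than a real Vega expression parser, and referenced_fields and warn_on_elided_condition_fields are hypothetical helpers, not existing Vega-Lite or Altair functions.

```python
import re
import warnings

# Simplified matcher for `datum.Field` and `datum['Field']` / `datum["Field"]`;
# not a real Vega expression parser
_DATUM_REF = re.compile(r"""\bdatum\.([A-Za-z_$][\w$]*)|\bdatum\[\s*(['"])(.+?)\2\s*\]""")


def referenced_fields(expr: str) -> set[str]:
    """Return the field names an expression string appears to reference."""
    return {m.group(1) or m.group(3) for m in _DATUM_REF.finditer(expr)}


def warn_on_elided_condition_fields(encoding: dict) -> list[str]:
    """Warn about condition fields that an encoding-level aggregate drops."""
    if not any("aggregate" in ch for ch in encoding.values()):
        return []
    # Implicit groupby: every non-aggregated field channel
    groupby = {ch["field"] for ch in encoding.values() if "field" in ch and "aggregate" not in ch}
    elided: list[str] = []
    for channel, ch in encoding.items():
        condition = ch.get("condition")
        conditions = condition if isinstance(condition, list) else [condition] if condition else []
        for cond in conditions:
            test = cond.get("test")
            if not isinstance(test, str):
                continue
            for field in sorted(referenced_fields(test) - groupby):
                warnings.warn(
                    f"{channel} condition references {field!r}, "
                    "but the aggregate is not grouped by it; consider a detail encoding"
                )
                elided.append(field)
    return elided


encoding = {
    "x": {"field": "Miles_per_Gallon", "bin": True, "type": "quantitative"},
    "y": {"aggregate": "count", "type": "quantitative"},
    "color": {
        "condition": {"test": "datum.Cylinders < 5", "value": "steelblue"},
        "value": "darkorange",
    },
}

print(warn_on_elided_condition_fields(encoding))  # ['Cylinders'], plus a UserWarning
```

Adding {"detail": {"field": "Cylinders"}} to the encoding puts Cylinders in the groupby, so the same check then returns an empty list and emits no warning.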