vega / vega-lite

A concise grammar of interactive graphics, built on Vega.
https://vega.github.io/vega-lite/
BSD 3-Clause "New" or "Revised" License

Facetting on an "array value" breaks boxplots #8201

Open gs0-pix4d opened 2 years ago

gs0-pix4d commented 2 years ago


In the following example, the rendering is broken for Study ["A", "B"], Project 2, using either facet or row. To switch between the two, remove the leading _ from the _row key and disable the facet entry accordingly.

{
  "data": {
    "values": [
      {"err": 8, "study": ["A", "B"], "project": "1"},
      {"err": 3, "study": ["A", "B"], "project": "1"},
      {"err": 6, "study": ["A", "B"], "project": "2"},
      {"err": 0, "study": ["A", "B"], "project": "2"}
    ]
  },
  "mark": {"type": "boxplot"},
  "encoding": {
    "facet": {"field": "project"},
    "_row": {"field": "project"},
    "y": {"field": "study"},
    "x": {"field": "err","type": "quantitative"}
  }
}

Adding the following transform fixes the problem: "transform": [{"calculate": "join(datum.study)", "as": "study"}]. In other words, when the field is a string rather than an array, everything renders correctly.
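For reference, here is a sketch of the example spec with that transform in place (the inactive _row entry is dropped for brevity). Since join defaults to a comma separator, the y label should then read "A,B" on a single line rather than as a multi-line label.

```json
{
  "data": {
    "values": [
      {"err": 8, "study": ["A", "B"], "project": "1"},
      {"err": 3, "study": ["A", "B"], "project": "1"},
      {"err": 6, "study": ["A", "B"], "project": "2"},
      {"err": 0, "study": ["A", "B"], "project": "2"}
    ]
  },
  "transform": [{"calculate": "join(datum.study)", "as": "study"}],
  "mark": {"type": "boxplot"},
  "encoding": {
    "facet": {"field": "project"},
    "y": {"field": "study"},
    "x": {"field": "err", "type": "quantitative"}
  }
}
```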

I ran my tests using the vl2svg CLI tool.

Broken rendering: image

Cast to string: image

A more complex example (using real-world data; please excuse the crushed boxes in the plot) looks like this: image

You can see that Case 12 seems to work well, and Case 11 has proper data in front of PREFIX-variant4, while all the others are stuck to the top of the plot area.

domoritz commented 2 years ago

Can you look at the Vega code to see what needs to be fixed and send a pull request?

gs0-pix4d commented 2 years ago

I would be happy to do it; however, I have zero knowledge of the codebase, and I'm a very new Vega user (about two weeks, on and off). I wouldn't know where to start digging or how to debug the behavior.

First of all, is it certain that converting to a string is the right thing to do? I was wondering if faceting could have a special meaning for fields that are arrays (such as faceting per array element, for example). In other words, what is the expected outcome (or outcomes, if it's configurable) for a dataset like this one (not even talking about boxplots)?

[
  {"val": 3, "groups": ["A", "B"]},
  {"val": 5, "groups": ["A"]}
]

domoritz commented 2 years ago

I don't think we facet per value. If you wanted that, you could explicitly flatten the field first. Using an array should result in a multi-line label.
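As an illustration of the explicit flattening mentioned above, here is a minimal sketch using Vega-Lite's flatten transform on the small dataset from the previous comment. The point mark and the x encoding are arbitrary choices made for this example, not something proposed in the thread. After flattening, the first record becomes two rows (one with groups "A", one with groups "B"), so faceting on groups yields one panel per element.

```json
{
  "data": {
    "values": [
      {"val": 3, "groups": ["A", "B"]},
      {"val": 5, "groups": ["A"]}
    ]
  },
  "transform": [{"flatten": ["groups"]}],
  "mark": "point",
  "encoding": {
    "facet": {"field": "groups", "type": "nominal"},
    "x": {"field": "val", "type": "quantitative"}
  }
}
```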

gs0-pix4d commented 2 years ago

Reading your first comment again, I think you mean digging through the converted-to-Vega specification code, rather than the Vega source code itself.

Here is the most minimal Vega specification I could get to, where I only removed things (and updated scales → domain → fields → data to use the source data). At this point, changing anything makes the visualization go wrong in some way, and I could not find a way to change the grammar to fix the problem nicely: casting the array to a string of course works, but that's not the intended fix, and we want to keep the line break in the label when using an array of strings.

Vega description:

```json
{
  "$schema": "https://vega.github.io/schema/vega/v5.json",
  "data": [
    {
      "name": "source_0",
      "values": [
        {"err": 8, "study": ["A", "B"], "project": "1"},
        {"err": 3, "study": ["A", "B"], "project": "1"},
        {"err": 6, "study": ["A", "B"], "project": "2"},
        {"err": 0, "study": ["A", "B"], "project": "2"}
      ]
    }
  ],
  "signals": [
    {"name": "child_width", "value": 200},
    {"name": "y_step", "value": 20},
    {
      "name": "child_height",
      "update": "bandspace(domain('y').length, 0, 0) * y_step"
    }
  ],
  "layout": {"padding": 20, "bounds": "full", "align": "all"},
  "marks": [
    {
      "name": "cell",
      "type": "group",
      "style": "cell",
      "from": {
        "facet": {"name": "facet", "data": "source_0", "groupby": ["project"]}
      },
      "data": [
        {
          "source": "facet",
          "name": "source_0",
          "transform": [
            {
              "type": "joinaggregate",
              "as": ["lower_box_err", "upper_box_err"],
              "ops": ["q1", "q3"],
              "fields": ["err", "err"],
              "groupby": ["project", "study"]
            }
          ]
        }
      ],
      "encode": {
        "update": {
          "width": {"signal": "child_width"},
          "height": {"signal": "child_height"}
        }
      },
      "marks": [
        {
          "name": "child_layer_0_layer_0_marks",
          "type": "symbol",
          "from": {"data": "source_0"},
          "encode": {
            "update": {
              "x": {"scale": "x", "field": "err"},
              "y": {"scale": "y", "field": "study", "band": 0.5}
            }
          }
        }
      ]
    }
  ],
  "scales": [
    {
      "name": "x",
      "type": "linear",
      "domain": {
        "fields": [
          {"data": "source_0", "field": "err"}
        ]
      },
      "range": [0, {"signal": "child_width"}]
    },
    {
      "name": "y",
      "type": "band",
      "domain": {
        "fields": [{"data": "source_0", "field": "study"}],
        "sort": true
      },
      "range": {"step": {"signal": "y_step"}}
    }
  ]
}
```
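Until the underlying behavior is fixed, one possible stopgap at the Vega-Lite level is to group on the joined string while splitting it back into lines for display only. The sketch below is an assumption, not something verified in this thread: it relies on the axis labelExpr returning an array being rendered as a multi-line label, and on the study values never containing a comma themselves.

```json
{
  "data": {
    "values": [
      {"err": 8, "study": ["A", "B"], "project": "1"},
      {"err": 3, "study": ["A", "B"], "project": "1"},
      {"err": 6, "study": ["A", "B"], "project": "2"},
      {"err": 0, "study": ["A", "B"], "project": "2"}
    ]
  },
  "transform": [{"calculate": "join(datum.study, ',')", "as": "study"}],
  "mark": {"type": "boxplot"},
  "encoding": {
    "facet": {"field": "project"},
    "y": {
      "field": "study",
      "axis": {"labelExpr": "split(datum.label, ',')"}
    },
    "x": {"field": "err", "type": "quantitative"}
  }
}
```

This keeps the grouping key a plain string, which is the case reported above as rendering correctly, and only changes how the axis label is drawn.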