vega / vega-lite

A concise grammar of interactive graphics, built on Vega.
https://vega.github.io/vega-lite/
BSD 3-Clause "New" or "Revised" License

Weird behavior when layering errorbands with `stdev` and `stderr` #8961

Open thomascamminady opened 1 year ago

thomascamminady commented 1 year ago

I originally posted this for altair, but I think this is a vega-lite issue (https://github.com/altair-viz/altair/issues/3089).

Something weird happens when layering an error band plot with extent stdev on top of one with extent stderr.

Let's start with just the individual error plots:

Just stdev:

{
  "data": {"url": "data/barley.json"},
  "layer": [
    {
      "mark": {"type":"errorband","extent":"stdev","color":"red"},
      "encoding": {
        "y": {
          "field": "yield",
          "type": "quantitative",
          "scale": {"domain": [20,55]}
        },
        "x": {"field": "variety", "type": "ordinal"}
      }
    }
  ]
}

image

Now just stderr:

{
  "data": {"url": "data/barley.json"},
  "layer": [
    {
      "mark": {"type":"errorband","extent":"stderr","color":"red"},
      "encoding": {
        "y": {
          "field": "yield",
          "type": "quantitative",
          "scale": {"domain": [20,55]}
        },
        "x": {"field": "variety", "type": "ordinal"}
      }
    }
  ]
}

image

This looks as expected. But now let's layer them.

Here in the order stdev, then stderr:

{
  "data": {"url": "data/barley.json"},
  "layer": [
    {
      "mark": {"type":"errorband","extent":"stdev","color":"red"},
      "encoding": {
        "y": {
          "field": "yield",
          "type": "quantitative",
          "scale": {"domain": [20,55]}
        },
        "x": {"field": "variety", "type": "ordinal"}
      }
    },
    {
      "mark": {"type":"errorband","extent":"stderr","color":"blue"},
      "encoding": {
        "y": {
          "field": "yield",
          "type": "quantitative",
          "scale": {"domain": [20,55]}
        },
        "x": {"field": "variety", "type": "ordinal"}
      }
    }
  ]
}

image

And here in the opposite order:

{
  "data": {"url": "data/barley.json"},
  "layer": [
    {
      "mark": {"type":"errorband","extent":"stderr","color":"red"},
      "encoding": {
        "y": {
          "field": "yield",
          "type": "quantitative",
          "scale": {"domain": [20,55]}
        },
        "x": {"field": "variety", "type": "ordinal"}
      }
    },
    {
      "mark": {"type":"errorband","extent":"stdev","color":"blue"},
      "encoding": {
        "y": {
          "field": "yield",
          "type": "quantitative",
          "scale": {"domain": [20,55]}
        },
        "x": {"field": "variety", "type": "ordinal"}
      }
    }
  ]
}

image

This is unexpected: I would expect two bands. Moreover, if I replace stderr with ci, I do get a layered chart:

{
  "data": {"url": "data/barley.json"},
  "layer": [
    {
      "mark": {"type":"errorband","extent":"ci","color":"red"},
      "encoding": {
        "y": {
          "field": "yield",
          "type": "quantitative",
          "scale": {"domain": [20,55]}
        },
        "x": {"field": "variety", "type": "ordinal"}
      }
    },
    {
      "mark": {"type":"errorband","extent":"stdev","color":"blue"},
      "encoding": {
        "y": {
          "field": "yield",
          "type": "quantitative",
          "scale": {"domain": [20,55]}
        },
        "x": {"field": "variety", "type": "ordinal"}
      }
    }
  ]
}

image

Or here for stderr and ci:

{
  "data": {"url": "data/barley.json"},
  "layer": [
    {
      "mark": {"type":"errorband","extent":"stderr","color":"red"},
      "encoding": {
        "y": {
          "field": "yield",
          "type": "quantitative",
          "scale": {"domain": [20,55]}
        },
        "x": {"field": "variety", "type": "ordinal"}
      }
    },
    {
      "mark": {"type":"errorband","extent":"ci","color":"blue"},
      "encoding": {
        "y": {
          "field": "yield",
          "type": "quantitative",
          "scale": {"domain": [20,55]}
        },
        "x": {"field": "variety", "type": "ordinal"}
      }
    }
  ]
}

image

I would also expect a layered chart with two bands when using stderr and stdev.
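For reference, the two extents should not coincide: the standard error of the mean is the sample standard deviation divided by the square root of the group size, so the stderr band should sit strictly inside the stdev band. Below is a minimal Python sketch of that expectation. The `band` helper and the yield values are made up for illustration (they are not taken from barley.json), and it assumes both extents are centered on the mean.

```python
import math
import statistics


def band(values, extent):
    """Return (lower, upper) for a mean-centered band of the given extent."""
    center = statistics.mean(values)
    spread = statistics.stdev(values)  # sample standard deviation
    if extent == "stderr":
        spread /= math.sqrt(len(values))  # standard error of the mean
    elif extent != "stdev":
        raise ValueError(f"unsupported extent: {extent}")
    return center - spread, center + spread


# Hypothetical yields for a single variety (illustrative only).
yields = [27.0, 31.5, 29.0, 35.5, 33.0, 30.0]

stdev_band = band(yields, "stdev")
stderr_band = band(yields, "stderr")
print("stdev band: ", stdev_band)
print("stderr band:", stderr_band)
```

With more than one value per group, the two bands have different widths, which is why two visibly distinct bands would be expected in the layered chart.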

thomascamminady commented 1 year ago

Just a guess: looking at the Vega code that the Vega-Lite spec compiles into, I see this:

"transform": [
        {
          "type": "aggregate",
          "groupby": [
            "variety"
          ],
          "ops": [
            "stdev",
            "mean",
            "ci0",
            "ci1"
          ],
          "fields": [
            "yield",
            "yield",
            "yield",
            "yield"
          ],
          "as": [
            "extent_yield",
            "center_yield",
            "lower_yield",
            "upper_yield"
          ]
        }
      ]

vs.

"transform": [
        {
          "type": "aggregate",
          "groupby": [
            "variety"
          ],
          "ops": [
            "stdev",
            "mean",
            "stderr"
          ],
          "fields": [
            "yield",
            "yield",
            "yield"
          ],
          "as": [
            "extent_yield",
            "center_yield",
            "extent_yield"
          ]
        }
      ]

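Note that in the second transform, extent_yield appears twice in "as": once for the stdev op and once for the stderr op. I'm wondering whether extent_yield is overwritten / reused? Is this maybe related to the code here: https://github.com/vega/vega-lite/blob/8607a74058485eb5685011f6acd992f0dab2c22d/src/compositemark/errorbar.ts#L459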

Ultimately, this is just a guess though.
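To illustrate the suspicion, here is a toy Python model of an aggregate that writes each result under its output name. It is only a sketch: the `aggregate` function and the yield values are hypothetical, and it assumes that a repeated output name simply replaces the earlier value, which may not be how Vega actually resolves duplicate names.

```python
import math
import statistics

# The three aggregate ops that appear in the second transform above.
OPS = {
    "mean": statistics.mean,
    "stdev": statistics.stdev,
    "stderr": lambda v: statistics.stdev(v) / math.sqrt(len(v)),
}


def aggregate(values, ops, names):
    """Toy aggregate: apply each op and store the result under its output name."""
    row = {}
    for op, name in zip(ops, names):
        row[name] = OPS[op](values)  # a repeated name replaces the earlier value
    return row


# Hypothetical yields for a single variety (illustrative only).
yields = [27.0, 31.5, 29.0, 35.5, 33.0, 30.0]

row = aggregate(
    yields,
    ops=["stdev", "mean", "stderr"],
    names=["extent_yield", "center_yield", "extent_yield"],
)
print(row)
```

Under that assumption the row ends up with only two fields, and both layers would read the same extent_yield value, which would be consistent with seeing a single band instead of two.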