vega / vega-lite

A concise grammar of interactive graphics, built on Vega.
https://vega.github.io/vega-lite/
BSD 3-Clause "New" or "Revised" License

Concatenated data incorrectly ordered in the vega spec #8173

Open joelostblom opened 2 years ago

joelostblom commented 2 years ago

The following Vega-Lite spec works as expected:

{
  "hconcat": [
    {
      "data": {"name": "data-8140b0eafc79d1623809cd9c3e8f5d3b"},
      "mark": "bar",
      "encoding": {
        "x": {"bin": true, "field": "population", "type": "quantitative"},
        "y": {"aggregate": "count", "type": "quantitative"}
      }
    },
    {
      "data": {
        "url": "https://raw.githubusercontent.com/ginseng666/GeoJSON-TopoJSON-Austria/master/2021/simplified-99.5/gemeinden_995_topo.json",
        "format": {"feature": "gemeinden", "type": "topojson"}
      },
      "mark": "geoshape",
      "encoding": {
        "color": {
          "field": "population",
          "scale": {"scheme": "spectral"},
          "type": "quantitative"
        }
      },
      "transform": [
        {
          "lookup": "properties.iso",
          "from": {
            "data": {"name": "data-8140b0eafc79d1623809cd9c3e8f5d3b"},
            "key": "iso",
            "fields": ["population"]
          }
        }
      ]
    }
  ],
  "$schema": "https://vega.github.io/schema/vega-lite/v5.2.0.json",
  "datasets": {
    "data-8140b0eafc79d1623809cd9c3e8f5d3b": [
      {"iso": 10101, "population": 3239},
      {"iso": 10201, "population": 1985},
      {"iso": 10301, "population": 1890},
      {"iso": 10302, "population": 1846},
      {"iso": 10303, "population": 2122},
      {"iso": 10304, "population": 3217}
    ]
  }
}

However, if I change the order of the concatenated charts, I get the error: Undefined data set name: "data_0".

Looking at the compiled Vega spec, the order of the datasets seems incorrect: data_0 should be defined before source_0, since the lookup transform in source_0 references it, and swapping the two entries fixes the problem. So instead of this incorrect order:

  "data": [
    {
      "name": "data-8140b0eafc79d1623809cd9c3e8f5d3b",
      "format": {},
      "values": [
        {"iso": 10101, "population": 3239},
        {"iso": 10201, "population": 1985},
        {"iso": 10301, "population": 1890},
        {"iso": 10302, "population": 1846},
        {"iso": 10303, "population": 2122},
        {"iso": 10304, "population": 3217}
      ]
    },
    {
      "name": "source_0",
      "url": "https://raw.githubusercontent.com/ginseng666/GeoJSON-TopoJSON-Austria/master/2021/simplified-99.5/gemeinden_995_topo.json",
      "format": {"feature": "gemeinden", "type": "topojson"},
      "transform": [
        {
          "type": "lookup",
          "from": "data_0",
          "key": "iso",
          "fields": ["properties.iso"],
          "values": ["population"]
        },
        {
          "type": "filter",
          "expr": "isValid(datum[\"population\"]) && isFinite(+datum[\"population\"])"
        }
      ]
    },
    {
      "name": "data_0",
      "source": "data-8140b0eafc79d1623809cd9c3e8f5d3b",
      "transform": [
        {
          "type": "extent",
          "field": "population",
          "signal": "concat_1_bin_maxbins_10_population_extent"
        },
        {
          "type": "bin",
          "field": "population",
          "as": ["bin_maxbins_10_population", "bin_maxbins_10_population_end"],
          "signal": "concat_1_bin_maxbins_10_population_bins",
          "extent": {"signal": "concat_1_bin_maxbins_10_population_extent"},
          "maxbins": 10
        }
      ]
    },

Changing the Vega spec to the following makes the charts show up as expected:

  "data": [
    {
      "name": "data-8140b0eafc79d1623809cd9c3e8f5d3b",
      "format": {},
      "values": [
        {"iso": 10101, "population": 3239},
        {"iso": 10201, "population": 1985},
        {"iso": 10301, "population": 1890},
        {"iso": 10302, "population": 1846},
        {"iso": 10303, "population": 2122},
        {"iso": 10304, "population": 3217}
      ]
    },
    {
      "name": "data_0",
      "source": "data-8140b0eafc79d1623809cd9c3e8f5d3b",
      "transform": [
        {
          "type": "extent",
          "field": "population",
          "signal": "concat_1_bin_maxbins_10_population_extent"
        },
        {
          "type": "bin",
          "field": "population",
          "as": ["bin_maxbins_10_population", "bin_maxbins_10_population_end"],
          "signal": "concat_1_bin_maxbins_10_population_bins",
          "extent": {"signal": "concat_1_bin_maxbins_10_population_extent"},
          "maxbins": 10
        }
      ]
    },
    {
      "name": "source_0",
      "url": "https://raw.githubusercontent.com/ginseng666/GeoJSON-TopoJSON-Austria/master/2021/simplified-99.5/gemeinden_995_topo.json",
      "format": {"feature": "gemeinden", "type": "topojson"},
      "transform": [
        {
          "type": "lookup",
          "from": "data_0",
          "key": "iso",
          "fields": ["properties.iso"],
          "values": ["population"]
        },
        {
          "type": "filter",
          "expr": "isValid(datum[\"population\"]) && isFinite(+datum[\"population\"])"
        }
      ]
    },
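
The ordering problem described above can be checked mechanically. The following is a minimal sketch, not part of Vega or Vega-Lite: the helper names dataset_refs and forward_references are hypothetical, and it assumes that only "source" entries and the "from" field of lookup transforms create dependencies between datasets (other transforms may also reference datasets and are not covered here).

```python
def dataset_refs(entry):
    """Names of other datasets that a Vega data entry depends on (assumed rules)."""
    refs = []
    source = entry.get("source")
    if isinstance(source, str):
        refs.append(source)
    elif isinstance(source, list):
        refs.extend(source)
    for transform in entry.get("transform", []):
        # A lookup transform reads a secondary dataset named in "from".
        if transform.get("type") == "lookup" and isinstance(transform.get("from"), str):
            refs.append(transform["from"])
    return refs


def forward_references(data):
    """Return (dataset, reference) pairs where a reference precedes its definition."""
    defined = set()
    problems = []
    for entry in data:
        for ref in dataset_refs(entry):
            if ref not in defined:
                problems.append((entry["name"], ref))
        defined.add(entry["name"])
    return problems


# Abbreviated versions of the two "data" arrays shown in this issue.
failing = [
    {"name": "data-8140b0eafc79d1623809cd9c3e8f5d3b", "values": []},
    {"name": "source_0", "transform": [{"type": "lookup", "from": "data_0"}]},
    {"name": "data_0", "source": "data-8140b0eafc79d1623809cd9c3e8f5d3b"},
]
working = [failing[0], failing[2], failing[1]]

print(forward_references(failing))  # [('source_0', 'data_0')]
print(forward_references(working))  # []
```

The failing array reports the same dataset name as the error message, and the manually reordered array reports nothing.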

PBI-David commented 2 years ago

The reason it breaks is that the lookup transform references a dataset that is not yet defined at that point in the spec. If you place the lookup dataset at the top of the spec, it works fine.


{
  "hconcat": [
    {
      "data": {"name": "data-8140b0eafc79d1623809cd9c3e8f5d3b"},
      "mark":"text"
    },
    {
      "data": {
        "url": "https://raw.githubusercontent.com/ginseng666/GeoJSON-TopoJSON-Austria/master/2021/simplified-99.5/gemeinden_995_topo.json",
        "format": {"feature": "gemeinden", "type": "topojson"}
      },
      "mark": "geoshape",
      "encoding": {
        "color": {
          "field": "population",
          "scale": {"scheme": "spectral"},
          "type": "quantitative"
        }
      },
      "transform": [
        {
          "lookup": "properties.iso",
          "from": {
            "data": {"name": "data-8140b0eafc79d1623809cd9c3e8f5d3b"},
            "key": "iso",
            "fields": ["population"]
          }
        }
      ]
    },
    {
      "data": {"name": "data-8140b0eafc79d1623809cd9c3e8f5d3b"},
      "mark": "bar",
      "encoding": {
        "x": {"bin": true, "field": "population", "type": "quantitative"},
        "y": {"aggregate": "count", "type": "quantitative"}
      }
    }
  ],
  "$schema": "https://vega.github.io/schema/vega-lite/v5.2.0.json",
  "datasets": {
    "data-8140b0eafc79d1623809cd9c3e8f5d3b": [
      {"iso": 10101, "population": 3239},
      {"iso": 10201, "population": 1985},
      {"iso": 10301, "population": 1890},
      {"iso": 10302, "population": 1846},
      {"iso": 10303, "population": 2122},
      {"iso": 10304, "population": 3217}
    ]
  }
}

joelostblom commented 2 years ago

Thanks for the reply @PBI-David! Wouldn't it be convenient if Vega-Lite resolved the order of the data sources on the Vega level as I suggested above regardless of the order in the Vega-Lite spec? Then we could avoid adding an empty text mark as in your example, and also avoid having to include the logic for that in packages like Altair.

PBI-David commented 2 years ago

What you say seems reasonable, but I have no idea how complicated it would be to build the dependency tree for the transforms. BTW, you don't need the empty text mark. The following works just as well:


{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.2.0.json",
  "data": {
    "name": "data-8140b0eafc79d1623809cd9c3e8f5d3b",
    "values": [
      {"iso": 10101, "population": 3239},
      {"iso": 10201, "population": 1985},
      {"iso": 10301, "population": 1890},
      {"iso": 10302, "population": 1846},
      {"iso": 10303, "population": 2122},
      {"iso": 10304, "population": 3217}
    ]
  },
  "hconcat": [
    {
      "data": {
        "url": "https://raw.githubusercontent.com/ginseng666/GeoJSON-TopoJSON-Austria/master/2021/simplified-99.5/gemeinden_995_topo.json",
        "format": {"feature": "gemeinden", "type": "topojson"}
      },
      "mark": "geoshape",
      "encoding": {
        "color": {
          "field": "population",
          "scale": {"scheme": "spectral"},
          "type": "quantitative"
        }
      },
      "transform": [
        {
          "lookup": "properties.iso",
          "from": {
            "data": {"name": "data-8140b0eafc79d1623809cd9c3e8f5d3b"},
            "key": "iso",
            "fields": ["population"]
          }
        }
      ]
    },
    {
      "data": {"name": "data-8140b0eafc79d1623809cd9c3e8f5d3b"},
      "mark": "bar",
      "encoding": {
        "x": {"bin": true, "field": "population", "type": "quantitative"},
        "y": {"aggregate": "count", "type": "quantitative"}
      }
    }
  ]
}
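
For the reordering suggested earlier in this thread, one possible approach is a topological sort over the compiled Vega "data" array. This is only a sketch under stated assumptions and says nothing about how the Vega-Lite compiler is actually structured: the functions dataset_refs and reorder are hypothetical, and dependencies are assumed to come only from "source" and from the "from" field of lookup transforms.

```python
def dataset_refs(entry):
    """Names of other datasets that a Vega data entry depends on (assumed rules)."""
    refs = []
    source = entry.get("source")
    if isinstance(source, str):
        refs.append(source)
    elif isinstance(source, list):
        refs.extend(source)
    for transform in entry.get("transform", []):
        if transform.get("type") == "lookup" and isinstance(transform.get("from"), str):
            refs.append(transform["from"])
    return refs


def reorder(data):
    """Return the entries ordered so that each dataset follows the ones it references."""
    by_name = {entry["name"]: entry for entry in data}
    ordered = []
    state = {}  # name -> "visiting" or "done"

    def visit(name):
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            raise ValueError(f"cyclic dataset dependency at {name!r}")
        state[name] = "visiting"
        for ref in dataset_refs(by_name[name]):
            if ref in by_name:  # unknown names are left for Vega to report
                visit(ref)
        state[name] = "done"
        ordered.append(by_name[name])

    # Visiting in the original order keeps unrelated datasets where they were.
    for entry in data:
        visit(entry["name"])
    return ordered


# Abbreviated version of the failing "data" array from this issue.
failing = [
    {"name": "data-8140b0eafc79d1623809cd9c3e8f5d3b", "values": []},
    {"name": "source_0", "transform": [{"type": "lookup", "from": "data_0"}]},
    {"name": "data_0", "source": "data-8140b0eafc79d1623809cd9c3e8f5d3b"},
]

print([entry["name"] for entry in reorder(failing)])
# ['data-8140b0eafc79d1623809cd9c3e8f5d3b', 'data_0', 'source_0']
```

On the failing array this produces the same order as the manual fix in the original report. A depth-first visit was chosen here so that datasets without dependencies keep their relative positions.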