vega / vega-lite

A concise grammar of interactive graphics, built on Vega.
https://vega.github.io/vega-lite/
BSD 3-Clause "New" or "Revised" License

Padding is between plot and axes, not between points. #1339

Closed aishfenton closed 8 years ago

aishfenton commented 8 years ago

I'm trying to plot a heatmap in Vega-Lite without any padding between marks. From the docs, it sounds like padding would do the trick: "the padding is a multiple of the spacing between points". But as the plot below shows, padding actually controls the space between the axes and the plot area, not between the points.

Is this a bug with padding? Or the docs? Is there something else I should use?

{
  "description": "A bar chart showing the US population distribution of age groups in 2000.",
  "data": { "url": "data/population.json"},
  "mark": "bar",
  "encoding": {
    "y": {
      "field": "age", "type": "ordinal",
      "scale": { "padding": 0 }
    },
    "x": {
      "field": "year", "type": "ordinal",
      "scale": { "padding": 0 }
    },
    "color": {
      "aggregate": "mean", "field": "people", "type": "quantitative"
    }
  }
}

willium commented 8 years ago

Taking a look at this! @domoritz, @kanitw: it seems the bandSize and the multiplier in each formula's expr are getting the default padding (+1). By removing it from each, and leaving padding itself unchanged, we get something SLIGHTLY better:

{
  "width": 1,
  "height": 1,
  "padding": "auto",
  "data": [
    {
      "name": "source",
      "url": "data/population.json",
      "format": {"type": "json","parse": {"people": "number"}},
      "transform": [{"type": "filter","test": "datum.people!==null"}]
    },
    {
      "name": "summary",
      "source": "source",
      "transform": [
        {
          "type": "aggregate",
          "groupby": ["year","age"],
          "summarize": {"people": ["mean"]}
        }
      ]
    },
    {
      "name": "layout",
      "source": "summary",
      "transform": [
        {
          "type": "aggregate",
          "summarize": [
            {"field": "year","ops": ["distinct"]},
            {"field": "age","ops": ["distinct"]}
          ]
        },
        {
          "type": "formula",
          "field": "cellWidth",
          "expr": "(datum.distinct_year + 1) * 20"
        },
        {
          "type": "formula",
          "field": "cellHeight",
          "expr": "(datum.distinct_age + 1) * 20"
        },
        {
          "type": "formula",
          "field": "width",
          "expr": "(datum.distinct_year + 1) * 20"
        },
        {
          "type": "formula",
          "field": "height",
          "expr": "(datum.distinct_age + 1) * 20"
        }
      ]
    }
  ],
  "marks": [
    {
      "name": "root",
      "type": "group",
      "description": "A bar chart showing the US population distribution of age groups in 2000.",
      "from": {"data": "layout"},
      "properties": {
        "update": {
          "width": {"field": "width"},
          "height": {"field": "height"}
        }
      },
      "marks": [
        {
          "type": "rect",
          "from": {"data": "summary"},
          "properties": {
            "update": {
              "xc": {"scale": "x","field": "year"},
              "width": {"value": 20},
              "yc": {"scale": "y","field": "age"},
              "height": {"value": 20},
              "fill": {"scale": "color","field": "mean_people"}
            }
          }
        }
      ],
      "scales": [
        {
          "name": "x",
          "type": "ordinal",
          "domain": {"data": "summary","field": "year","sort": true},
          "bandSize": 20,
          "round": true,
          "padding": 1,
          "points": true
        },
        {
          "name": "y",
          "type": "ordinal",
          "domain": {"data": "summary","field": "age","sort": true},
          "bandSize": 20,
          "round": true,
          "padding": 1,
          "points": true
        },
        {
          "name": "color",
          "type": "linear",
          "domain": {"data": "summary","field": "mean_people"},
          "range": ["#AFC6A3","#09622A"],
          "nice": false,
          "zero": false
        }
      ],
      "axes": [
        {
          "type": "x",
          "scale": "x",
          "grid": false,
          "ticks": 5,
          "title": "year",
          "properties": {
            "axis": {},
            "labels": {
              "text": {"template": "{{ datum.data | truncate:25}}"},
              "angle": {"value": 270},
              "align": {"value": "right"},
              "baseline": {"value": "middle"}
            }
          }
        },
        {
          "type": "y",
          "scale": "y",
          "grid": false,
          "title": "age",
          "properties": {
            "axis": {},
            "labels": {
              "text": {"template": "{{ datum.data | truncate:25}}"}
            }
          }
        }
      ],
      "legends": [
        {
          "fill": "color",
          "title": "MEAN(people)",
          "format": "s",
          "properties": {
            "symbols": {
              "shape": {"value": "square"},
              "strokeWidth": {"value": 0}
            }
          }
        }
      ]
    }
  ]
}

This suggests a deeper problem with what exactly padding means. Setting padding = 0 breaks the position of each rect relative to the axes (a layout issue).

It seems what we really want (and what I believe @aishfenton is looking for) is something more like this:

{
  "width": 1,
  "height": 1,
  "padding": "auto",
  "data": [
    {
      "name": "source",
      "url": "data/population.json",
      "format": {"type": "json","parse": {"people": "number"}},
      "transform": [{"type": "filter","test": "datum.people!==null"}]
    },
    {
      "name": "summary",
      "source": "source",
      "transform": [
        {
          "type": "aggregate",
          "groupby": ["year","age"],
          "summarize": {"people": ["mean"]}
        }
      ]
    },
    {
      "name": "layout",
      "source": "summary",
      "transform": [
        {
          "type": "aggregate",
          "summarize": [
            {"field": "year","ops": ["distinct"]},
            {"field": "age","ops": ["distinct"]}
          ]
        },
        {
          "type": "formula",
          "field": "cellWidth",
          "expr": "(datum.distinct_year + 1) * 19"
        },
        {
          "type": "formula",
          "field": "cellHeight",
          "expr": "(datum.distinct_age + 1) * 19"
        },
        {
          "type": "formula",
          "field": "width",
          "expr": "(datum.distinct_year + 1) * 19"
        },
        {
          "type": "formula",
          "field": "height",
          "expr": "(datum.distinct_age + 1) * 19"
        }
      ]
    }
  ],
  "marks": [
    {
      "name": "root",
      "type": "group",
      "description": "A bar chart showing the US population distribution of age groups in 2000.",
      "from": {"data": "layout"},
      "properties": {
        "update": {
          "width": {"field": "width"},
          "height": {"field": "height"}
        }
      },
      "marks": [
        {
          "type": "rect",
          "from": {"data": "summary"},
          "properties": {
            "update": {
              "xc": {"scale": "x","field": "year"},
              "width": {"value": 20},
              "yc": {"scale": "y","field": "age"},
              "height": {"value": 20},
              "fill": {"scale": "color","field": "mean_people"}
            }
          }
        }
      ],
      "scales": [
        {
          "name": "x",
          "type": "ordinal",
          "domain": {"data": "summary","field": "year","sort": true},
          "bandSize": 19,
          "round": true,
          "padding": 1,
          "points": true
        },
        {
          "name": "y",
          "type": "ordinal",
          "domain": {"data": "summary","field": "age","sort": true},
          "bandSize": 19,
          "round": true,
          "padding": 1,
          "points": true
        },
        {
          "name": "color",
          "type": "linear",
          "domain": {"data": "summary","field": "mean_people"},
          "range": ["#AFC6A3","#09622A"],
          "nice": false,
          "zero": false
        }
      ],
      "axes": [
        {
          "type": "x",
          "scale": "x",
          "grid": false,
          "ticks": 5,
          "title": "year",
          "properties": {
            "axis": {},
            "labels": {
              "text": {"template": "{{ datum.data | truncate:25}}"},
              "angle": {"value": 270},
              "align": {"value": "right"},
              "baseline": {"value": "middle"}
            }
          }
        },
        {
          "type": "y",
          "scale": "y",
          "grid": false,
          "title": "age",
          "properties": {
            "axis": {},
            "labels": {
              "text": {"template": "{{ datum.data | truncate:25}}"}
            }
          }
        }
      ],
      "legends": [
        {
          "fill": "color",
          "title": "MEAN(people)",
          "format": "s",
          "properties": {
            "symbols": {
              "shape": {"value": "square"},
              "strokeWidth": {"value": 0}
            }
          }
        }
      ]
    }
  ]
}

Or just increasing the height and width of the mark by 1 -- but I digress.

I believe this might hint at some greater issue in Vega -- but I suppose we COULD compile the Vega-Lite spec to subtract the 'padding' from the bandSize (a bit of an odd implementation).

aishfenton commented 8 years ago

Yes @willium, the result from the second spec is indeed what I'm after. And I think having some kind of heatmap will be a common request.

kanitw commented 8 years ago

I agree that padding certainly requires revision.

The current padding makes sense for the point mark, but not necessarily for the bar mark.

The confusion is partly caused by the underlying D3, which has two types of ordinal scale with two padding modes. For consistency, we currently always use the rangePoints scale, even with bar, rather than the rangeBands scale. However, rangeBands would make padding behave the way @aishfenton expects.

A straightforward solution is to add a way to customize the padding between bars, which is currently fixed to 1 in line 184 of bar.ts. I don't know what this property should be called. (Should it be padding? That would conflict with the existing padding.)

In some sense padding in rangePoints should be called outerPadding.
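
To make the two padding modes concrete, here is a minimal Python sketch of the position arithmetic, modeled on how D3 v3 documents ordinal.rangePoints and ordinal.rangeBands. The function names range_points and range_bands are hypothetical and are not part of D3, Vega, or Vega-Lite; rounding and reversed ranges are ignored.

```python
def range_points(n, start, stop, padding=0.0):
    """Point positions for n categories, modeled on D3 v3 rangePoints.

    padding is a multiple of the step between points; half of it is
    placed before the first point and half after the last one.
    """
    if n < 2:
        # A single category sits in the middle of the range.
        return [(start + stop) / 2] * n
    step = (stop - start) / (n - 1 + padding)
    first = start + step * padding / 2
    return [first + i * step for i in range(n)]


def range_bands(n, start, stop, padding=0.0, outer_padding=None):
    """Band start positions and band width, modeled on D3 v3 rangeBands.

    padding is the fraction of each step left empty between bands;
    outer_padding (defaulting to padding) is the space before the first
    band and after the last one, also in multiples of the step.
    """
    if outer_padding is None:
        outer_padding = padding
    step = (stop - start) / (n - padding + 2 * outer_padding)
    band_width = step * (1 - padding)
    first = start + step * outer_padding
    return [first + i * step for i in range(n)], band_width


if __name__ == "__main__":
    # Four categories laid out over an 80-pixel range.
    print(range_points(4, 0, 80, padding=1))   # centers, inset from the edges
    print(range_points(4, 0, 80, padding=0))   # outer points sit on the edges
    print(range_bands(4, 0, 80, padding=0))    # bands that touch each other
    print(range_bands(4, 0, 80, padding=0.5))  # bands separated by gaps
```

Under these assumptions, padding in range_points only moves the outer points in from the edges of the range, which matches the behavior reported at the top of this issue, while padding in range_bands changes the gap between neighboring marks.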

willium commented 8 years ago

Perhaps margin is a better word anyway -- or gutter?

willium commented 8 years ago

In css land, outerPadding == margin

kanitw commented 8 years ago

That sounds about right, but it is inconsistent with D3/Vega. (In some sense it seems we could find better names for D3's scale properties, but I'm leaning toward staying consistent with them.)

kanitw commented 8 years ago

Note that this Vega bug is also relevant to this problem: https://github.com/vega/vega/issues/502.

willium commented 8 years ago

I'm also curious to see if the semantics of a heat map could be distanced from the "bar" mark too -- as discussed. It would not be my first guess, if I was trying to implement a heat map in Vega-lite. While troublesome, some sort of table seems more reasonable to me. What about the introduction of a "cell" mark (i.e. table-cell)?

kanitw commented 8 years ago

"I'm also curious to see if the semantics of a heat map could be distanced from the "bar" mark too -- as discussed. It would not be my first guess, if I was trying to implement a heat map in Vega-lite. While troublesome, some sort of table seems more reasonable to me. What about the introduction of a "cell" mark (i.e. table-cell)?"

@kanitw- let's fork this into a separate issue and discuss more in #1342

kanitw commented 8 years ago

Now that we have the rect mark, which uses a band-ordinal scale by default, padding should behave correctly.
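
As a rough sketch of what the original heatmap might look like with the rect mark, the spec below is written as a Python dict so it can be serialized to JSON. It reuses the data URL and fields from the first comment. The "paddingInner" scale property is an assumption: it is the band-scale property name in later Vega-Lite releases and may not match the property available when this comment was written.

```python
import json

# Assumption: a Vega-Lite release that supports the "rect" mark.
# "paddingInner": 0 asks for no gap between neighboring cells; it may be
# redundant if the rect mark already defaults to zero inner padding.
heatmap_spec = {
    "data": {"url": "data/population.json"},
    "mark": "rect",
    "encoding": {
        "y": {"field": "age", "type": "ordinal", "scale": {"paddingInner": 0}},
        "x": {"field": "year", "type": "ordinal", "scale": {"paddingInner": 0}},
        "color": {"aggregate": "mean", "field": "people", "type": "quantitative"},
    },
}

if __name__ == "__main__":
    # Print JSON that can be pasted into a Vega-Lite editor.
    print(json.dumps(heatmap_spec, indent=2))
```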

kanitw commented 8 years ago

Visual Proof :)

[Image: vega_editor]