Support `color` in ggplot2 pipeline

sharlagelfand commented 3 years ago

One of the limitations in ggplot2 pipeline parsing is that color isn't supported. It shouldn't be an issue to add this to the plotting steps, since we already include color in non-ggplot2 pipelines; I just didn't parse it out from the ggplot2 object before!

This is one of the limitations in #86, but should be an easy enough fix.

sharlagelfand commented 3 years ago

@giorgi-ghviniashvili I'm working on this now, and realizing there's a case where we will want to have multiple variables in splitField, e.g. Work is on the x-axis and Degree is mapped to color, so we'd want to split by both. Is this possible? Here is a spec for the frame:

```json
{
  "height": 300,
  "width": 300,
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "meta": {
    "parse": "grid",
    "description": "Group by Work, Degree",
    "splitField": ["Work", "Degree"],
    "axes": false
  },
  "data": {
    "values": [
      {
        "Work": "Academia",
        "Degree": "Masters",
        "n": 10
      },
      {
        "Work": "Academia",
        "Degree": "PhD",
        "n": 18
      },
      {
        "Work": "Industry",
        "Degree": "Masters",
        "n": 62
      },
      {
        "Work": "Industry",
        "Degree": "PhD",
        "n": 10
      }
    ]
  },
  "mark": {
    "type": "point",
    "filled": true
  },
  "encoding": {
    "x": {
      "field": "datamations_x",
      "type": "quantitative",
      "axis": null
    },
    "y": {
      "field": "datamations_y",
      "type": "quantitative",
      "axis": null
    },
    "color": {
      "field": "Degree",
      "type": "nominal",
      "legend": {
        "values": ["Masters", "PhD"]
      }
    }
  }
}
```

giorgi-ghviniashvili commented 3 years ago

Hi @sharlagelfand,

Currently, splitField simply splits a group by a single field, and only once.

While it is technically possible to hierarchically split a single group further, you can achieve the same thing with column facet = Work + splitField = Degree:

By "hierarchically split", I mean that we need to know the split sequence, which I can determine from the array index: the first element gets precedence. This will also complicate the grid generation logic, but it is definitely possible. So let me know if you need me to support a splitField array.
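To make the "first element gets precedence" idea concrete, here is a minimal Python sketch of a hierarchical split over the data values from the spec above. It is only an illustration: `split_hierarchically` is a hypothetical helper, not a function in datamations, and the real grid generation would still have to lay out each nested group.

```python
# Data values copied from the spec in this thread.
rows = [
    {"Work": "Academia", "Degree": "Masters", "n": 10},
    {"Work": "Academia", "Degree": "PhD", "n": 18},
    {"Work": "Industry", "Degree": "Masters", "n": 62},
    {"Work": "Industry", "Degree": "PhD", "n": 10},
]


def split_hierarchically(rows, split_fields):
    """Nest rows by each field in split_fields; the first field is outermost.

    Hypothetical helper for illustration only.
    """
    if not split_fields:
        return rows
    first, rest = split_fields[0], split_fields[1:]
    groups = {}
    for row in rows:
        # dicts keep insertion order, so groups appear in first-seen order
        groups.setdefault(row[first], []).append(row)
    return {key: split_hierarchically(members, rest) for key, members in groups.items()}


# "Work" comes first in the array, so it is split before "Degree".
tree = split_hierarchically(rows, ["Work", "Degree"])
```

Reversing the array to `["Degree", "Work"]` would nest Work inside Degree instead, which is why the order of the array would have to be part of the spec's contract.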

sharlagelfand commented 3 years ago

@giorgi-ghviniashvili Thinking more about this, I don't think we need to have multiple fields in splitField, but rather just handle coloring the points properly. For example, right now this spec (slightly different from the one above, with just splitField = Work)

```json
{
  "height": 300,
  "width": 300,
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "meta": {
    "parse": "grid",
    "description": "Group by Work, Degree",
    "splitField": "Work",
    "axes": false
  },
  "data": {
    "values": [
      {
        "Work": "Academia",
        "Degree": "Masters",
        "n": 10
      },
      {
        "Work": "Academia",
        "Degree": "PhD",
        "n": 18
      },
      {
        "Work": "Industry",
        "Degree": "Masters",
        "n": 62
      },
      {
        "Work": "Industry",
        "Degree": "PhD",
        "n": 10
      }
    ]
  },
  "mark": {
    "type": "point",
    "filled": true
  },
  "encoding": {
    "x": {
      "field": "datamations_x",
      "type": "quantitative",
      "axis": null
    },
    "y": {
      "field": "datamations_y",
      "type": "quantitative",
      "axis": null
    },
    "color": {
      "field": "Degree",
      "type": "nominal",
      "legend": {
        "values": ["Masters", "PhD"]
      }
    }
  }
}
```

ends up looking like this:

With the previous frame looking like this:

So when the orange points get colored, they also "move over" and end up overlapping the blue points. What would be more desirable is for them to just get colored in place, without moving over and overlapping.

I know we can accomplish something similar with column facet and splitField, but this x + color case is for when the ggplot2 code specifies x and color, so we wouldn't want to move it over to facets.
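As a rough illustration of the "color in place" behavior, here is a Python sketch in which each point's grid position depends only on its Work group and its index within that group, so adding color never changes x or y. This is an assumption about one way to get the behavior, not how datamations actually generates its grid; `grid_points` and the `cols` parameter are hypothetical.

```python
# Data values copied from the spec in this thread.
rows = [
    {"Work": "Academia", "Degree": "Masters", "n": 10},
    {"Work": "Academia", "Degree": "PhD", "n": 18},
    {"Work": "Industry", "Degree": "Masters", "n": 62},
    {"Work": "Industry", "Degree": "PhD", "n": 10},
]


def grid_points(rows, split_field, color_field=None, cols=5):
    """Expand count rows into one point per unit, laid out in a grid per group.

    Hypothetical helper: positions come from the group and the running index
    only, so passing color_field adds a "color" key without moving any point.
    """
    groups = {}
    for row in rows:
        groups.setdefault(row[split_field], []).append(row)

    points = []
    for group, members in groups.items():
        index = 0
        for row in members:  # input order is kept, so each color stays contiguous
            for _ in range(row["n"]):
                point = {"group": group, "x": index % cols, "y": index // cols}
                if color_field is not None:
                    point["color"] = row[color_field]
                points.append(point)
                index += 1
    return points


plain = grid_points(rows, "Work")              # frame before coloring
colored = grid_points(rows, "Work", "Degree")  # same positions, plus color
```

Comparing `plain` and `colored` point by point shows identical group, x, and y values; only the extra color key differs between the two frames.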

giorgi-ghviniashvili commented 3 years ago

@sharlagelfand will this fix work?

If encoding.color is present, I am generating the grid like this.

sharlagelfand commented 3 years ago

yeah @giorgi-ghviniashvili that looks great! thank you

microsoft / datamations

Support `color` in ggplot2 pipeline #87