spatialnetworkslab / vue-gg

A grammar of graphics, built on Vue's template syntax.
https://vue-gg.now.sh/
BSD 3-Clause "New" or "Revised" License

Multiple scales for marks #102

Closed bianchi-dy closed 5 years ago

bianchi-dy commented 5 years ago

Currently, scales are applied to marks on a 1-to-1 basis for x-y coordinates. For example:

        <vgg-map v-slot="{ row }">
          <vgg-multi-line
            :x="{ val: row.explanatory, scale: 'explanatory' }"
            :y="{ val: row.dependent, scale: 'dependent' }"
          />
        </vgg-map>

Here, only one scale is applied to the entirety of row.explanatory.

However, for charts such as parallel coordinates and radar charts, each axis has its own scale, and thus each x or y coordinate needs to be scaled according to the dimension of the axes, i.e.

[Image: parallel coordinates]

such that a different scale applies to each point in the row. So a row of data might look like:

['apple', 100, 2, 'b', 500]

which would then be scaled to coordinates internally (say, given a range of [0, 10]):

[1, 5, 2, 3, 6]
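
To make the idea concrete, here is a minimal sketch of per-axis scaling in plain JavaScript. The `linear` and `point` constructors, the `axisScales` array and all domains are hypothetical and made up for illustration; they are not vue-gg's scale implementation, and the output differs from the illustrative numbers above because the domains are invented.

```javascript
// Hypothetical scale constructors; vue-gg's real scales may behave differently.
const linear = ([d0, d1], [r0, r1]) => v => r0 + ((v - d0) / (d1 - d0)) * (r1 - r0);
const point = (categories, [r0, r1]) => v =>
  r0 + (categories.indexOf(v) / (categories.length - 1)) * (r1 - r0);

const range = [0, 10];

// One scale per axis, in the same order as the values in a row.
// The domains are assumptions chosen for this example.
const axisScales = [
  point(['apple', 'fig', 'pear'], range),
  linear([0, 400], range),
  linear([0, 4], range),
  point(['a', 'b', 'c'], range),
  linear([0, 1000], range)
];

// Apply the i-th scale to the i-th value of the row.
const scaleRow = row => row.map((value, i) => axisScales[i](value));

console.log(scaleRow(['apple', 100, 2, 'b', 500]));
```

Each position in the row is scaled independently, so categorical and numeric axes can sit side by side in the same mark.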

From what I can understand in this example, Vega applies multiple scales to a single mark by specifying the scales as part of an array:

"scales": [
    {
      "name": "ord", "type": "point",
      "range": "width", "round": true,
      "domain": {"data": "fields", "field": "data"}
    },
    {
      "name": "Cylinders", "type": "linear",
      "range": "height", "zero": false, "nice": true,
      "domain": {"data": "cars", "field": "Cylinders"}
    },
    {
      "name": "Displacement", "type": "linear",
      "range": "height", "zero": false, "nice": true,
      "domain": {"data": "cars", "field": "Displacement"}
    },
    // and so on
]

Then it seems that the y value of a given point in a data row is scaled by the scale at the matching index in the scales array. One caveat is that they only seem to support scaling for continuous domains at the moment. Vega-Lite, on the other hand, requires the data to be transformed first (the window and fold transforms appear to be the key transformations here).

  "transform": [
    {"window": [{"op": "count", "as": "index"}]},
    {"fold": ["petalLength", "petalWidth", "sepalLength", "sepalWidth"]},
    {
      "window": [
        {"op": "min", "field": "value", "as": "min"},
        {"op": "max", "field": "value", "as": "max"}
      ],
      "frame": [null, null],
      "groupby": ["key"]
    },
    {
      "calculate": "(datum.value - datum.min) / (datum.max-datum.min)",
      "as": "norm_val"
    },
    {
      "calculate": "(datum.min + datum.max) / 2",
      "as": "mid"
    }
  ],
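
As a rough illustration of what that transform pipeline computes, the sketch below reproduces the fold and min-max normalisation steps in plain JavaScript. The data values are made up, the variable names are hypothetical, and this only mirrors my reading of the spec rather than Vega-Lite's internals.

```javascript
// Made-up rows standing in for the iris data used in the spec above.
const data = [
  { petalLength: 1, petalWidth: 0.5 },
  { petalLength: 3, petalWidth: 1.5 },
  { petalLength: 5, petalWidth: 2.5 }
];
const fields = ['petalLength', 'petalWidth'];

// window/count: remember which original row each record came from.
// fold: emit one { key, value } record per row and field.
const folded = data.flatMap((row, i) =>
  fields.map(key => ({ index: i + 1, key, value: row[key] }))
);

// window/min+max grouped by key: the extent of every folded field.
const extent = {};
for (const { key, value } of folded) {
  const e = extent[key] || (extent[key] = { min: Infinity, max: -Infinity });
  e.min = Math.min(e.min, value);
  e.max = Math.max(e.max, value);
}

// calculate: min-max normalise each value within its own field.
// Note that a constant field (max === min) would divide by zero here.
const normalised = folded.map(d => {
  const { min, max } = extent[d.key];
  return { ...d, min, max, norm_val: (d.value - min) / (max - min), mid: (min + max) / 2 };
});

console.log(normalised);
```

The effect is the same as giving every axis its own linear scale with a range of [0, 1], which is why this route only covers continuous fields.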

There's an attempt at parallel coordinates living in the idcGraphs branch. Based on how much finagling it took to implement manual scaling per axis, passing the necessary scales as an array and then matching each scale to a point by index seems a reasonable way to approach it in vue-gg, perhaps:

        <vgg-map v-slot="{ row }">
          <vgg-multi-line
            :x="{ val: row.dependent, scale: 'dependent' }"
            :y="{ val: row.explanatory, scale: ['Name', 'Price', 'WetWeight', 'RearWheelHorsepower', 'TopSpeed', 'MilesPG'] }"
          />
        </vgg-map>

The scale for the x-coordinate determines where each y-axis is positioned, so x is scaled to a range of [0, 1] such that each x-coordinate matches the position of the axis for the corresponding dimension. So for the same sample above:

x: [ <axis 1 loc>, <axis 2 loc>, <axis 3 loc>, <axis 4 loc>, <axis 5 loc> ] scales to [0, 0.2, 0.4, 0.6, 0.8]. This is scaled to the range [0, 1], as that is the input for hjust in vgg-x-axis.

y: ['apple', 100, 2, 'b', 500] could scale to [1, 5, 2, 3, 6] given a range of [0, 10]. This is one line in the chart: each item in [1, 5, 2, 3, 6] is the y-coordinate of the line at a given axis.
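
The axis positions in this example follow a simple pattern: with n axes, the i-th axis sits at i / n. A small sketch, assuming a hypothetical `axisPositions` helper and made-up dimension names:

```javascript
// Hypothetical helper: spread n parallel axes over [0, 1) in equal steps,
// so the i-th axis (counting from 0) sits at i / n.
function axisPositions(dimensions) {
  return dimensions.map((name, i) => ({ name, position: i / dimensions.length }));
}

const axes = axisPositions(['Name', 'Price', 'WetWeight', 'TopSpeed', 'MilesPG']);
console.log(axes.map(a => a.position)); // [0, 0.2, 0.4, 0.6, 0.8]
```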

Implementation-wise, I'm not sure if I'm missing any pros and cons with an enumeration approach or if there are better ways to carry out multiple scales (I'm also still studying Vega's implementation so I'll update this issue if anything interesting comes up). Any thoughts?

luucvanderzee commented 5 years ago

The parallel coordinate plot is a really confusing plot... I will first try to explain why I think it is so confusing and also what I think is the best solution to support it.

There are three main ways to map data to aesthetics. In the examples below I will use numbers to refer to row indices, and letters to refer to column indices.

  1. One row -> one mark. For each row, one mark is drawn. One row has a bunch of cells that each contain a single data value, and one mark has multiple aesthetics/props that accept a single aesthetic value. So
    { columnA: dataValueA, columnB: dataValueB, columnC: dataValueC } 
    -> 
    { aestheticA: aestheticValueA, aestheticB: aestheticValueB, aestheticC: aestheticValueC }

Example:

<vgg-point :x="row.a" :y="row.b" />

  2. One dataframe or one group -> one mark. In this case, one dataframe has multiple columns containing multiple data values, and a single mark has multiple aesthetics/props that take multiple aesthetic values. So
    { 
    columnA: [ dataValueA1, dataValueA2, dataValueA3 ], 
    columnB: [ dataValueB1, dataValueB2, dataValueB3 ] 
    } 
    ->
    {
    aestheticA: [ aestheticValueA1, aestheticValueA2, aestheticValueA3 ],
    aestheticB: [ aestheticValueB1, aestheticValueB2, aestheticValueB3 ]
    }

Example:

<vgg-multi-line
  :x="dataframe.a"
  :y="dataframe.b"
/>

And then, the category that the parallel coordinate plot falls into:

  3. One row -> one mark, but one aesthetic/prop takes the values of the entire row, and another prop takes... the columns themselves as categories?
{ columnA: dataValueA, columnB: dataValueB, columnC: dataValueC }
->
{ 
  aestheticA: ['columnNameA', 'columnNameB', 'columnNameC'],
  aestheticB: [dataValueA, dataValueB, dataValueC]
} 
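
The three mappings can be written side by side in plain JavaScript to show how the third one differs. This is only a sketch: the dataframe values and the helper names (`rowAt`, `pointMark`, `multiLineMark`, `parallelMark`) are hypothetical and not part of vue-gg.

```javascript
// Column-oriented dataframe with made-up values.
const dataframe = { a: [1, 2, 3], b: [10, 20, 30], c: [5, 6, 7] };
const columns = Object.keys(dataframe);

// Helper: read row i of the dataframe as an object.
const rowAt = i => Object.fromEntries(columns.map(col => [col, dataframe[col][i]]));

// 1. One row -> one mark: each aesthetic takes a single value.
const pointMark = row => ({ x: row.a, y: row.b });

// 2. One dataframe -> one mark: each aesthetic takes a whole column.
const multiLineMark = df => ({ x: df.a, y: df.b });

// 3. One row -> one mark, parallel-coordinates style: x takes the column
//    names, y takes that row's value in each of those columns.
const parallelMark = row => ({ x: columns, y: columns.map(col => row[col]) });

console.log(pointMark(rowAt(0)));
console.log(multiLineMark(dataframe));
console.log(parallelMark(rowAt(0)));
```

In the third case the y values come from different columns, which is why a single scale for the whole aesthetic no longer fits.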

So what do we do with this? I was initially thinking of adding a new transformation called map, which would be like mutate in the sense that it would calculate a new column. So then you could do something like

<vgg-data
  :data="{ a: [1, 2, 3, 4], b: ['apple', 'apple', 'banana', 'banana'], c: [5, 6, 7, 8] }"
  :transform="{ map: {
    aScaled: { val: row => row.a, scale: { domain: 'a', range: [0, 2] } },
    bScaled: { val: row => row.b, scale: { domain: 'b', range: [0, 2] } }, 
    cScaled: { val: row => row.c, scale: { domain: 'c', range: [0, 2]  } }  
  } }"
>

  <vgg-section
    ...
    :scale-x="['a', 'b', 'c']"
    :scale-y="[0, 2]"
  >

    <vgg-map v-slot="{ row }">

      <vgg-multi-line
        :x="['a', 'b', 'c']"
        :y="[row.aScaled, row.bScaled, row.cScaled]"
      />

    </vgg-map> 

  </vgg-section>

</vgg-data>
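
For reference, this is roughly what such a map transformation would have to compute. The sketch is plain JavaScript under stated assumptions: `createScale` and `mapTransform` are hypothetical names, the domain is derived from the named column, and categorical values are spread evenly over the range. None of this is existing vue-gg behavior.

```javascript
// Hypothetical scale factory: linear for numeric domains,
// evenly spaced points for anything else.
function createScale(domainValues, [r0, r1]) {
  if (domainValues.every(v => typeof v === 'number')) {
    const min = Math.min(...domainValues);
    const max = Math.max(...domainValues);
    return v => r0 + ((v - min) / (max - min || 1)) * (r1 - r0);
  }
  const categories = [...new Set(domainValues)];
  const step = categories.length > 1 ? (r1 - r0) / (categories.length - 1) : 0;
  return v => r0 + categories.indexOf(v) * step;
}

// Sketch of the proposed transformation on a column-oriented dataframe:
// every entry in the spec adds one new, already scaled column.
function mapTransform(data, spec) {
  const length = Object.values(data)[0].length;
  const rows = Array.from({ length }, (_, i) =>
    Object.fromEntries(Object.keys(data).map(col => [col, data[col][i]]))
  );
  const result = { ...data };
  for (const [newColumn, { val, scale }] of Object.entries(spec)) {
    const scaleFn = createScale(data[scale.domain], scale.range);
    result[newColumn] = rows.map(row => scaleFn(val(row)));
  }
  return result;
}

const transformed = mapTransform(
  { a: [1, 2, 3, 4], b: ['apple', 'apple', 'banana', 'banana'], c: [5, 6, 7, 8] },
  {
    aScaled: { val: row => row.a, scale: { domain: 'a', range: [0, 2] } },
    bScaled: { val: row => row.b, scale: { domain: 'b', range: [0, 2] } },
    cScaled: { val: row => row.c, scale: { domain: 'c', range: [0, 2] } }
  }
);

console.log(transformed.bScaled); // [0, 0, 2, 2]
```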

Although I still kind of think the map transformation is fine to add to the library at some point, I actually like your approach, with the array of scaling options, better. But there are two problems with it. The first is that the explanation above (point 3) shows why the following code

:y="{ val: row.explanatory, scale: ['Name', 'Price', 'WetWeight', 'RearWheelHorsepower', 'TopSpeed', 'MilesPG'] }"

wouldn't be enough. Instead, it would have to be something like

<vgg-multi-line
  :x="{ 
    val: ['Name', 'Price', 'WetWeight', 'RearWheelHorsepower', 'TopSpeed', 'MilesPG'],
    scale: ['Name', 'Price', 'WetWeight', 'RearWheelHorsepower', 'TopSpeed', 'MilesPG']
  }"
  :y="{
    val: [row.Name, row.Price, row.WetWeight, row.RearWheelHorsepower, row.TopSpeed, row.MilesPG],
    scale: ['Name', 'Price', 'WetWeight', 'RearWheelHorsepower', 'TopSpeed', 'MilesPG'] 
  }"            
/>

The second problem is that right now, giving an array directly to scale is already used to manually specify a domain. So scale: ['Name', 'Price', ...] means that you are trying to create a single categorical scale (as we are doing in the :x prop!). But this problem could be solved by simply adding a new option called scales. So you would get

<vgg-multi-line
  :x="{ 
    val: ['Name', 'Price', 'WetWeight', 'RearWheelHorsepower', 'TopSpeed', 'MilesPG'],
    scale: ['Name', 'Price', 'WetWeight', 'RearWheelHorsepower', 'TopSpeed', 'MilesPG']
  }"
  :y="{
    val: [row.Name, row.Price, row.WetWeight, row.RearWheelHorsepower, row.TopSpeed, row.MilesPG],
    scales: ['Name', 'Price', 'WetWeight', 'RearWheelHorsepower', 'TopSpeed', 'MilesPG'] 
  }"            
/>

Which I think is pretty neat! The only possible objection is that you might not notice the difference between scale and scales if you skim the code. But that might not be a real issue, and otherwise we could also pick a name other than scales.
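
A minimal sketch of how a mapping could tell the two options apart, in plain JavaScript. Everything here is an assumption for illustration: `namedScales` and `resolveMapping` are hypothetical, the scale functions are made up, and the existing case where `scale` is an array used as a manual domain is deliberately left out.

```javascript
// Hypothetical registry of named scale functions, e.g. one per dataframe column.
const namedScales = {
  Price: v => v / 100,
  TopSpeed: v => v / 50
};

// Sketch of how a mapping could treat `scale` and `scales` differently.
function resolveMapping({ val, scale, scales }) {
  if (scale !== undefined && scales !== undefined) {
    throw new Error("Use either 'scale' or 'scales', not both");
  }
  if (scales !== undefined) {
    if (!Array.isArray(val) || val.length !== scales.length) {
      throw new Error("'scales' needs exactly one scale name per value");
    }
    // `scales`: the i-th named scale is applied to the i-th value.
    return val.map((v, i) => namedScales[scales[i]](v));
  }
  if (scale !== undefined) {
    // `scale`: one named scale shared by every value.
    // (An array passed as a manual domain is not handled in this sketch.)
    const scaleFn = namedScales[scale];
    return Array.isArray(val) ? val.map(v => scaleFn(v)) : scaleFn(val);
  }
  return val; // no scaling requested
}

console.log(resolveMapping({ val: [200, 100], scales: ['Price', 'TopSpeed'] })); // [2, 2]
console.log(resolveMapping({ val: [100, 300], scale: 'Price' }));                // [1, 3]
```

Validating that `val` and `scales` have the same length gives an early, explicit error for the mistake that skimming past `scale` versus `scales` is most likely to cause.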

Positioning the axes would be simple if we moved the scale: ['Name', 'Price' ...] inside of the :x prop out to the vgg-section's :scale-x prop. Then you could position the axes with

<vgg-section
  ...
  :scale-x="['Name', 'Price', 'WetWeight', 'RearWheelHorsepower', 'TopSpeed', 'MilesPG']"
>

  <vgg-map v-slot="{ row }">
    ...
  </vgg-map>

  <vgg-x-axis
    v-for="column in ['Name', 'Price', 'WetWeight', 'RearWheelHorsepower', 'TopSpeed', 'MilesPG']"
    :x="column"
    :w="50"
    ...
  />

</vgg-section>

Doing that inline might be a little harder, and I am not immediately sure how we would do that. But this would work for now, right? It shouldn't be too much work to implement either, just adding some logic to the mappings.js file, I think. Thoughts?

bianchi-dy commented 5 years ago

@luucvanderzee I'm fine with the approach for positioning the y-axis along the x-axis using scale-x in vgg-section, since it then goes hand in hand with the library's general positioning and scaling logic. But I'm a little unclear on how the inputs to x and y get processed in the mark itself, e.g.

<vgg-multi-line
  :x="{ 
    val: ['Name', 'Price', 'WetWeight', 'RearWheelHorsepower', 'TopSpeed', 'MilesPG'],
    scale: ['Name', 'Price', 'WetWeight', 'RearWheelHorsepower', 'TopSpeed', 'MilesPG']
  }"
  :y="{
    val: [row.Name, row.Price, row.WetWeight, row.RearWheelHorsepower, row.TopSpeed, row.MilesPG],
    scales: ['Name', 'Price', 'WetWeight', 'RearWheelHorsepower', 'TopSpeed', 'MilesPG'] 
  }"            
/>

So if we use the following in vgg-section:

:scale-x="['Name', 'Price', 'WetWeight', 'RearWheelHorsepower', 'TopSpeed', 'MilesPG']"

Then I think we'd no longer need scale: ['Name', 'Price', 'WetWeight', 'RearWheelHorsepower', 'TopSpeed', 'MilesPG'] in this bit:

:x="{ 
    val: ['Name', 'Price', 'WetWeight', 'RearWheelHorsepower', 'TopSpeed', 'MilesPG'], 
    scale: ['Name', 'Price', 'WetWeight', 'RearWheelHorsepower', 'TopSpeed', 'MilesPG'] 
}"

Since it already scales to scale-x. Is that correct?

As for

:y="{ 
    val: [row.Name, row.Price, row.WetWeight, row.RearWheelHorsepower, row.TopSpeed, row.MilesPG],
    scales: ['Name', 'Price', 'WetWeight', 'RearWheelHorsepower', 'TopSpeed', 'MilesPG'] 
  }"   

then I suppose for the enclosing vgg-section, there would be no scale-y. We can change the object key name to something like scaleOrder to make it more distinct, etc. How difficult do you think this would be to implement?

luucvanderzee commented 5 years ago

@bianchi-dy

About your first point: yes, that is correct! You could decide whether you want to use the Section's :scale-x prop, or the inline version with { val: ... , scale: ... }. This is already supported behavior btw.

About the second point: again correct, the Section's :scale-y prop could not be used for this. As for the name: scaleOrder is already better than scales, but idk... maybe we can still brainstorm a bit about it. I don't expect this to be too hard to implement tbh. Do you need this feature urgently?

bianchi-dy commented 5 years ago

This has been somewhat resolved in the scale-transformation branch, but I'll run some tests for other data types to see if we've covered everything.

bianchi-dy commented 5 years ago

Resolved in #136