Closed jhofman closed 2 years ago
@jhofman here is how I achieved it. The legend is a bit funky; I just added a title to the legend:
https://user-images.githubusercontent.com/6615532/140475015-a0e3657d-b1af-4672-8f96-039706d34c3e.mov
@sharlagelfand notice that I added a meta object to each data value: it was needed to add Work during the grid generation phase. So in Masters, Academia = 28 and Industry = 20.
This looks great so far, @sharlagelfand points out that we should be careful about empty circles representing properties of the data vs. data transformations, so let's think about this a bit more.
What I'd propose is that we mimic the gemini example and don't add any special visual indication of the points to be filtered, but instead just make the points fade out and have the title reflect what filtering happened.
@giorgi-ghviniashvili: can you prototype this version?
@jhofman alright. I removed the intermediate spec with filled-vs-unfilled circles and now just fade out the filtered circles:
https://user-images.githubusercontent.com/6615532/140715176-bf1b3035-c047-4830-8f8a-9163e3da8b1b.mov
A new issue is that, because the filtered circles fade out and transform.filter is executed by Vega after grid generation, the inner grids are not center-aligned with the x-axis labels.
We can solve this by:
1) parsing the transform and its filter expression, computing the filtered n value, and then generating the grid. This approach is difficult and requires writing an expression parser.
2) setting n directly to the filtered value in the spec, instead of the real value. This is the easiest and needs zero code changes.
3) ignoring grid generation on the frontend and sending the generated grid from the backend when filters are used.
Which one do you prefer?
I think that we would always treat the filter as a separate step, so e.g. it could be initial data > filter > group, in which case the filter would happen on the initial grid, then the points would be grouped (and centered), so that's fine.
Or alternatively, if it's initial data > group > filter, then the points would be grouped first, then filtered out, and would look how they do in the final frame of the animation (i.e. not centered), which is also fine because it easily illustrates that the difference between the last two frames is just the filtering / fading out of those points.
Agreed @sharlagelfand, sounds good to treat them as separate steps and keep them modular.
ok, then we just need to include transform.filter in the filter spec and not forget the meta object in data.values.
One concern is that we don't want to have to translate R filter commands into Vega-Lite-compatible transform.filter commands. A solution here would be to have R pass a true/false indicator for each point as to whether it gets filtered out or not (or, equivalently, a list of the points that should be filtered out).
We could solve this by passing some original "id" for each row (maybe called "row_num" so as not to conflict with Giorgi's ids for grid generation) through the specs, but this might be overkill.
Let's sketch out how these specs could look both with and without R passing ids over to Vega-Lite, and decide from there.
Here is the first option (using meta fields to know which datapoint is Academia and which is Industry):
{
"height": 300,
"width": 300,
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"meta": {
"parse": "grid",
"description": "Filter Work == Academia, group by Degree.",
"splitField": "Degree",
"axes": false
},
"data": {
"values": [
{
"Degree": "Masters",
"n": 8,
"meta": {
"Work": {
"Academia": 3,
"Industry": 5,
}
}
},
{
"Degree": "PhD",
"n": 10,
"meta": {
"Work": {
"Academia": 7,
"Industry": 3,
}
}
}
]
},
"transform": [
{
"filter": "datum.Work == 'Academia'"
}
],
"mark": {
"type": "point",
"filled": true,
},
"encoding": {
"x": {
"field": "datamations_x",
"type": "quantitative",
"axis": null
},
"y": {
"field": "datamations_y",
"type": "quantitative",
"axis": null
}
}
}
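Under this meta approach, the frontend grid generator could expand each row's meta.Work counts into individual points, so the filter knows which point is Academia and which is Industry. A rough sketch of that expansion (illustrative only, not the actual datamations grid code):

```javascript
// Expand each aggregated row into one point per unit, tagging each
// point with the Work category recovered from the meta counts.
// This is a sketch of the idea, not the actual datamations code.
function expandWithMeta(values) {
  const points = [];
  for (const row of values) {
    for (const [work, count] of Object.entries(row.meta.Work)) {
      for (let i = 0; i < count; i++) {
        points.push({ Degree: row.Degree, Work: work });
      }
    }
  }
  return points;
}

const values = [
  { Degree: "Masters", n: 8, meta: { Work: { Academia: 3, Industry: 5 } } },
  { Degree: "PhD", n: 10, meta: { Work: { Academia: 7, Industry: 3 } } }
];

const points = expandWithMeta(values);
console.log(points.length); // 18 points in total
console.log(points.filter(p => p.Work === "Academia").length); // 10 survive the Academia filter
```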
Here is the second option (with a filter_arr indicator in data.values):
{
"height": 300,
"width": 300,
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"meta": {
"parse": "grid",
"description": "Filter Work == Academia, group by Degree.",
"splitField": "Degree",
"axes": false
},
"data": {
"values": [
{
"Degree": "Masters",
"n": 8,
"filter_arr": [1, 1, 1, 0, 0, 0, 0, 0]
},
{
"Degree": "PhD",
"n": 10,
"filter_arr": [1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
}
]
},
"transform": [
{
"filter": "datum.filter_arr == 1"
}
],
"mark": {
"type": "point",
"filled": true,
},
"encoding": {
"x": {
"field": "datamations_x",
"type": "quantitative",
"axis": null
},
"y": {
"field": "datamations_y",
"type": "quantitative",
"axis": null
}
}
}
Here is the third option @jhofman suggested, where we pass IDs and then filter based on the IDs:
{
"height": 300,
"width": 300,
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"meta": {
"parse": "grid",
"description": "Filter Work == Academia, group by Degree.",
"splitField": "Degree",
"axes": false
},
"data": {
"values": [
{
"Degree": "Masters",
"n": 8,
"ids": [1, 2, 3, 4, 5, 6, 7, 8]
},
{
"Degree": "PhD",
"n": 5,
"ids": [9, 10, 11, 12, 13, 14]
}
]
},
"transform": [
{
"filter": {"field": "ids", "oneOf": [1, 3, 6, 10, 14]}
}
],
"mark": {
"type": "point",
"filled": true,
},
"encoding": {
"x": {
"field": "datamations_x",
"type": "quantitative",
"axis": null
},
"y": {
"field": "datamations_y",
"type": "quantitative",
"axis": null
}
}
}
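For the id-based option, the grid generator would presumably expand each row into n points carrying one id each, and the oneOf filter would then keep the matching points. A sketch of that selection step (function and field names here are illustrative, not the actual implementation):

```javascript
// Expand each aggregated row into individual points, one per id, then
// apply a oneOf-style filter mimicking Vega-Lite's
// {"field": ..., "oneOf": [...]} predicate. Illustrative sketch only.
function expandIds(values) {
  return values.flatMap(row =>
    row.ids.map(id => ({ Degree: row.Degree, id }))
  );
}

function applyOneOf(points, field, oneOf) {
  const keep = new Set(oneOf);
  return points.filter(p => keep.has(p[field]));
}

const values = [
  { Degree: "Masters", n: 8, ids: [1, 2, 3, 4, 5, 6, 7, 8] },
  { Degree: "PhD", n: 6, ids: [9, 10, 11, 12, 13, 14] }
];

const kept = applyOneOf(expandIds(values), "id", [1, 3, 6, 10, 14]);
console.log(kept.map(p => p.id)); // [ 1, 3, 6, 10, 14 ]
```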
Let's go with Option 3; it's equivalent to Option 2 but more general.
It's not clear whether we should overwrite the grid generation ids with these ids or not. There could be some "out of order" problems with multiple group-bys. @sharlagelfand and @giorgi-ghviniashvili will compare what these would look like and whether they match up.
@giorgi-ghviniashvili: @sharlagelfand has filtering working at the end of pipelines (e.g., after the summarize), but it looks like some coordination between you both needs to happen to get filtering working earlier in the pipeline, in terms of matching ids between frames.
Can you follow up on this to make some progress before Thursday?
thanks @jhofman! @giorgi-ghviniashvili, I will post an update / question here on what we need to coordinate on in a few hours.
From what I can tell, the IDs that I am generating match the IDs generated by the info grid generation on @giorgi-ghviniashvili's side.
Filtering does work in some initial test cases without any modification on the JS side 🎉 Here is some progress on where things are at:
"small_salary %>%
filter(Salary > 90) %>%
group_by(Degree)" %>%
datamation_sanddance()
"small_salary %>%
group_by(Degree) %>%
filter(abs(mean(Salary) - Salary) > 5) %>%
summarise(mean = mean(Salary))" %>%
datamation_sanddance()
"small_salary %>%
group_by(Degree) %>%
summarise(median = median(Salary)) %>%
filter(median > 90)" %>%
datamation_sanddance()
But there are some issues, e.g. when trying to filter with > 1 grouping variable:
df <- small_salary %>%
group_by(Degree, Work) %>% slice(1:2)
"df %>%
group_by(Degree, Work) %>%
filter(Salary == max(Salary))" %>%
datamation_sanddance()
You can see nothing is filtered out in the last frame, but it should be.
Here are the specs that are being passed:
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"meta": {
"parse": "grid",
"description": "Filter Salary == max(Salary) within each group",
"splitField": "Work",
"axes": true
},
"data": {
"values": [
{
"Degree": "Masters",
"Work": "Academia",
"n": 2,
"gemini_ids": [1, 2]
},
{
"Degree": "Masters",
"Work": "Industry",
"n": 2,
"gemini_ids": [3, 4]
},
{
"Degree": "PhD",
"Work": "Academia",
"n": 2,
"gemini_ids": [5, 6]
},
{
"Degree": "PhD",
"Work": "Industry",
"n": 2,
"gemini_ids": [7, 8]
}
]
},
"facet": {
"column": {
"field": "Degree",
"type": "ordinal",
"title": "Degree"
}
},
"spec": {
"height": 300,
"width": 150,
"mark": {
"type": "point",
"filled": true,
"strokeWidth": 1
},
"encoding": {
"x": {
"field": "datamations_x",
"type": "quantitative",
"axis": null
},
"y": {
"field": "datamations_y",
"type": "quantitative",
"axis": null
},
"color": {
"field": "Work",
"type": "nominal",
"legend": {
"values": ["Academia", "Industry"]
}
},
"tooltip": [
{
"field": "Degree",
"type": "nominal"
},
{
"field": "Work",
"type": "nominal"
}
]
}
},
"transform": [
{
"filter": {
"field": "gemini_id",
"oneOf": [2, 4, 5, 8]
}
}
]
}
and the specs produced:
For some reason the transform field is dropped in the specs produced by the JS, but it is just fine in the previous examples - @giorgi-ghviniashvili could you please take a look?
And there are some other buggy issues, e.g. when an entire group is filtered out:
"small_salary %>%
group_by(Degree, Work) %>%
filter(Degree == 'Masters')" %>%
datamation_sanddance()
For this example, I will need to figure out what is going wrong; I will dig into that one a bit more tomorrow.
@sharlagelfand hackFacet was completely ignoring transform fields. I just included them and it works now:
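The fix amounts to carrying the top-level transform array over when the faceted spec is rebuilt; something along these lines (hackFacet's real internals will differ, this is just the shape of the fix):

```javascript
// When rebuilding a faceted spec, copy the top-level transform array
// through instead of dropping it. Sketch of the fix, not the actual
// hackFacet source.
function rebuildSpec(original, rebuilt) {
  if (original.transform) {
    rebuilt.transform = original.transform;
  }
  return rebuilt;
}

const original = {
  facet: { column: { field: "Degree" } },
  transform: [{ filter: { field: "gemini_id", oneOf: [2, 4, 5, 8] } }]
};

// Without the copy, the rebuilt spec would silently lose the filter.
const rebuilt = rebuildSpec(original, { mark: "point" });
console.log(rebuilt.transform.length); // 1
```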
great, thanks!
The behaviour of Vega-Lite when a facet has no values (filtered out via transform.filter) is to remove the facet altogether - I just want to confirm that this is behaviour we like the look of (otherwise I can investigate whether there is a way to explicitly set the domain of a facet so the facets are retained):
"small_salary %>%
group_by(Degree, Work) %>%
filter(Degree == 'Masters')" %>%
datamation_sanddance()
Sounds like we're not sure what to do here. On one hand, if there are two groups and you filter down to one, it seems okay to leave the other facet there but empty. On the other hand, if you had many groups and limited to just one, you'd probably want to drop the others.
Seems related to the idea of whether you keep around unused levels of a factor when plotting. Perhaps we can take inspiration from ggplot2 defaults on this?
At the moment I'd be okay with the way things currently work, with the empty facet getting dropped.
@jhofman I think your point about looking at the actual data state (made in the context of count, but it applies here) might be a good way to direct us - e.g. filtering after the summarise results in a single-row df:
small_salary %>%
group_by(Degree) %>%
summarise(median = median(Salary)) %>%
filter(median > 90)
# # A tibble: 1 × 2
# Degree median
# <chr> <dbl>
# 1 Masters 91.1
So maybe the empty x-axis value should get dropped too, e.g. leaning more into the "empty value is dropped" behaviour that's already present with the faceting.
Agreed, we'll drop facets that get filtered out, and we'll mirror this for x-axis values that get dropped as well.
see #119 for an enhancement some day where this behavior could be overridden with "visual options".
@giorgi-ghviniashvili I think there need to be some additions on the JS side to properly support this "drop the x-axis / facet value" behaviour when filtering in info grids, since I am not controlling the values.
If the first x-axis value is filtered out, the spec seems pretty misaligned:
"small_salary %>%
group_by(Degree, Work) %>%
filter(Work == 'Academia')" %>%
datamation_sanddance()
Raw spec:
If the second x-axis value is filtered out, the colour disappears in the last frame?
"small_salary %>%
group_by(Degree, Work) %>%
filter(Work == 'Industry')" %>%
datamation_sanddance()
Raw spec:
If both x-axis values are filtered out, can you update encoding.x.axis.values = []?
"small_salary %>%
group_by(Work) %>%
filter(Work == 'Bachelors')" %>%
datamation_sanddance()
Raw specs:
The empty values will make it look like this:
instead of like this:
(This is what I am doing in the summarize step if both are dropped, so to be consistent!)
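On the JS side, this could look like checking whether the filter leaves any ids and, if not, clearing the axis tick values; a sketch under that assumption (the real spec-processing code will differ):

```javascript
// If the oneOf filter leaves no ids, clear the x-axis tick values so
// the empty frame still renders with a blank axis. Illustrative sketch,
// not the actual datamations spec-processing code.
function clearAxisIfEmpty(spec) {
  const filter = spec.transform?.[0]?.filter;
  if (filter && Array.isArray(filter.oneOf) && filter.oneOf.length === 0) {
    spec.encoding.x.axis = { ...spec.encoding.x.axis, values: [] };
  }
  return spec;
}

const spec = {
  transform: [{ filter: { field: "gemini_id", oneOf: [] } }],
  encoding: { x: { field: "Work", type: "ordinal", axis: { labelAngle: -90 } } }
};

console.log(clearAxisIfEmpty(spec).encoding.x.axis.values); // []
```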
There is also an issue when all the facets are filtered out:
"small_salary %>%
group_by(Degree, Work) %>%
filter(Degree == 'Bachelors')" %>%
datamation_sanddance()
Raw specs:
This produces an error in the console:
TypeError: column_header is undefined
Thanks!
@giorgi-ghviniashvili will take a look at the above for next time
Hey @sharlagelfand and @jhofman
seems like the misalignment is fixed in the new PR:
Yes, after filtering the colors disappear; to keep them, we need to explicitly set scale.domain:
"color": {
"field": "Work",
"type": "nominal",
"legend": {
"values": ["Academia", "Industry"]
},
"scale": {
"domain": ["Academia", "Industry"],
"range": ["rgba(76, 120, 168, 0.7)", "rgba(245, 133, 24, 0.7)"]
}
}
When the filter is empty, everything was messed up: there was an error in the console and nothing was drawn.
"transform": [
{
"filter": {
"field": "gemini_id",
"oneOf": []
}
}
]
To solve this, I generate an empty spec and skip any processing:
Which produces this:
Note that I used splitField for the x-axis title; if you want something different in different situations, we might need to put a title in meta for this.
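The empty-filter guard could be as simple as detecting an empty oneOf up front and short-circuiting to a bare spec before any grid processing; a sketch of the idea (function names and the exact empty-spec shape are assumptions, not the actual code):

```javascript
// Detect an empty oneOf filter and return a minimal spec so downstream
// grid processing is skipped entirely. Sketch only; the actual
// datamations code and empty-spec shape will differ.
function isEmptyFilter(spec) {
  const filter = spec.transform?.[0]?.filter;
  return Boolean(filter && Array.isArray(filter.oneOf) && filter.oneOf.length === 0);
}

function emptySpec(spec, xTitle) {
  return {
    $schema: spec.$schema,
    height: spec.height,
    width: spec.width,
    data: { values: [] },
    mark: "point",
    encoding: {
      // splitField is reused as the x-axis title, per the note above.
      x: { field: "x", type: "ordinal", title: xTitle, axis: { values: [] } },
      y: { field: "y", type: "quantitative" }
    }
  };
}

const spec = {
  $schema: "https://vega.github.io/schema/vega-lite/v4.json",
  height: 300,
  width: 300,
  meta: { splitField: "Work" },
  transform: [{ filter: { field: "gemini_id", oneOf: [] } }]
};

const out = isEmptyFilter(spec) ? emptySpec(spec, spec.meta.splitField) : spec;
console.log(out.data.values.length); // 0
```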
Seems like only scale.domain works, without the color range:
"scale": {
"domain": ["Academia", "Industry"]
}
@giorgi-ghviniashvili: related to #106, can you prototype what filtering out points would look like?
We know we want them to "disappear" or fade out, but it'll be interesting to explore how we annotate which points disappear and why.
For instance, on the salary data, imagine two different filtering operations:
small_salary %>% filter(Work == "Academia")
small_salary %>% filter(Salary >= 85, Salary <= 90)
One simple way to visualize things that could handle both of these would be to have a grid for all of the points and a legend showing the filtering condition, using either color or open/closed circles to indicate true/false on the condition, and then have the false ones fade away.
This seems like it would generalize pretty well, but maybe it leaves a bit to be desired as well. For instance, with the filtering on salary itself, maybe you'd want to see the salary amounts visualized on the y axis and then the filtering applied? This is more intuitive, but harder to generalize.
And then there are cases that combine filtering on different variables, like steps 1 and 2 above at the same time. That starts to get tricky unless you just do the true/false legend version, right?
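The true/false legend version boils down to tagging each point with whether it satisfies the condition, the way a Vega-Lite calculate transform would; a small sketch of that tagging step (names are illustrative):

```javascript
// Tag each point with a boolean "kept" field for the filter condition,
// as a calculate transform would; open circles (kept === false) would
// then fade out. Sketch of the idea, not a settled design.
function tagCondition(rows, pred) {
  return rows.map(r => ({ ...r, kept: pred(r) }));
}

const rows = [
  { Work: "Academia", Salary: 88 },
  { Work: "Industry", Salary: 92 },
  { Work: "Academia", Salary: 85 }
];

// Mirrors filter(Salary >= 85, Salary <= 90) from the example above.
const tagged = tagCondition(rows, r => r.Salary >= 85 && r.Salary <= 90);
console.log(tagged.filter(r => r.kept).length); // 2
```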