Closed jhofman closed 2 years ago
@jhofman here is how I achieved it. The legend is a bit funky; I just added a title to the legend:
https://user-images.githubusercontent.com/6615532/140475015-a0e3657d-b1af-4672-8f96-039706d34c3e.mov
@sharlagelfand notice that I added a meta object to each data value: it was needed to add Work during the grid generation phase. So in Masters, Academia = 28 and Industry = 20.
This looks great so far, @sharlagelfand points out that we should be careful about empty circles representing properties of the data vs. data transformations, so let's think about this a bit more.
What I'd propose is that we mimic the gemini example and don't add any special visual indication of the points to be filtered, but instead just make the points fade out and have the title reflect what filtering happened.
@giorgi-ghviniashvili: can you prototype this version?
@jhofman alright. I removed the intermediate spec with filled-vs-unfilled circles and now just fade out the filtered circles:
https://user-images.githubusercontent.com/6615532/140715176-bf1b3035-c047-4830-8f8a-9163e3da8b1b.mov
A new issue is that, because the filtered circles fade out and transform.filter is executed by Vega after grid generation, the inner grids are not center-aligned with the x-axis labels.
We can solve this by:
1) parsing the transform and its filter expression, computing the filtered n value, and then generating the grid. This approach is difficult and requires writing an expression parser.
2) setting n directly to the filtered value in the spec, instead of the real value. This is the easiest and needs zero code changes.
3) ignoring grid generation on the frontend and sending the generated grid from the backend when filters are used.
Which one do you prefer?
I think that we would always treat the filter as a separate step, so e.g. it could be initial data > filter > group, in which case the filter would happen on the initial grid, then the points would be grouped (and centered), so that's fine.
Or alternatively, if it's initial data > group > filter, then the points would be grouped first, then filtered out, and would look how they do in the final frame of the animation (i.e. not centered), which is also fine because it easily illustrates that the difference between the last two frames is just the filtering / fading out of those points.
Agreed @sharlagelfand, sounds good to treat them as separate steps and keep them modular.
ok, then we just need to include transform.filter in the filter spec and not forget the meta object in data.values.
One concern is that we don't want to have to translate R filter commands into Vega-Lite-compatible transform.filter commands. A solution here would be to have R pass a true/false indicator for each point as to whether it gets filtered out or not (or, equivalently, a list of the points that should be filtered out).
We could solve this by passing some original "id" for each row (maybe called "row_num" so as not to conflict with Giorgi's ids for grid generation) through the specs, but this might be overkill.
Let's sketch out how these specs could look both with and without R passing ids over to Vega-Lite, and decide from there.
Here is the first option (using meta fields to know which datapoint is Academia and which is Industry):
{
"height": 300,
"width": 300,
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"meta": {
"parse": "grid",
"description": "Filter Work == Academia, group by Degree.",
"splitField": "Degree",
"axes": false
},
"data": {
"values": [
{
"Degree": "Masters",
"n": 8,
"meta": {
"Work": {
"Academia": 3,
"Industry": 5,
}
}
},
{
"Degree": "PhD",
"n": 10,
"meta": {
"Work": {
"Academia": 7,
"Industry": 3,
}
}
}
]
},
"transform": [
{
"filter": "datum.Work == 'Academia'"
}
],
"mark": {
"type": "point",
"filled": true,
},
"encoding": {
"x": {
"field": "datamations_x",
"type": "quantitative",
"axis": null
},
"y": {
"field": "datamations_y",
"type": "quantitative",
"axis": null
}
}
}
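Under this meta approach, the frontend grid generator could expand each row's meta.Work counts into individual points, so the filter knows which point is Academia and which is Industry. A rough sketch of that expansion (illustrative only, not the actual datamations grid code):

```javascript
// Expand each aggregated row into one point per unit, tagging each
// point with the Work category recovered from the meta counts.
// This is a sketch of the idea, not the actual datamations code.
function expandWithMeta(values) {
  const points = [];
  for (const row of values) {
    for (const [work, count] of Object.entries(row.meta.Work)) {
      for (let i = 0; i < count; i++) {
        points.push({ Degree: row.Degree, Work: work });
      }
    }
  }
  return points;
}

const values = [
  { Degree: "Masters", n: 8, meta: { Work: { Academia: 3, Industry: 5 } } },
  { Degree: "PhD", n: 10, meta: { Work: { Academia: 7, Industry: 3 } } }
];

const points = expandWithMeta(values);
console.log(points.length); // 18 points in total
console.log(points.filter(p => p.Work === "Academia").length); // 10 survive the Academia filter
```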
Here is the second option (with a filter_arr indicator in data.values):
{
"height": 300,
"width": 300,
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"meta": {
"parse": "grid",
"description": "Filter Work == Academia, group by Degree.",
"splitField": "Degree",
"axes": false
},
"data": {
"values": [
{
"Degree": "Masters",
"n": 8,
"filter_arr": [1, 1, 1, 0, 0, 0, 0, 0]
},
{
"Degree": "PhD",
"n": 10,
"filter_arr": [1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
}
]
},
"transform": [
{
"filter": "datum.filter_arr == 1"
}
],
"mark": {
"type": "point",
"filled": true,
},
"encoding": {
"x": {
"field": "datamations_x",
"type": "quantitative",
"axis": null
},
"y": {
"field": "datamations_y",
"type": "quantitative",
"axis": null
}
}
}
Here is the third option @jhofman suggested, where we pass IDs and then filter based on the IDs:
{
"height": 300,
"width": 300,
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"meta": {
"parse": "grid",
"description": "Filter Work == Academia, group by Degree.",
"splitField": "Degree",
"axes": false
},
"data": {
"values": [
{
"Degree": "Masters",
"n": 8,
"ids": [1, 2, 3, 4, 5, 6, 7, 8]
},
{
"Degree": "PhD",
"n": 5,
"ids": [9, 10, 11, 12, 13, 14]
}
]
},
"transform": [
{
"filter": {"field": "ids", "oneOf": [1, 3, 6, 10, 14]}
}
],
"mark": {
"type": "point",
"filled": true,
},
"encoding": {
"x": {
"field": "datamations_x",
"type": "quantitative",
"axis": null
},
"y": {
"field": "datamations_y",
"type": "quantitative",
"axis": null
}
}
}
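For the id-based option, the grid generator would presumably expand each row into n points carrying one id each, and the oneOf filter would then keep the matching points. A sketch of that selection step (function and field names here are illustrative, not the actual implementation):

```javascript
// Expand each aggregated row into individual points, one per id, then
// apply a oneOf-style filter mimicking Vega-Lite's
// {"field": ..., "oneOf": [...]} predicate. Illustrative sketch only.
function expandIds(values) {
  return values.flatMap(row =>
    row.ids.map(id => ({ Degree: row.Degree, id }))
  );
}

function applyOneOf(points, field, oneOf) {
  const keep = new Set(oneOf);
  return points.filter(p => keep.has(p[field]));
}

const values = [
  { Degree: "Masters", n: 8, ids: [1, 2, 3, 4, 5, 6, 7, 8] },
  { Degree: "PhD", n: 6, ids: [9, 10, 11, 12, 13, 14] }
];

const kept = applyOneOf(expandIds(values), "id", [1, 3, 6, 10, 14]);
console.log(kept.map(p => p.id)); // [ 1, 3, 6, 10, 14 ]
```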
Let's go with Option 3; it's equivalent to Option 2 but more general.
It's not clear whether we should overwrite the grid generation ids with these ids or not. There could be some "out of order" problems with multiple group-bys. @sharlagelfand and @giorgi-ghviniashvili will compare what these would look like and whether they match up.
@giorgi-ghviniashvili: @sharlagelfand has filtering working at the end of pipelines (e.g., after the summarize), but it looks like some coordination between you both needs to happen to get filtering working earlier in the pipeline, in terms of matching ids between frames.
Can you follow up on this to make some progress before Thursday?
thanks @jhofman! @giorgi-ghviniashvili, I will post an update / question here on what we need to coordinate on in a few hours.
From what I can tell, the IDs that I am generating match the IDs generated by the info grid generation on @giorgi-ghviniashvili's side.
Filtering does work in some initial test cases without any modification on the JS side 🎉 Here is some progress on where things are at:
"small_salary %>%
filter(Salary > 90) %>%
group_by(Degree)" %>%
datamation_sanddance()
"small_salary %>%
group_by(Degree) %>%
filter(abs(mean(Salary) - Salary) > 5) %>%
summarise(mean = mean(Salary))" %>%
datamation_sanddance()
"small_salary %>%
group_by(Degree) %>%
summarise(median = median(Salary)) %>%
filter(median > 90)" %>%
datamation_sanddance()
But there are some issues, e.g. when trying to filter with > 1 grouping variable:
df <- small_salary %>%
group_by(Degree, Work) %>% slice(1:2)
"df %>%
group_by(Degree, Work) %>%
filter(Salary == max(Salary))" %>%
datamation_sanddance()
You can see nothing is filtered out in the last frame, but it should be.
Here are the specs that are being passed:
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"meta": {
"parse": "grid",
"description": "Filter Salary == max(Salary) within each group",
"splitField": "Work",
"axes": true
},
"data": {
"values": [
{
"Degree": "Masters",
"Work": "Academia",
"n": 2,
"gemini_ids": [1, 2]
},
{
"Degree": "Masters",
"Work": "Industry",
"n": 2,
"gemini_ids": [3, 4]
},
{
"Degree": "PhD",
"Work": "Academia",
"n": 2,
"gemini_ids": [5, 6]
},
{
"Degree": "PhD",
"Work": "Industry",
"n": 2,
"gemini_ids": [7, 8]
}
]
},
"facet": {
"column": {
"field": "Degree",
"type": "ordinal",
"title": "Degree"
}
},
"spec": {
"height": 300,
"width": 150,
"mark": {
"type": "point",
"filled": true,
"strokeWidth": 1
},
"encoding": {
"x": {
"field": "datamations_x",
"type": "quantitative",
"axis": null
},
"y": {
"field": "datamations_y",
"type": "quantitative",
"axis": null
},
"color": {
"field": "Work",
"type": "nominal",
"legend": {
"values": ["Academia", "Industry"]
}
},
"tooltip": [
{
"field": "Degree",
"type": "nominal"
},
{
"field": "Work",
"type": "nominal"
}
]
}
},
"transform": [
{
"filter": {
"field": "gemini_id",
"oneOf": [2, 4, 5, 8]
}
}
]
}
and the specs produced:
For some reason the transform field is dropped in the specs produced by the JS, but it is just fine in the previous examples - @giorgi-ghviniashvili could you please take a look?
And there are some other buggy issues, e.g. when an entire group is filtered out:
"small_salary %>%
group_by(Degree, Work) %>%
filter(Degree == 'Masters')" %>%
datamation_sanddance()
For this example, I will need to figure out what is going wrong; I will dig into that one a bit more tomorrow.
@sharlagelfand hackFacet was completely ignoring transform fields. I just included them and it works now:
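The fix amounts to carrying the top-level transform array over when the faceted spec is rebuilt; something along these lines (hackFacet's real internals will differ, this is just the shape of the fix):

```javascript
// When rebuilding a faceted spec, copy the top-level transform array
// through instead of dropping it. Sketch of the fix, not the actual
// hackFacet source.
function rebuildSpec(original, rebuilt) {
  if (original.transform) {
    rebuilt.transform = original.transform;
  }
  return rebuilt;
}

const original = {
  facet: { column: { field: "Degree" } },
  transform: [{ filter: { field: "gemini_id", oneOf: [2, 4, 5, 8] } }]
};

// Without the copy, the rebuilt spec would silently lose the filter.
const rebuilt = rebuildSpec(original, { mark: "point" });
console.log(rebuilt.transform.length); // 1
```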
great, thanks!
The behaviour of Vega-Lite when a facet has no values (filtered out via transform.filter) is to remove the facet altogether - I just want to confirm that this is behaviour we like the look of (otherwise I can investigate whether there is a way to explicitly set the domain of a facet so the facets are retained):
"small_salary %>%
group_by(Degree, Work) %>%
filter(Degree == 'Masters')" %>%
datamation_sanddance()
Sounds like we're not sure what to do here. On one hand, if there are two groups and you filter down to one, it seems okay to leave the other facet there but empty. On the other hand, if you had many groups and limited to just one, you'd probably want to drop the others.
Seems related to the idea of whether you keep around unused levels of a factor when plotting. Perhaps we can take inspiration from ggplot2 defaults on this?
At the moment I'd be okay with the way things currently work, with the empty facet getting dropped.
@jhofman I think your point about looking at the actual data state (made in the context of count, but it applies here) might be a good way to direct us - e.g. filtering after the summarise results in a single-row df:
small_salary %>%
group_by(Degree) %>%
summarise(median = median(Salary)) %>%
filter(median > 90)
# # A tibble: 1 × 2
# Degree median
# <chr> <dbl>
# 1 Masters 91.1
So maybe the empty x-axis value should get dropped too, e.g. leaning more into the "empty value is dropped" behaviour that's already present with the faceting.
Agreed, we'll drop facets that get filtered out, and we'll mirror this for x-axis values that get dropped as well.
see #119 for an enhancement some day where this behavior could be overridden with "visual options".
@giorgi-ghviniashvili I think there need to be some additions on the JS side to properly support this "drop the x-axis / facet value" behaviour when filtering in info grids, since I am not controlling the values.
If the first x-axis value is filtered out, the spec seems pretty misaligned:
"small_salary %>%
group_by(Degree, Work) %>%
filter(Work == 'Academia')" %>%
datamation_sanddance()
Raw spec:
If the second x-axis value is filtered out, the colour disappears in the last frame?
"small_salary %>%
group_by(Degree, Work) %>%
filter(Work == 'Industry')" %>%
datamation_sanddance()
Raw spec:
If both x-axis values are filtered out, can you update encoding.x.axis.values = []?
"small_salary %>%
group_by(Work) %>%
filter(Work == 'Bachelors')" %>%
datamation_sanddance()
Raw specs:
The empty values will make it look like this:
instead of like this:
(This is what I am doing in the summarize step if both are dropped, so to be consistent!)
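On the JS side, this could look like checking whether the filter leaves any ids and, if not, clearing the axis tick values; a sketch under that assumption (the real spec-processing code will differ):

```javascript
// If the oneOf filter leaves no ids, clear the x-axis tick values so
// the empty frame still renders with a blank axis. Illustrative sketch,
// not the actual datamations spec-processing code.
function clearAxisIfEmpty(spec) {
  const filter = spec.transform?.[0]?.filter;
  if (filter && Array.isArray(filter.oneOf) && filter.oneOf.length === 0) {
    spec.encoding.x.axis = { ...spec.encoding.x.axis, values: [] };
  }
  return spec;
}

const spec = {
  transform: [{ filter: { field: "gemini_id", oneOf: [] } }],
  encoding: { x: { field: "Work", type: "ordinal", axis: { labelAngle: -90 } } }
};

console.log(clearAxisIfEmpty(spec).encoding.x.axis.values); // []
```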
There is also an issue when all the facets are filtered out:
"small_salary %>%
group_by(Degree, Work) %>%
filter(Degree == 'Bachelors')" %>%
datamation_sanddance()
Raw specs:
This produces an error in the console:
TypeError: column_header is undefined
Thanks!
@giorgi-ghviniashvili will take a look at the above for next time
Hey @sharlagelfand and @jhofman
seems like the misalignment is fixed in the new PR:
Yes, after filtering the colors disappear; to keep them, we need to explicitly set scale.domain:
"color": {
"field": "Work",
"type": "nominal",
"legend": {
"values": ["Academia", "Industry"]
},
"scale": {
"domain": ["Academia", "Industry"],
"range": ["rgba(76, 120, 168, 0.7)", "rgba(245, 133, 24, 0.7)"]
}
}
When the filter is empty, everything was messed up: there was an error in the console and nothing was drawn.
"transform": [
{
"filter": {
"field": "gemini_id",
"oneOf": []
}
}
]
To solve this, I generate an empty spec and skip any processing:
Which produces this:
Note that I used splitField for the x-axis title; if you want something different in different situations, we might need to put a title in meta for this.
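The empty-filter guard could be as simple as detecting an empty oneOf up front and short-circuiting to a bare spec before any grid processing; a sketch of the idea (function names and the exact empty-spec shape are assumptions, not the actual code):

```javascript
// Detect an empty oneOf filter and return a minimal spec so downstream
// grid processing is skipped entirely. Sketch only; the actual
// datamations code and empty-spec shape will differ.
function isEmptyFilter(spec) {
  const filter = spec.transform?.[0]?.filter;
  return Boolean(filter && Array.isArray(filter.oneOf) && filter.oneOf.length === 0);
}

function emptySpec(spec, xTitle) {
  return {
    $schema: spec.$schema,
    height: spec.height,
    width: spec.width,
    data: { values: [] },
    mark: "point",
    encoding: {
      // splitField is reused as the x-axis title, per the note above.
      x: { field: "x", type: "ordinal", title: xTitle, axis: { values: [] } },
      y: { field: "y", type: "quantitative" }
    }
  };
}

const spec = {
  $schema: "https://vega.github.io/schema/vega-lite/v4.json",
  height: 300,
  width: 300,
  meta: { splitField: "Work" },
  transform: [{ filter: { field: "gemini_id", oneOf: [] } }]
};

const out = isEmptyFilter(spec) ? emptySpec(spec, spec.meta.splitField) : spec;
console.log(out.data.values.length); // 0
```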
Seems like only scale.domain works, without the color range:
"scale": {
"domain": ["Academia", "Industry"]
}
@giorgi-ghviniashvili: related to #106, can you prototype what filtering out points would look like?
We know we want them to "disappear" or fade out, but it'll be interesting to explore how we annotate which points disappear and why.
For instance, on the salary data, imagine two different filtering operations:
small_salary %>% filter(Work == "Academia")
small_salary %>% filter(Salary >= 85, Salary <= 90)
One simple way to visualize things that could handle both of these would be to have a grid for all of the points and a legend showing the filtering condition, using either color or open/closed circles to indicate true/false on the condition, and then have the false ones fade away.
This seems like it would generalize pretty well, but maybe it leaves a bit to be desired as well. For instance, with the filtering on salary itself, maybe you'd want to see the salary amounts visualized on the y axis and then the filtering applied? This is more intuitive, but harder to generalize.
And then there are cases that combine filtering on different variables, like steps 1 and 2 above at the same time. That starts to get tricky unless you just do the true/false legend version, right?
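The true/false legend version boils down to tagging each point with whether it satisfies the condition, the way a Vega-Lite calculate transform would; a small sketch of that tagging step (names are illustrative):

```javascript
// Tag each point with a boolean "kept" field for the filter condition,
// as a calculate transform would; open circles (kept === false) would
// then fade out. Sketch of the idea, not a settled design.
function tagCondition(rows, pred) {
  return rows.map(r => ({ ...r, kept: pred(r) }));
}

const rows = [
  { Work: "Academia", Salary: 88 },
  { Work: "Industry", Salary: 92 },
  { Work: "Academia", Salary: 85 }
];

// Mirrors filter(Salary >= 85, Salary <= 90) from the example above.
const tagged = tagCondition(rows, r => r.Salary >= 85 && r.Salary <= 90);
console.log(tagged.filter(r => r.kept).length); // 2
```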