Closed ggrothendieck closed 4 years ago
🤔 ungroup does have an ... argument that it does not use:
> dplyr:::ungroup.grouped_df
function(x, ...) {
ungroup_grouped_df(x)
}
<bytecode: 0x1026547e8>
<environment: namespace:dplyr>
but I'm not sure about having ungroup also perform selection.
Seems to me that incorporating this kind of logic into https://github.com/tidyverse/dplyr/issues/3721 would be the better solution for this use case.
I do think it would be neat if ungroup could selectively remove some groupings but not others, e.g.
mtcars %>% group_by(gear, carb, cyl) %>% ungroup(cyl)
would be equivalent to
mtcars %>% group_by(gear, carb, cyl) %>% group_by(gear, carb)
which is how I first interpreted the title of this issue.
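The selective-ungroup idea above can be approximated with existing functions. The following is only a sketch: drop_groups is a hypothetical helper name, not a dplyr function, and it assumes the variables to drop are passed as a character vector.

```r
library(dplyr)

# Hypothetical helper, not a dplyr function: keep every current grouping
# variable except those named in `drop` (a character vector).
drop_groups <- function(.data, drop) {
  keep <- setdiff(group_vars(.data), drop)
  .data %>%
    ungroup() %>%
    group_by(!!!rlang::syms(keep))
}

res <- mtcars %>%
  group_by(gear, carb, cyl) %>%
  drop_groups("cyl")

group_vars(res)  # the grouping variables that remain
```

Taking names as strings keeps the helper simple; a version with select semantics would need tidyselect and is left out here.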
Here is another example taken from https://stackoverflow.com/questions/52906985/merging-of-duplicate-rows-that-have-misspelled-variables/52907932#52907932
library(phonics)
library(dplyr)
# create test data
Lines <- "CAR MPG
Mazda 5
Mazzda 2
Mzda 1"
DF <- read.table(text = Lines, header = TRUE, as.is = TRUE, strip.white = TRUE)
# process
DF %>%
group_by(key = soundex(CAR)) %>%
summarize(CAR = toString(CAR), MPG = sum(MPG)) %>%
ungroup %>%
select(-key)
With the feature under discussion this would simplify to the shorter and more symmetric:
DF %>%
group_by(key = soundex(CAR)) %>%
summarize(CAR = toString(CAR), MPG = sum(MPG)) %>%
ungroup(-key)
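Until something like that exists, the two-step pattern can be wrapped in a small function. This is a sketch under assumptions: ungroup_drop is a hypothetical name, it takes the columns to drop as a character vector, and substr() stands in for soundex() so the example does not depend on the phonics package.

```r
library(dplyr)

# Hypothetical helper, not part of dplyr: remove all grouping, then drop
# the columns named in `drop` (a character vector).
ungroup_drop <- function(.data, drop) {
  .data %>%
    ungroup() %>%
    select(-all_of(drop))
}

DF <- data.frame(CAR = c("Mazda", "Mazzda", "Mzda"), MPG = c(5, 2, 1))

res <- DF %>%
  group_by(key = substr(CAR, 1, 1)) %>%   # stand-in for soundex(CAR)
  summarize(CAR = toString(CAR), MPG = sum(MPG)) %>%
  ungroup_drop("key")

names(res)  # the temporary key column is gone
```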
@mkoohafkan, The way group_by currently works is that if you want to incrementally add a variable, you specify group_by(new_var, add = TRUE).
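A short illustration of incremental grouping. The thread uses add = TRUE, the spelling at the time; this sketch assumes dplyr 1.0.0 or later, where the argument is named .add.

```r
library(dplyr)

# Start with one grouping variable, then append a second one.
# Assumes dplyr >= 1.0.0, where the argument is `.add` rather than `add`.
g <- mtcars %>%
  group_by(gear) %>%
  group_by(carb, .add = TRUE)

group_vars(g)  # both variables are now grouping variables
```

Without .add = TRUE the second group_by would replace the existing grouping instead of extending it.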
I suppose there is the question of whether add=TRUE means add the variable to the group_by or really means modify the group_by and replace it with a new group_by. In this latter case it would make sense to write group_by(-cyl, add = TRUE) to remove cyl from the group_by while leaving the other group_by variables in effect, rather than using ungroup for that.
Another possibility is to use ungroup(cyl, subtract = TRUE) for that, analogously to group_by(new_var, add = TRUE).
One other point is that I don't think incrementally adding and removing parts of a group_by is that frequently encountered, whereas I have repeatedly encountered the ungroup %>% select(-var) sequence.
@ggrothendieck thought about this more and I agree with your statements that ungroup(cyl) to drop the column cyl is symmetric and group_by(-cyl) to remove a column from an existing grouping would be a bit confusing with the existing add argument. If the add argument to group_by had originally been named update this would be syntactically cleaner, e.g. group_by(cyl, update = TRUE) and group_by(-cyl, update = TRUE).
ungroup(..., subtract = TRUE) looks like a good idea at first but... what would ungroup(cyl, subtract = FALSE) mean?
group_by() has mutate semantics, not select semantics (cf. https://dplyr.tidyverse.org/articles/dplyr.html#selecting-operations). I guess you already noticed this when you tried group_by(-cyl, add = TRUE) and saw -cyl became the grouping variable.
dplyr::group_by(mtcars, -cyl)
#> # A tibble: 32 x 12
#> # Groups: -cyl [3]
#> mpg cyl disp hp drat wt qsec vs am gear carb `-cyl`
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 -6
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 -6
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 -4
#> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 -6
#> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 -8
#> 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 -6
#> 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 -8
#> 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 -4
#> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 -4
#> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 -6
#> # ... with 22 more rows
Created on 2018-10-31 by the reprex package (v0.2.1)
So, to me, ungroup() should have mutate semantics as well for consistency (though I don't know what it means to mutate when ungrouping...). A possible solution is to implement scoped variants for ungroup() (e.g. ungroup_at())?
Here is another case where this feature could be used, taken from https://stackoverflow.com/questions/53240324/dplyr-collapse-tail-rows-into-larger-groups/53240699#53240699. In this case we are manufacturing a sort key in order to keep the table in its original sorted order. With the feature under discussion the select at the end of the code could be folded into the ungroup and so omitted.
Note how this keeps coming up again and again.
df <- tibble(a = as.factor(1:20), b = c(50, 20, 13, rep(2, 10), rep(1, 7)))
df %>%
group_by(sortkey = -b, a = paste0(if_else(b %in% 1:2, "grp", ""), b)) %>%
summarize(b = sum(b)) %>%
ungroup %>%
select(-sortkey)
Having a selective ungroup is also very important when calculating percentages of subgroups.
mtcars %>%
group_by(gear,carb,vs) %>%
summarise(count=n()) %>%
group_by(gear,carb) %>% #<< would be better to do ungroup(vs)
mutate(perc=count/sum(count)) %>%
ungroup() %>%
spread(vs,perc,sep='=')
gear carb count `vs=0` `vs=1`
<dbl> <dbl> <int> <dbl> <dbl>
1 3 1 3 NA 1
2 3 2 4 1 NA
3 3 3 3 1 NA
4 3 4 5 1 NA
5 4 1 4 NA 1
6 4 2 4 NA 1
7 4 4 2 0.5 0.5
8 5 2 1 0.5 0.5
9 5 4 1 1 NA
I think it would be fine for ungroup() to have select semantics even while group_by() has action semantics. I'd suggest df %>% ungroup() would continue to work as usual, and df %>% ungroup(x) would remove x from the grouping variables, throwing an error if not currently grouped by x.
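For readers on a recent dplyr: as far as I am aware, releases from 1.0.0 onward accept column names in ungroup() and remove just those from the grouping. A minimal sketch, assuming dplyr >= 1.0.0 is installed; it does not rely on any particular behaviour when the named column is not a grouping variable.

```r
library(dplyr)

# Assumes dplyr >= 1.0.0, where ungroup() takes columns to remove
# from the grouping.
res <- mtcars %>%
  group_by(gear, carb, cyl) %>%
  ungroup(cyl)

group_vars(res)  # grouping variables left after removing cyl
```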
A common case is that one constructs a grouping variable in group_by but only needs it for the duration of the group_by, so afterwards one must use select to get rid of it as in the example below. It would be pleasingly symmetric if ungroup could remove the added column just as group_by adds it so would be the same as
Thus in this example taken from https://stackoverflow.com/questions/51939874/referencing-previous-column-value-as-column-is-created/51940343#51940343 we could write using one fewer statement, i.e. the last two lines of code above are combined into the last line below.
Note the reduced line count and improved symmetry.