tidyverse / dplyr

dplyr: A grammar of data manipulation
https://dplyr.tidyverse.org/

group_trim() #4010

Closed romainfrancois closed 5 years ago

romainfrancois commented 5 years ago

When we do this:

g <- iris %>% 
  filter(Species == "setosa") %>%
  group_by(Species)

we still end up with 3 groups because we need to respect the factor levels of Species. Previous versions would give only one group.
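
A minimal sketch of what the empty groups look like in practice. It assumes a released dplyr (0.8.0 or later), where `group_by()` has a `.drop` argument; that argument is not part of this discussion, and `.drop = FALSE` is used here only to reproduce the "every level is represented" behaviour described above.

```r
library(dplyr)

# Keep one group per factor level, even for levels with no rows left.
# Assumption: `.drop` is available (dplyr >= 0.8.0).
g <- iris %>%
  filter(Species == "setosa") %>%
  group_by(Species, .drop = FALSE)

# One row per level; the two filtered-out species show up with a count of 0.
counts <- g %>% summarise(n = n())
counts
```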

group_trim() would be about giving the ability to re-model the factors.

g %>%
  group_trim(Species)

Perhaps we'd need variants with _all and _at (_if does not make much sense):

g %>%
  group_trim_at("Species")

g %>% 
  group_trim_all()

This is essentially a four-step operation.
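
The steps are not spelled out here, so the following is only one plausible way to get the same effect by hand, not necessarily what was meant. `trim_groups_sketch()` is a hypothetical helper, not a dplyr function, and it assumes dplyr 1.0.0 or later for `across()` inside `group_by()`.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): drop unused levels of the factor
# grouping columns, then rebuild the grouping from the trimmed factors.
trim_groups_sketch <- function(.tbl) {
  vars <- group_vars(.tbl)          # 1. remember the grouping columns
  out <- ungroup(.tbl)              # 2. remove the grouping
  for (v in vars) {                 # 3. drop unused levels of factor columns
    if (is.factor(out[[v]])) {
      out[[v]] <- droplevels(out[[v]])
    }
  }
  # 4. regroup; with .drop = FALSE every remaining level gets a group
  group_by(out, across(all_of(vars)), .drop = FALSE)
}

g <- iris %>%
  filter(Species == "setosa") %>%
  group_by(Species, .drop = FALSE)

trimmed <- trim_groups_sketch(g)
n_groups(g)        # groups before trimming
n_groups(trimmed)  # groups after trimming
```

Because the unused levels are gone from the factor itself, the regrouped data keeps a single group even though empty groups are still being preserved.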

Is dropping the only thing we want, or would it make sense to have something to lump too? Or should it be more of a mutate-type interface, e.g.

%>% 
  group_trim(f1 = fct_drop(f1), f2 = fct_lump(f2))

romainfrancois commented 5 years ago

i.e. a mutate that is only allowed to overwrite factors.

yutannihilation commented 5 years ago

Is it possible to leave the original levels as is and trim only the grouping structure on calculation? IMHO, what the users want is probably not that fine-grained control over factors by an extra function, but something that can be done within group_by() like this (and TRUE by default for backward-compatibility):

g <- iris %>% 
  filter(Species == "setosa") %>%
  group_by(Species, .trim = TRUE)

In this case, grouped_df should remember the decision whether to drop empty groups or not so that it can decide automatically when it gets regrouped later (I expect regrouping will frequently happen if the default of .preserve is FALSE).
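
For comparison, a hedged sketch of how this idea looks with the `.drop` argument found in released dplyr (0.8.0 and later, as far as I know). The `.trim` name above is the proposal from this comment; `.drop` and `group_by_drop_default()` are assumptions about the released package, not something settled in this thread.

```r
library(dplyr)

setosa <- iris %>% filter(Species == "setosa")

# Assumption: `.drop` plays the role of the proposed `.trim`.
dropped <- setosa %>% group_by(Species, .drop = TRUE)
kept    <- setosa %>% group_by(Species, .drop = FALSE)

n_groups(dropped)             # empty groups are not represented
n_groups(kept)                # one group per factor level
nlevels(dropped$Species)      # the factor levels themselves are left as is
group_by_drop_default(kept)   # the grouped data frame records the choice
```

Only the grouping structure differs between the two results; the Species column keeps all of its levels in both.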

romainfrancois commented 5 years ago

> Is it possible to leave the original levels as is and trim only the grouping structure on calculation?

This is essentially the previous behaviour. Unless I don't understand what you mean. But the contract with the new dplyr is that if there is a level, it is represented. That's something we feel strongly about.

I don't think an argument to group_by() is visually strong enough.

yutannihilation commented 5 years ago

Ah, oh... Sorry, I terribly misunderstood how the new dplyr works. I thought the groups attribute is the master table of the groups and all verbs would work based on that information. Really sorry for the noise.... (edit: I'm confused, sorry...)

romainfrancois commented 5 years ago

This will be much simpler than ⬆️ and more 💪 will be added to group_by() in #4029
