hope-data-science / tidyft

Tidy Verbs for Fast Data Operations by Reference
https://hope-data-science.github.io/tidyft/

Feature request: combining different summarise functions #3

Open camnesia opened 1 year ago

camnesia commented 1 year ago

I went through the documentation and was unable to find a way to combine different summarise functions in a single pipeline. summarise_vars would also need a .names option to work in this manner.

iris %>%
  as.data.table() %>%
  tidyft::arrange(Sepal.Length) %>%
  tidyft::summarise(Sepal_legth_Max = max(Sepal.Length), by = Species) %>%
  summarise_when(Petal.Length > 2, Petal_Length_avg = mean(Petal.Length), by = Species) %>%
  summarise_vars('Width', .func = function(x) na.omit(first(x)), by = Species) %>%
  summarise_vars('Width', .func = function(x) na.omit(last(x)), by = Species)

hope-data-science commented 1 year ago

Thank you for noting. summarise_vars updates values under their original column names, so it should not be used in this way. I suggest computing the summaries separately and combining them at the end. As far as I know, getting them all with a single function is impossible even in other packages. If you have a success case elsewhere, let me know and I'll figure out whether I can add it to tidyft.

Thanks.
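
A minimal sketch of the "compute separately, combine at the end" suggestion, written in plain data.table rather than tidyft verbs so that no tidyft signatures are assumed. The intermediate names (max_dt, avg_dt, first_dt, result) and the output column names are illustrative only and do not come from the thread.

```r
library(data.table)

iris_dt <- as.data.table(iris)

# Each summary is computed on its own, grouped by Species.
max_dt <- iris_dt[, .(Sepal_Length_Max = max(Sepal.Length)), by = Species]

# Filtered summary: groups with no rows where Petal.Length > 2 drop out here.
avg_dt <- iris_dt[Petal.Length > 2,
                  .(Petal_Length_avg = mean(Petal.Length)),
                  by = Species]

first_dt <- iris_dt[, .(Sepal.Width_first = first(Sepal.Width),
                        Petal.Width_first = first(Petal.Width)),
                    by = Species]

# Combine the pieces on the grouping column; all = TRUE keeps groups that
# are missing from one of the pieces and fills them with NA.
result <- Reduce(
  function(x, y) merge(x, y, by = "Species", all = TRUE),
  list(max_dt, avg_dt, first_dt)
)
```

The full outer merge matters for the filtered summary: a species that never satisfies the filter still appears in the final table, with NA in Petal_Length_avg.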

camnesia commented 1 year ago

The across() function from dplyr (also reimplemented in the poorman package) can be used several times within a single summarise() call and works similarly to tidyft::summarise_vars()

library(dplyr)  # provides %>%, case_when(), across(), ends_with(), first(), last()

iris %>%
  dplyr::mutate(Petal_Length_avg = case_when(Petal.Length > 2 ~ Petal.Length,
                                             TRUE ~ NA_real_)) %>%
  dplyr::group_by(Species) %>%
  dplyr::summarise(Sepal_legth_Max = max(Sepal.Length),
                   Petal_Length_avg = mean(Petal_Length_avg),
                   across(ends_with('Width'), ~na.omit(first(.)), .names = '{.col}_first'),
                   across(ends_with('Width'), ~na.omit(last(.)), .names = '{.col}_last'))

hope-data-science commented 1 year ago

Thank you for noting. In that case, I have not seen a data.table equivalent of across() that creates new columns in this way. If you find a way to address it, let me know.

Thanks.

camnesia commented 1 year ago

This seems to be similar enough to across() and works with data.table, but I'm not sure it's the most elegant solution.

library(data.table)
library(glue)
library(magrittr)  # for %>%

# Helpers returning the column names that match a pattern (case-insensitive)
starts_with_ft <- function(data, string) {
  grep(paste0('^', string), names(data), ignore.case = TRUE, value = TRUE)
}
ends_with_ft <- function(data, string) {
  grep(paste0(string, '$'), names(data), ignore.case = TRUE, value = TRUE)
}
contains_ft <- function(data, string) {
  grep(string, names(data), ignore.case = TRUE, value = TRUE)
}

iris_dt <- iris %>%
  as.data.table()

cols <- ends_with_ft(iris_dt, "width")

# First complete row of the selected columns per Species, renamed with a "_first" suffix
iris_dt[, setNames(first(na.omit(.SD[, ..cols])), glue('{cols}_first')), by = Species]
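
One possible way to extend this so that the _first and _last columns come out of a single call, roughly mirroring two across() calls with .names, is sketched below. It is an assumption-laden variant, not tidyft functionality: it uses .SDcols instead of ..cols, applies na.omit() per column before taking first/last (rather than dropping incomplete rows), and the name res is illustrative.

```r
library(data.table)

iris_dt <- as.data.table(iris)

# Columns whose names end in "width" (case-insensitive)
cols <- grep("width$", names(iris_dt), ignore.case = TRUE, value = TRUE)

# j returns one named list per group; c() concatenates the two sets of
# summaries so both suffixes appear side by side in the result.
res <- iris_dt[, c(
  setNames(lapply(.SD, function(x) first(na.omit(x))), paste0(cols, "_first")),
  setNames(lapply(.SD, function(x) last(na.omit(x))),  paste0(cols, "_last"))
), by = Species, .SDcols = cols]
```

Because .SDcols restricts .SD to the selected columns, the suffixing step only ever sees the "Width" columns, which is the role .names plays in the dplyr version.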