tidyverse / dplyr

dplyr: A grammar of data manipulation
https://dplyr.tidyverse.org/

Allow .by=row_number() in mutate statements #7009

Closed torfason closed 6 months ago

torfason commented 8 months ago

I want to calculate a scale (the row-wise mean of a set of item columns) and add it to my data frame. Here are three options that do work:

d |>
  rowwise() |>
  mutate(my_scale_a = mean(c_across(starts_with("M:")), na.rm = TRUE)) |>
  ungroup()

This works and used to be the recommended approach, but now that group_by() is slowly being superseded by the .by argument, it seems we need a replacement for rowwise() as well.

My data happens to have an ID column, so I can get the same result (I checked) with:

d |>
  mutate(my_scale_b = mean(c_across(starts_with("M:")), na.rm = TRUE), .by=`Response ID`)

If I did not have an ID column, I could have created one with:

d |>
  mutate(rownum = row_number()) |>
  mutate(my_scale_c = mean(c_across(starts_with("M:")), na.rm = TRUE), .by=rownum)
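A minimal, self-contained sketch of this row-number workaround on made-up data may help when reproducing the issue. The tibble below and its item columns `M:1` to `M:3` are assumptions invented for illustration, not the original data, and the sketch assumes dplyr 1.1.0 or later (where `.by` is available):

```r
library(dplyr)

# Hypothetical toy data: the values and the "M:1".."M:3" column names are invented.
d <- tibble(
  `Response ID` = c("r1", "r2", "r3"),
  `M:1` = c(1, 4, NA),
  `M:2` = c(2, 5, 6),
  `M:3` = c(3, NA, 9)
)

res <- d |>
  # Materialise the row index as a real column, because `.by` selects existing columns.
  mutate(rownum = row_number()) |>
  # Each row forms its own group, so mean() sees one row's items at a time.
  mutate(
    my_scale_c = mean(c_across(starts_with("M:")), na.rm = TRUE),
    .by = rownum
  ) |>
  # Drop the helper column once the scale has been computed.
  select(-rownum)

res
```

The helper column is removed at the end so that the result has the same shape as the rowwise() version plus the new scale column.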

But of course, I then want a fourth way to do this, which does not work:

> d.org |>
+   mutate(my_scale_d = mean(c_across(starts_with("M:")), na.rm = TRUE), .by=row_number())
Error in `mutate()`:
! Problem while evaluating `row_number()`.
Caused by error in `n()`:
! Must only be used inside data-masking verbs like `mutate()`, `filter()`, and `group_by()`.
Run `rlang::last_trace()` to see where the error occurred.

All of this is a long-winded way of saying that row_number() seems to have a very useful interpretation as a value for the .by argument in a mutate() call.

DavisVaughan commented 6 months ago

Duplicate of https://github.com/tidyverse/dplyr/issues/6660