tidyverse / dplyr

dplyr: A grammar of data manipulation
https://dplyr.tidyverse.org/

Used `by =` instead of `.by =` in `mutate()`, but no error or warning returned #7096

Open ywhcuhk opened 1 month ago

ywhcuhk commented 1 month ago

I mistakenly used `by` inside `mutate()` instead of `.by`, but no error or warning was raised. I only found this by coincidence, when I happened to have two pieces of code (one using `by`, the other using `.by`) producing different results.

I understand that when `by =` is used inside `mutate()`, dplyr treats it as a request to create a new column called `by` with the same values as the grouping variable. There is nothing wrong with that logic. I just think a warning or something similar should be raised, because it's such an easy mistake to make.

A quick example is below:

library(dplyr)

d1 <- tibble(x = 1:6, y = c(rep(1, 3), rep(2, 3)))

# `by = y` creates a new column named `by`; no grouping happens
d1 |> mutate(lag_x = lag(x), by = y)

# A tibble: 6 × 4
#       x     y lag_x    by
# 1     1     1    NA     1
# 2     2     1     1     1
# 3     3     1     2     1
# 4     4     2     3     2
# 5     5     2     4     2
# 6     6     2     5     2

# `.by = y` groups by y, so lag() restarts within each group
d1 |> mutate(lag_x = lag(x), .by = y)

# A tibble: 6 × 3
#       x     y lag_x
# 1     1     1    NA
# 2     2     1     1
# 3     3     1     2
# 4     4     2    NA
# 5     5     2     4
# 6     6     2     5
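
One quick way to spot this slip after the fact, sketched here on the assumption that a dplyr version supporting `.by` (1.1.0 or later) is attached: the typo leaves a stray `by` column behind, and the correctly grouped result should agree with the explicit `group_by()` form. The names `res_typo`, `res_grouped` and `res_explicit` are illustrative only.

```r
library(dplyr)

d1 <- tibble(x = 1:6, y = c(rep(1, 3), rep(2, 3)))

res_typo    <- d1 |> mutate(lag_x = lag(x), by = y)   # adds a column called `by`
res_grouped <- d1 |> mutate(lag_x = lag(x), .by = y)  # groups by y for this call

# Columns present only in the typo version; a stray `by` is the giveaway.
extra_cols <- setdiff(names(res_typo), names(res_grouped))
print(extra_cols)

# Cross-check the grouped result against the explicit group_by() form.
res_explicit <- d1 |>
  group_by(y) |>
  mutate(lag_x = lag(x)) |>
  ungroup()
print(all.equal(res_grouped, res_explicit))
```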

olivroy commented 1 month ago

`by` is a valid column name, which is why `mutate()` uses `.by`.
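
To illustrate the point, here is a small sketch with made-up data (the `orders` tibble and its columns are hypothetical) in which `by` is a genuine column, so `mutate()` has to read `by = ...` as a column assignment and grouping needs the dotted argument:

```r
library(dplyr)

# Hypothetical data: `by` records which carrier shipped each order.
orders <- tibble(id = 1:3, by = c("ups", "dhl", "ups"))

# Here `by = ...` legitimately modifies the existing `by` column.
orders_upper <- orders |> mutate(by = toupper(by))

# Grouping on that same column uses `.by`, which cannot clash with it.
orders_counted <- orders |> mutate(n_orders = n(), .by = by)
```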

ywhcuhk commented 1 month ago

Yes, I understand that. I just think it's very easy to type `by =` by mistake. Maybe it's just me...
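
For anyone who wants a guard in their own code in the meantime, a minimal user-side sketch follows. `mutate_checked()` is a hypothetical helper, not part of dplyr; it assumes rlang is installed (it is a dplyr dependency) and simply warns when an argument named `by` is passed before handing everything to `mutate()`.

```r
library(dplyr)

# Hypothetical wrapper (not a dplyr function): warn if an argument named `by`
# is supplied, since that is often a typo for `.by`, then call mutate().
mutate_checked <- function(.data, ...) {
  arg_names <- names(rlang::enquos(...))  # captures names without evaluating
  if ("by" %in% arg_names) {
    warning("`by` was passed to mutate(); did you mean `.by`?", call. = FALSE)
  }
  mutate(.data, ...)
}

d1 <- tibble(x = 1:6, y = c(rep(1, 3), rep(2, 3)))

# Warns, then still creates the `by` column as mutate() normally would.
typo_warned <- tryCatch(
  { mutate_checked(d1, lag_x = lag(x), by = y); FALSE },
  warning = function(w) TRUE
)

# No warning: `.by` is forwarded to mutate() unchanged.
res_ok <- mutate_checked(d1, lag_x = lag(x), .by = y)
```

The trade-off is that the wrapper also warns when a column really is meant to be called `by`, which is the case the maintainers' naming choice is designed to allow.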