tidyverse / dplyr

dplyr: A grammar of data manipulation
https://dplyr.tidyverse.org/

Mutate: ".by" returns attributes in a different order to the "group, mutate, ungroup" pattern #7081

Closed AlexBainton closed 2 months ago

AlexBainton commented 2 months ago

Mutating grouped variables using the .by argument returns a data frame that is not exactly identical to the one returned by the group_by() |> mutate() |> ungroup() pattern. This matters for tests such as testthat::expect_known_hash(), which returns a different value for the two patterns.

Specifically, using .by seems to return a slightly malformed data frame (its attributes are in a different, seemingly arbitrary order), which is corrected when it is run through tibble(). The class attribute ($class) seems to be either second or third when it should be first.

library(tidyverse)
# Dplyr 1.1.4

iris_mutated_with_by <- iris |>
  mutate(.by = Species)

iris_mutated <- iris |>
  group_by(Species) |>
  mutate() |>
  ungroup()

# FALSE, but should be TRUE.
identical(iris_mutated, iris_mutated_with_by, attrib.as.set = FALSE)

# These return different hashes as a result.
iris_mutated |> testthat::expect_known_hash("679a41dc2c")
iris_mutated_with_by |> testthat::expect_known_hash("d3c5d07100")

# Running the .by mutated dataframe through tibble() cleans things up:
iris_mutated_with_by |> tibble() |> testthat::expect_known_hash("679a41dc2c")

DavisVaughan commented 2 months ago

I would argue that expect_known_hash() is the wrong thing to use here. There is no guarantee on attribute ordering, only that the attributes exist, and I don't think you should rely on them being in a specific order.
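
To illustrate the point about attribute ordering, here is a minimal base R sketch. The objects `a` and `b` and their attributes `foo` and `bar` are made up for the example and are not taken from dplyr. It only shows that an order-insensitive comparison treats two objects as equal when they carry the same attributes in a different order, while an order-sensitive one (like the `attrib.as.set = FALSE` call in the report) does not.

```r
# Hypothetical objects: same data and same attributes,
# but the attributes are stored in a different order.
a <- structure(1:3, foo = "x", bar = "y")
b <- structure(1:3, bar = "y", foo = "x")

# Default comparison treats attributes as an unordered set.
default_cmp <- identical(a, b)

# Order-sensitive comparison, as used in the original report.
strict_cmp <- identical(a, b, attrib.as.set = FALSE)

# all.equal() compares attributes by name, so it should not depend on their order.
all_equal_cmp <- isTRUE(all.equal(a, b))

print(c(default = default_cmp, strict = strict_cmp, all_equal = all_equal_cmp))
```

Under that reading, a value-based expectation such as testthat::expect_equal() is probably a better fit for this kind of test than a hash of the serialized object, since a hash can change whenever the internal attribute order does.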