tidyverse / dplyr

dplyr: A grammar of data manipulation
https://dplyr.tidyverse.org/
Other
4.76k stars 2.12k forks source link

Undocumented change: preservation of class attribute #4051

Closed heavywatal closed 4 years ago

heavywatal commented 5 years ago

The class attribute is lost in the released 0.7.8, and preserved in the dev version:

x = as_tibble(iris)
class(x) = c("myclass", class(x))
x %>% dplyr::filter(Sepal.Length < 5) %>% class()

# 0.7.8
# [1] "tbl_df"     "tbl"        "data.frame"

# 0.7.99.9000
# [1] "myclass"    "tbl_df"     "tbl"        "data.frame"

It can be a side effect of the changes in grouped_df (or slice()), and happens with the stable tibble 1.4.2 and other verbs like mutate(). Anyway it is favorable to me and seems to be consistent with the upcoming tibble with subclass support. I just hope it is documented more explicitly so that I can give a clear direction to issues like GuangchuangYu/tidytree#7.

elinw commented 5 years ago

This is a breaking change for skimr because it means that the print function no longer is print.data.frame() but rather print.myclass() so the output needs to be piped to as.data.frame() in order to avoid an error. Thanks @romainfrancois for alerting us to the breakage. I feel this is an extremely significant change that, if kept, should be documented in the list of major breaking changes. I'm actually not sure what to do that would work with both versions except for overloading the impacted dplyr verbs.

elinw commented 5 years ago

Based on https://github.com/tidyverse/dplyr/issues/4051 I guess we should do mutate also; it's just not in any of our examples or vignettes. Are you aware of this being an issue for other verbs?

yutannihilation commented 5 years ago

Is the preservation of class really an intended change? If so, I'll file the below as a new issue; for regrouping purpose, group_by(..., add = TRUE) might be better replaced because group_by() doesn't respect the original classes.

library(dplyr, warn.conflicts = FALSE)

x <- as_tibble(iris)
class(x) <- c("myclass", class(x))
x %>%
  add_count(Species) %>%
  class()
#> [1] "tbl_df"     "tbl"        "data.frame"

Created on 2019-01-07 by the reprex package (v0.2.1)

hadley commented 5 years ago

Yes, it's definitely intended although it won't be until dplyr 0.9.0 (where we have full vctrs integration) that we expect every function to do the right thing. The behaviour of add_count() etc is going to need quite a lot of thought, so an issue would be helpful.

yutannihilation commented 5 years ago

Thanks, it's really a good news to me then :) I'll file a new issue.