Closed by mariodejung 5 years ago
Hmm, that is strange. For future reference, if you can run your example through reprex, it's helpful to see the input and the output right in the issue.
library(dplyr)
set.seed(1)
df <- data.frame(Intensity=rnorm(1000, 25, 3))
class(df)
#> [1] "data.frame"
df_backup <- df
class(df_backup)
#> [1] "data.frame"
df_test <- df %>%
  dplyr::group_by_at(vars(matches('^species$'))) %>%
  dplyr::summarise(`5%`=stats::quantile(log10(Intensity),.05),
                   `50%`=stats::quantile(log10(Intensity),.50),
                   `95%`=stats::quantile(log10(Intensity),.95))
class(df)
#> [1] "tbl_df" "tbl" "data.frame"
class(df_test)
#> [1] "tbl_df" "tbl" "data.frame"
class(df_backup)
#> [1] "tbl_df" "tbl" "data.frame"
library(lobstr)
obj_addr(df)
#> [1] "0x7f86a3fb95d8"
obj_addr(df_backup)
#> [1] "0x7f86a3fb95d8"
obj_addr(df_test)
#> [1] "0x7f86a48584a8"
Created on 2019-02-25 by the reprex package (v0.2.1)
I added the object address code from "Binding basics" in Advanced R. df and df_backup are just two names bound to the same value, but it is surprising that copy-on-modify doesn't preserve df_backup as it was.
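For comparison, here is a minimal base R sketch of what copy-on-modify would normally be expected to do when two names point at the same data frame. It does not use dplyr at all; the class vector assigned below only mimics the one seen in the reprex and is set by hand for illustration.

```r
# Base R only; no dplyr involved.
df <- data.frame(Intensity = c(20, 25, 30))
df_backup <- df   # a second name bound to the same object

# A replacement function works on a copy, so only `df` should change.
# The class vector is set manually here purely to mirror the reprex.
class(df) <- c("tbl_df", "tbl", "data.frame")

class(df)
#> [1] "tbl_df"     "tbl"        "data.frame"
class(df_backup)
#> [1] "data.frame"
```

This is the behaviour the reprex above does not show: there, df_backup picks up the new class even though it was never passed to any function.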
I even use this call within a function and it changes the objects outside the function!
library(dplyr)
set.seed(1)
df<- data.frame(Intensity=rnorm(1000, 25, 3))
class(df)
#> [1] "data.frame"
df_backup <- df
class(df_backup)
#> [1] "data.frame"
my_plotAbundanceRank <- function(data_set) {
  quantile_df <-
    data_set %>%
    dplyr::group_by_at(vars(matches('^species$'))) %>%
    dplyr::summarise(`5%`=stats::quantile(log10(Intensity),.05),
                     `50%`=stats::quantile(log10(Intensity),.50),
                     `95%`=stats::quantile(log10(Intensity),.95))
}
print(my_plotAbundanceRank(df))
#> # A tibble: 1 x 3
#> `5%` `50%` `95%`
#> <dbl> <dbl> <dbl>
#> 1 1.30 1.40 1.48
class(df)
#> [1] "tbl_df" "tbl" "data.frame"
class(df_backup)
#> [1] "tbl_df" "tbl" "data.frame"
I think we just need a shallow copy in the case when no groups are specified. https://github.com/tidyverse/dplyr/blob/16125d12d809286ff2f18be8187b036e9ddbbc0e/src/group_indices.cpp#L645
Basically it should do what ungroup_grouped_df() does:
https://github.com/tidyverse/dplyr/blob/16125d12d809286ff2f18be8187b036e9ddbbc0e/src/group_indices.cpp#L669
suppressPackageStartupMessages(library(dplyr))
#> Warning: package 'dplyr' was built under R version 3.5.2
x <- data.frame(y = 1)
x
#> y
#> 1 1
dplyr::group_by(x)
#> # A tibble: 1 x 1
#> y
#> <dbl>
#> 1 1
x
#> # A tibble: 1 x 1
#> y
#> <dbl>
#> 1 1
suppressPackageStartupMessages(library(dplyr))
#> Warning: package 'dplyr' was built under R version 3.5.2
x <- data.frame(y = 1)
x
#> y
#> 1 1
dplyr::group_by(x, y)
#> # A tibble: 1 x 1
#> # Groups: y [1]
#> y
#> <dbl>
#> 1 1
x
#> y
#> 1 1
I couldn't think of a better title, but I have run into problems since the new dplyr release.
I narrowed it down to a single dplyr call, but it changes other variables in my script, even though they are not "touched" by the dplyr call.
First of all, there is a group_by_at call, because 'if there is a column "species", I want to group by it'. If the column does not exist, I get a warning, which was fine for me, but I don't understand the class changes for the other variables. This causes problems downstream in my script because older functions can't handle the tibble yet.
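One possible way to sidestep the problem until it is fixed is sketched below. It assumes the in-place class change comes only from calling a grouping function with no grouping columns, as discussed in this thread, so it groups only when the "species" column actually exists and converts the result back to a plain data frame for older downstream functions. The helper name summarise_intensity is hypothetical and not part of dplyr.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): group by `species` only if
# that column is present, so group_by is never called without groups.
summarise_intensity <- function(data_set) {
  if ("species" %in% names(data_set)) {
    data_set <- dplyr::group_by(data_set, species)
  }
  out <- dplyr::summarise(
    data_set,
    `5%`  = stats::quantile(log10(Intensity), .05),
    `50%` = stats::quantile(log10(Intensity), .50),
    `95%` = stats::quantile(log10(Intensity), .95)
  )
  # Return a plain data.frame for functions that cannot handle tibbles.
  as.data.frame(out)
}

df <- data.frame(Intensity = c(10, 100, 1000))
res <- summarise_intensity(df)
class(res)
class(df)
```

Checking class(df) after the call is a quick way to confirm whether the input was left alone in a given dplyr version.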