Open danwwilson opened 1 month ago
@danwwilson - this is not to do with tibble but with the fact that you should avoid using setDT
programmatically (i.e use as.data.table()
instead) or exercise great care in non-interactive settings. To illustrate:
as_bob <- function(x) {
x <- as.data.frame(x)
class(x) <- c("bob", class(x))
x
}
df <- data.frame(
a = 1:5,
b = LETTERS[1:5],
c = 5:1,
d = c("a", "a", "b", "b", "b")
)
(bob <- as_bob(df))
#> a b c d
#> 1 1 A 5 a
#> 2 2 B 4 a
#> 3 3 C 3 b
#> 4 4 D 2 b
#> 5 5 E 1 b
select_first(df, keyby = "c")
#> Key: <c>
#> a b c d
#> <int> <char> <int> <char>
#> 1: 5 E 1 b
bob
#> a b c d
#> 1 1 E 5 b
#> 2 2 D 4 b
#> 3 3 C 3 b
#> 4 4 B 2 a
#> 5 5 A 1 a
The onus would be on data.table to change the behaviour of setDT()
(should they wish to) but that would likely remove it's efficient nature. You could make your function force a copy()
but in that case you might as well call as.data.table()
(at this point the only real difference in overhead would be the method dispatch).
Hope that makes sense.
I am writing a simple function to return the first row of a dataset using data.table. I don't want to assume all inputs will be a data.table so was conducting some testing with tibbles and came across the following bug.
When using
as_tibble()
(from ‘3.2.1’) to define a tibble I assume it must keep some reference to the original data.frame. The reprex below shows some strange reordering that happens when the original data.frame is modified and then a function is called on the tibble created fromas_tibble()
.I am trying to understand why this might occur. I would have thought that
tbl <- as_tibble(df)
would have assigned an item to a new memory location and no longer referenceddf
. If we change the order of execution to run `select_first(tbl, keyby = "c") before any others it returns the desired output.