Hi @emilBeBri, this is an issue with data.table and saving to disk, not qs. You can see that even with other serialization methods, you get the same issue:
library(data.table)
dat1 <- data.table(id=c(1,1,1,2,2,2), runif(6,0,5), z = letters[1:6])
dat2 <- data.table(id=c(3,3,3,4,4,4), runif(6,0,5), z = letters[1:6])
saveRDS(dat1, file = "/tmp/dat1.rds")
dat1 <- readRDS("/tmp/dat1.rds")
saveRDS(dat2, file = "/tmp/dat2.rds")
dat2 <- readRDS("/tmp/dat2.rds")
for (DT in c('dat1', 'dat2')) {
  get(DT)[, newcol := 1]
}
dat1
id V2 z
1: 1 2.2551426 a
2: 1 0.3937463 b
3: 1 1.4704248 c
4: 2 4.7696833 d
5: 2 3.8110676 e
6: 2 4.4503739 f
Notice that newcol was never added to dat1. The reason is that data.table uses a C-level pointer to its own location in memory (the .internal.selfref attribute), and this pointer becomes null once the object has been serialized and read back:
> attributes(dat1)
...
$.internal.selfref
<pointer: 0x0>
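One quick check (a minimal sketch, assuming the dat1 that was just read back from /tmp/dat1.rds) is to look at the over-allocation that := relies on; it is not preserved on disk:

library(data.table)
dat1 <- readRDS("/tmp/dat1.rds")
attr(dat1, ".internal.selfref")  # <pointer: 0x0>: the self-reference is stale
truelength(dat1)                 # typically 0 after loading from disk, i.e. no over-allocated column slots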
A quick fix would be to re-wrap the data.table:
library(data.table)
library(qs)
dat1 <- data.table(id=c(1,1,1,2,2,2), runif(6,0,5))
dat2 <- data.table(id=c(3,3,3,4,4,4), runif(6,0,5))
qsave(dat1, './dat1.qs', preset='high')
dat1 <- data.table(qread('./dat1.qs'))
qsave(dat2, './dat2.qs', preset='high')
dat2 <- data.table(qread('./dat2.qs'))
for (DT in c('dat1', 'dat2')) {
  get(DT)[, newcol := 1]
}
> dat1
id V2 newcol
1: 1 0.08454065 1
2: 1 4.36837604 1
3: 1 3.49920527 1
4: 2 0.02507492 1
5: 2 1.65069644 1
6: 2 1.29881559 1
Hopefully that answers your question. If it does, please feel free to close :)
Nice! Wrapping it in data.table() is a neat trick to circumvent this. Just checked on the example data: you can also do it with setDT() and thereby avoid making any copy at all, so this is probably the most efficient solution (although perhaps more error-prone?).
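For reference, a minimal sketch of that setDT() variant, using the same toy data and file paths as above:

library(data.table)
library(qs)
dat1 <- qread('./dat1.qs')
setDT(dat1)   # restores the internal self-reference in place, without copying
dat2 <- qread('./dat2.qs')
setDT(dat2)
for (DT in c('dat1', 'dat2')) {
  get(DT)[, newcol := 1]   # := now works by reference on the original objects
}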
Hi, so: when loading a data.table saved as a qs file and then creating columns by reference inside a for loop, it fails silently, like so:
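A minimal reproduction (same toy data as the example above):

library(data.table)
library(qs)
dat1 <- data.table(id=c(1,1,1,2,2,2), runif(6,0,5))
dat2 <- data.table(id=c(3,3,3,4,4,4), runif(6,0,5))
qsave(dat1, './dat1.qs', preset='high')
dat1 <- qread('./dat1.qs')
qsave(dat2, './dat2.qs', preset='high')
dat2 <- qread('./dat2.qs')
for (DT in c('dat1', 'dat2')) {
  get(DT)[, newcol := 1]   # silently fails: newcol never appears in dat1 or dat2
}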
If, however, one does not save it as a qs object, or if one creates the variable for each individual data.table instead of in a loop, everything is fine. I tried the argument use_alt_rep=F as well, but that does not help. If one uses the copy() function just after loading the qs objects, it also works, but that seems very inefficient on big data.tables.
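The copy() workaround looks roughly like this (same files as above):

library(data.table)
library(qs)
dat1 <- copy(qread('./dat1.qs'))   # deep copy yields a valid data.table again, but duplicates it in memory
dat2 <- copy(qread('./dat2.qs'))
for (DT in c('dat1', 'dat2')) {
  get(DT)[, newcol := 1]   # works on the copies
}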
greetings, Emil