Closed leungi closed 5 years ago
Please try the `use_alt_rep=F` parameter in `qread`. That should reduce memory usage. However, `object.size` should not show any difference, regardless of whether you use `qs` or not.
R also uses garbage collection, which may not run right away; you can trigger it manually with `gc()` (although that's an R behavior, not a `qs` one). Memory usage may look higher than it really is if R hasn't run a garbage collection in the background for a while.
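For example, a minimal sketch (the object name `x` is just an illustration) of forcing a collection and reading R's own accounting:

```r
# Allocate a large vector, drop it, then force a garbage collection.
x <- numeric(1e7)                      # ~80 MB of doubles
print(object.size(x), units = "MB")    # R's per-object accounting
rm(x)
m <- gc()                              # force a collection; returns a used/trigger matrix
print(m)                               # "used" drops back down after the collection
```

Note that even after `gc()` reports a low "used" figure, the OS-level number (Task Manager, `ps`) can stay higher, since memory R has freed internally is not necessarily returned to the OS immediately.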
@traversc: thanks for the prompt reply. `object.size()` did return equal sizes; there must have been lingering variables in the environment. However, the Windows memory usage bloat persists.
```r
> Sys.getpid()
[1] 29440
>
> data_FALSE <- qs::qread('test.q', use_alt_rep = FALSE)
>
> data_TRUE <- qs::qread('test.q', use_alt_rep = TRUE)
>
> identical(data_FALSE, data_TRUE)
[1] TRUE
>
> object.size(data_FALSE)
2770498608 bytes
> object.size(data_TRUE)
2770498608 bytes
```
Running `gc()` has no effect.

Does the memory issue persist if you start a new session and only use `use_alt_rep=F`?
Same issue (screenshot attached).
```r
> Sys.getpid()
[1] 29500
> data_FALSE <- qs::qread('test.q', use_alt_rep = FALSE)
> object.size(data_FALSE)
2770498608 bytes
> gc()
             used   (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells    1424532   76.1    2213915  118.3   2213915  118.3
Vcells  340659266 2599.1  582678634 4445.5 447116858 3411.3
```
I believe that to be in line with expectation.
I ran the following test (on Mac right now, but I will try it out on Windows later).

Generate data, about a 2.1 GB data frame according to `object.size`:

```r
library(dplyr)
library(qs)
z <- starnames %>% dplyr::sample_n(3e7, replace = T)
qsave(z, file = "/tmp/test.z")
saveRDS(z, file = "/tmp/test.rds")
```
New session:

```r
library(qs)
z <- readRDS("/tmp/test.rds")
gc()
```

Memory usage according to the `ps` command: 2737700 KB (1024-byte units).
New session:

```r
library(qs)
z <- qread("/tmp/test.z", use_alt_rep = F)
gc()
```

Memory usage according to the `ps` command: 2475848 KB (1024-byte units).
So there is no difference in memory usage (in fact, `qs` comes out lower than `readRDS`), but both are significantly higher than what `object.size` reports.

I suspect this is because of how R works. Speaking outside my expertise now, but I suspect R reserves more memory than it is currently using, so that it can quickly provision memory for any new object.
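One way to see that headroom from within R (a rough sketch; exact numbers vary by session) is to compare the "used" column of `gc()` against its "gc trigger" and "max used" columns after freeing a large object:

```r
# Allocate and free a large object, then inspect gc()'s accounting.
x <- runif(5e6)        # roughly 40 MB allocation
rm(x)
m <- gc()
# "used" is what live objects occupy now; "gc trigger" and "max used"
# show the heap headroom R keeps around for fast future allocations.
m[, c("used", "gc trigger", "max used")]
```

The gap between "used" and the other columns is memory R is holding but not currently filling with live objects, which is roughly what the OS-level tools count.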
Regardless, I believe it isn't a `qs` issue. Let me know if you disagree or if there is anything else.
Using the `use_alt_rep = F` option does resolve this issue, as you suggested.

Closing this; appreciate your investigation!
Hi,

I was promoting your `qs` package to another user when he realized that, upon loading `.q` files into an R session, the object size may be bloated by 2-3x. I validated his findings with my own `.q` files. The file size shown in the RStudio Environment pane may be misleading (it reflects a size similar to the saved object), but inspecting object size via `object.size()` and checking memory use via the Windows Resource Monitor confirmed this bloating issue.