Open sebffischer opened 1 year ago
Note that it is possible to overwrite the --with-keep.source
with --without-keep.source
Am I correct in assuming that via https://github.com/rstudio/renv/blob/36e1b5192fbc9f380dd4fffc6ee837856d5d3e75/R/r.R#L206C38-L206C50 we can override this with the install.opts
option, e.g. options("install.opts" = "--without-keep.source")
?
(EDIT: Yes, apparently that works)
I have just downgraded to renv v0.17.0 for a project where this became an issue, and I'm trying to see if I can use a recent renv version with this workaround just in case the downgrade caused other issues.
In any case, this issue affected me by causing a serialization step (saveRDS
) of many objects containing R6 objects to become
a) unreasonably slow and
b) taking up over 200GB of real memory on our workstation after over 20 hours of running
🫠
b) taking up over 200GB of real memory on our workstation after over 20 hours of running
🤯
Do you know why this is happening? I wonder if there's a fix that needs to happen in R6, or one of the packages using R6...
I think @sebffischer might be able to elaborate on that a bit more - I just got the 'srcrefs full of stuff' and 'R6 is somehow not great in that regard' bits of that discussion 🙃
The problem is essentially the one I have already described in the first post in this issue, i.e. the size of serialized functions changes considerably when they contain srcrefs. While this might not seem horrible in itself, it becomes a major problem in combination with the way the mlr3
team uses R6 classes and how enviornments are (un)-serializeed (see code snippet below).
We already partially mitigate this problem ourselves by taking care that the methods of our R6 objects do not contain srcrefs, essentially overwriting the behaviour of --with-keep.source
, but we cannot do this for functions from other packages.
# "Create R6 classes"
e1 = new.env()
e2 = new.env()
# User sets fields of the "R6 classes"
e1$fn = ggplot2::facet_grid
e2$fn = ggplot2::facet_grid
# the srcref attribute is represented only once in memory
identical(attr(e1$fn, "srcref"), attr(e2$fn, "srcref"))
#> [1] TRUE
pryr::object_size(list(e1$fn, e2$fn))
#> 1.76 MB
# User saves the "R6 classes"
p1 = tempfile()
p2 = tempfile()
saveRDS(e1, p1)
saveRDS(e2, p2)
# User reloads the "R6 classes"
l1 = readRDS(p1)
l2 = readRDS(p2)
# reference identities are lost
identical(attr(l1$fn, "srcref"), attr(l2$fn, "srcref"))
#> [1] FALSE
# RAM usage blows up
pryr::object_size(list(l1$fn, l2$fn))
#> 4.17 MB
Created on 2023-10-17 with reprex v2.0.2
May I ask why the decision was made to add the --with-keep.source
option?
Recently,
renv
changed the way packages are installed by adding with--with-keep.source
installation option. I have created a related (solved) issue here: https://github.com/rstudio/renv/issues/1688.While the problems that arose in this issue are fixed from our side, we are now running into related problems that also arise due to the installation option
--with-keep.source
. A side effect of keeping all the sources is that the size of serialized functions becomes much larger.If I e.g. save the
ggplot2::ggplot
function when installing ggplot2 without source, the function has 1.76 KB, wheres when I install the package with the sources, the serializedggplot2::ggplot
function has 1.65 MB, a factor of around 1000.I believe that the culprit here is
attributes(attributes(ggplot2::ggplot)$srcref)$srcfile$original$lines
, which is a large character vector containing the sources.It would be great if one could disable the
--with-keep.source
flag. In light of the above issue, might it also make sense to not set this option by default?