rstudio / renv

renv: Project environments for R.
https://rstudio.github.io/renv/
MIT License
1.02k stars 155 forks source link

Option to disable installing packages with `--with-keep.source` #1713

Open sebffischer opened 1 year ago

sebffischer commented 1 year ago

Recently, renv changed the way packages are installed by adding with --with-keep.source installation option. I have created a related (solved) issue here: https://github.com/rstudio/renv/issues/1688.

While the problems that arose in this issue are fixed from our side, we are now running into related problems that also arise due to the installation option --with-keep.source. A side effect of keeping all the sources is that the size of serialized functions becomes much larger.

If I e.g. save the ggplot2::ggplot function when installing ggplot2 without source, the function has 1.76 KB, wheres when I install the package with the sources, the serialized ggplot2::ggplot function has 1.65 MB, a factor of around 1000.

I believe that the culprit here is attributes(attributes(ggplot2::ggplot)$srcref)$srcfile$original$lines, which is a large character vector containing the sources.

It would be great if one could disable the --with-keep.source flag. In light of the above issue, might it also make sense to not set this option by default?

sebffischer commented 1 year ago

Note that it is possible to overwrite the --with-keep.source with --without-keep.source

jemus42 commented 1 year ago

Am I correct in assuming that via https://github.com/rstudio/renv/blob/36e1b5192fbc9f380dd4fffc6ee837856d5d3e75/R/r.R#L206C38-L206C50 we can override this with the install.opts option, e.g. options("install.opts" = "--without-keep.source")?

(EDIT: Yes, apparently that works)

I have just downgraded to renv v0.17.0 for a project where this became an issue, and I'm trying to see if I can use a recent renv version with this workaround just in case the downgrade caused other issues.

In any case, this issue affected me by causing a serialization step (saveRDS) of many objects containing R6 objects to become a) unreasonably slow and b) taking up over 200GB of real memory on our workstation after over 20 hours of running

🫠

kevinushey commented 1 year ago

b) taking up over 200GB of real memory on our workstation after over 20 hours of running

🤯

Do you know why this is happening? I wonder if there's a fix that needs to happen in R6, or one of the packages using R6...

jemus42 commented 1 year ago

I think @sebffischer might be able to elaborate on that a bit more - I just got the 'srcrefs full of stuff' and 'R6 is somehow not great in that regard' bits of that discussion 🙃

sebffischer commented 1 year ago

The problem is essentially the one I have already described in the first post in this issue, i.e. the size of serialized functions changes considerably when they contain srcrefs. While this might not seem horrible in itself, it becomes a major problem in combination with the way the mlr3 team uses R6 classes and how enviornments are (un)-serializeed (see code snippet below).

We already partially mitigate this problem ourselves by taking care that the methods of our R6 objects do not contain srcrefs, essentially overwriting the behaviour of --with-keep.source, but we cannot do this for functions from other packages.

# "Create R6 classes"
e1 = new.env()
e2 = new.env()
# User sets fields of the "R6 classes"
e1$fn = ggplot2::facet_grid
e2$fn = ggplot2::facet_grid

# the srcref attribute is represented only once in memory
identical(attr(e1$fn, "srcref"), attr(e2$fn, "srcref"))
#> [1] TRUE
pryr::object_size(list(e1$fn, e2$fn))
#> 1.76 MB

# User saves the "R6 classes"
p1 = tempfile()
p2 = tempfile()
saveRDS(e1, p1)
saveRDS(e2, p2)

# User reloads the "R6 classes"
l1 = readRDS(p1)
l2 = readRDS(p2)

# reference identities are lost
identical(attr(l1$fn, "srcref"), attr(l2$fn, "srcref"))
#> [1] FALSE
# RAM usage blows up
pryr::object_size(list(l1$fn, l2$fn))
#> 4.17 MB

Created on 2023-10-17 with reprex v2.0.2

May I ask why the decision was made to add the --with-keep.source option?