richfitz / storr

:package: Object cacher for R
http://richfitz.github.io/storr

Strange edge case that consumes gratuitous memory #76

Closed (wlandau closed 6 years ago)

wlandau commented 6 years ago

In https://github.com/ropensci/drake/issues/383, @bart1 has a workflow that uses R6 classes within other R6 classes. Sometimes, memory blows up when reading these objects from the cache, and it seems to depend on the environment in which the storr object is created.

library(pryr)
library(R6)
library(storr)

Call <- R6Class("CallItem", public = list(rstring = c(NA), id = "", initialize = function() {
  self$rstring <- sample(letters)
  self$id <- paste(collapse = "", self$rstring)
}))
CallCollections <- R6Class("CallCollections", public = list(callList = list(), 
  initialize = function(n = 1000) {
    self$callList <- replicate(Call$new(), n = n)
  }))

cache <- storr_rds("cache")

f <- function(key, cache) {
  cache$get(key)
}

g <- function(key) {
  cache <- storr_rds("cache")
  cache$get(key)
}

x <- CallCollections$new()
cache$set("x", x)
object_size(f("x", cache))
#> 1.19 MB
object_size(g("x"))
#> 48 MB

wlandau commented 6 years ago

After more attempts to diagnose this, I think the problem may have nothing to do with R6 (ref: https://github.com/ropensci/drake/issues/383#issuecomment-388596141, https://github.com/ropensci/drake/issues/383#issuecomment-388654425).

richfitz commented 6 years ago

Hi Will - is this something that looks like a storr problem in the end?

wlandau commented 6 years ago

That is my suspicion, but I am not entirely sure. At the very least, it would be helpful to know for sure if it is not storr.

richfitz commented 6 years ago

Here's a version without storr:

library(pryr)
library(R6)

Call <- R6Class("CallItem", public = list(rstring = c(NA), id = "", initialize = function() {
  self$rstring <- sample(letters)
  self$id <- paste(collapse = "", self$rstring)
}))
CallCollections <- R6Class("CallCollections", public = list(callList = list(), 
  initialize = function(n = 1000) {
    self$callList <- replicate(Call$new(), n = n)
  }))

x <- CallCollections$new()
y <- unserialize(serialize(x, NULL))
object_size(y)

This could have to do with the way that environments are serialized and deserialized.

richfitz commented 6 years ago

(same with saveRDS/readRDS but this is the lowest level I can think of)
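
A minimal base-R sketch of how environments behave in such a round trip, offered as an illustration rather than a diagnosis of this issue. The names `e`, `pair`, `copy`, and `from_disk` are made up for the example. Within a single `serialize()` or `saveRDS()` call, an environment that appears twice is restored as one shared environment, but the restored environment is a new object, distinct from the original:

```r
# Hypothetical example: one environment referenced twice in a list.
e <- new.env(parent = emptyenv())
assign("value", 1:10, envir = e)
pair <- list(e, e)

# In-memory round trip.
copy <- unserialize(serialize(pair, NULL))

# On-disk round trip with saveRDS()/readRDS().
path <- tempfile(fileext = ".rds")
saveRDS(pair, path)
from_disk <- readRDS(path)
unlink(path)

# identical() compares environments by reference, not by contents.
shared_in_copy <- identical(copy[[1]], copy[[2]])
shared_on_disk <- identical(from_disk[[1]], from_disk[[2]])
same_as_original <- identical(copy[[1]], e)

print(c(shared_in_copy, shared_on_disk, same_as_original))
```

The first two values are `TRUE` and the last is `FALSE`, so both round trips keep the sharing between the two list elements but always produce a fresh environment.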

wlandau commented 6 years ago

Thanks!

bart1 commented 6 years ago

Thanks to both of you for your efforts! It seems to be more of an issue with R6 in combination with serialization. I will investigate further and might bring it up with @wch. A list of R6 objects seems to show the same effect.

> require(pryr)
> require(R6)
> as<-R6Class('as')
> a<-list(as$new(),as$new())
> b<-unserialize(serialize(a,NULL))
> object_size(a)
49.2 kB
> object_size(b)
92.7 kB

bart1 commented 6 years ago

I guess this behaviour is actually pretty similar to base R behaviour, where several names or list elements reference the same object until one of them is changed, and therefore save memory. That sharing is lost after a serialize/unserialize round trip:

> require(pryr)
> x<-3:500
> xx<-list(x,x)
> object_size(x)
2.03 kB
> object_size(xx)
2.09 kB
> xb<-unserialize(serialize(xx,NULL))
> object_size(xb)
4.12 kB
> inspect(xx)
<VECSXP 0x280652e8>
  <INTSXP 0x2980e5c0>
  [INTSXP 0x2980e5c0]
> inspect(xb)
<VECSXP 0x28065518>
  <INTSXP 0x2476ad20>
  <INTSXP 0x288d9940>
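
The same effect can be checked without pryr by comparing serialized lengths in base R. This is only a sketch with made-up names (`n_bytes`, `x`, `e`): a plain vector that appears twice in a list is written out twice, whereas an environment that appears twice is written once and then stored as a back-reference.

```r
# Hypothetical helper: number of bytes in the serialized form of an object.
n_bytes <- function(obj) length(serialize(obj, NULL))

x <- runif(1000)  # ordinary numeric vector, about 8 kB of data

# Plain vectors: the second occurrence is serialized again in full.
vec_once <- n_bytes(list(x))
vec_twice <- n_bytes(list(x, x))

# Environments: the second occurrence is stored as a back-reference.
e <- new.env(parent = emptyenv())
assign("x", x, envir = e)
env_once <- n_bytes(list(e))
env_twice <- n_bytes(list(e, e))

print(c(vec_ratio = vec_twice / vec_once, env_ratio = env_twice / env_once))
```

`vec_ratio` should come out close to 2 and `env_ratio` close to 1. That is consistent with the sizes above: anything that is not a reference object loses its sharing in the round trip. Whether this fully accounts for the R6 case is an assumption here, not something confirmed in this thread.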