tidyverse / magrittr

Improve the readability of R code with the pipe
https://magrittr.tidyverse.org
Other
961 stars 157 forks source link

Pipe messing with the environment #146

Open Enchufa2 opened 7 years ago

Enchufa2 commented 7 years ago

Sorry in advance because the minimal reprex is not that minimal: it involves Rcpp, and the issue I found is that some objects are not being garbage-collected because of magrittr (I'm adding @eddelbuettel to the loop, just in case I'm mistaken and this has something to do with Rcpp instead). Let's consider the following mini-module:

#include <Rcpp.h>
using namespace Rcpp;

class Foo {
public:
  ~Foo() { warning("foo destructor called"); }
  void add(Function fun) { funcs.push_back(fun); }
private:
  std::vector<Function> funcs;
};

//[[Rcpp::export]]
SEXP Foo__new() { return XPtr<Foo>(new Foo()); }

//[[Rcpp::export]]
SEXP Foo__add(SEXP foo_, Function fun) {
  XPtr<Foo> foo(foo_);
  foo->add(fun);
  return foo_;
}

There is a class Foo that stores pointers to Bar, and Bar stores an R function, which is received through Foo::add. This code can be compiled with Rcpp::sourceCpp(code='<code_here>'). Then, let's play around:

foo <- Foo__new()
foo <- Foo__add(foo, function() 1)
rm(foo)
gc()
# Warning: foo destructor called

Foo's destructor is correctly called and we see the warning. Perfect. But now,

library(magrittr)

foo <- Foo__new() %>%
  Foo__add(function() 1)
rm(foo)
gc()

No warning! This is very serious, because the memory is not being deallocated. My guess is that magrittr could be messing with the environments and there is still a reference to foo somewhere, hidden, so that R is not removing the object. Is it possible?

Enchufa2 commented 7 years ago

Also note that it works correctly with a named function:

func <- function() 1

foo <- Foo__new() %>%
  Foo__add(func)
rm(foo)
gc()
# Warning: foo destructor called

I'm unable to replicate this with R6 classes or reg.finalizer().

kevinushey commented 7 years ago

In this call:

library(magrittr)

foo <- Foo__new() %>%
  Foo__add(function() 1)
rm(foo)
gc()

The parent environment of the function created through function() 1 appears to be not what one would expect -- e.g. I see

Browse[2]> environment(fun)
<environment: 0x110d74518>

rather than R_GlobalEnv. Doing some debugging, that appears to be the same environment associated with function_list[[k]](value) in the call stack.

Browse[2]> as.data.frame(cbind(sys.calls(), sys.frames()))
                                                   V1                         V2
1               Foo__new() %>% Foo__add(function() 1) <environment: 0x103d504a8>
2 withVisible(eval(quote(`_fseq`(`_lhs`)), env, env)) <environment: 0x110d74ce0>
3              eval(quote(`_fseq`(`_lhs`)), env, env) <environment: 0x110d74f10>
4                           eval(expr, envir, enclos) <environment: 0x103d50860>
5                                     `_fseq`(`_lhs`) <environment: 0x110d750d0>
6                    freduce(value, `_function_list`) <environment: 0x110d742e8>
7              withVisible(function_list[[k]](value)) <environment: 0x110d74438>
8                           function_list[[k]](value) <environment: 0x110d74518>
9                           Foo__add(., function() 1) <environment: 0x110d746d8>

My guess is that this ultimately comes down to something going wrong with how magrittr modifies the environment of functions it executes.

Enchufa2 commented 7 years ago

Moreover:

Browse[2]> ls(envir = environment(fun), all.names=TRUE)
[1] "."
Browse[2]> environment(fun)$.
<pointer: 0x3369e20>
Browse[2]> foo_
<pointer: 0x3369e20>

There it is the other copy of the pointer! That's why the destructor is not being called. Thanks, @kevinushey, for your help.

smbache commented 7 years ago

Is it the same issue with the update branch? Has a simpler approach...

Enchufa2 commented 7 years ago

Just checked and yes, same issue.

smbache commented 7 years ago

Hmm it's not clear to me what's going on - one of the Rcpp experts may know... @romainfrancois ?

Enchufa2 commented 7 years ago

The gc destroys an object when it loses all the references to it. In the first and third examples (no pipe or named function), the function's environment is the R_GlobalEnv. Therefore, you remove foo and the gc destroys the object. In the second one (pipe + anonymous function), the function's environment is a new one created by magrittr, and this environment stores another reference to the object. The function is stored inside the object. Thus, when you remove foo, there is this other reference alive, and the object cannot be destroyed.

eddelbuettel commented 7 years ago

Seems to be an unhappy intersection of @Enchufa2 having code that would really like the object to be destroyed so that gc() is called, and the non-standard approach to evaluating things requiring another copy preventing exactly this.

Sometimes one can fix the data type to add an explicit 'I am done, please unwind' member function to manually do some of the work of the destructor (at the cost of some local code complexity) -- but I am not sure one could do that here with Modules.

Enchufa2 commented 7 years ago

Seems to be an unhappy intersection of @Enchufa2 having code that would really like the object to be destroyed

Yes, basically, I have users' reports about machines running out of memory after many replications of a simulation model with simmer. I recommend them to reset the simulation environment, calling explicitly a function that deallocates resources, but I cannot rely on that.

Nevertheless, knowing what's the problem, I think I can implement in my package a workaround to transparently remove the reference that magrittr creates until this issue is solved.

Enchufa2 commented 7 years ago

Ok, I finally managed to find an example not involving Rcpp (sorry, guys, and thanks for your help). Consider the following not-so-useful function:

store <- function(env1, env2, func) {
  env2$func <- func
  env1
}

Then, as expected,

a <- new.env()
b <- new.env()
reg.finalizer(a, function(e) print("deleting 'a'"))
a <- store(a, b, function() 1)
rm(a); gc()
# deleting 'a'
rm(b); gc()

but

library(magrittr)

a <- new.env()
b <- new.env()
reg.finalizer(a, function(e) print("deleting 'a'"))
a <- a %>% store(b, function() 1)
rm(a); gc()
rm(b); gc()
# deleting 'a'

That's not ok. Finally, a workaround:

library(magrittr)

a <- new.env()
b <- new.env()
reg.finalizer(a, function(e) print("deleting 'a'"))
a <- a %>% store(b, function() 1)
rm(".", envir=environment(b$func)) # the trick
rm(a); gc()
# deleting 'a'
rm(b); gc()

That's ok again.