tidyverse / magrittr

Improve the readability of R code with the pipe
https://magrittr.tidyverse.org
Other
957 stars 157 forks source link

Pipe can prevent GC of object if S3 method is used #229

Open wch opened 3 years ago

wch commented 3 years ago

The pipe can prevent GC of an object that's passed into it. It happens when:

It seems that using the pipe causes there to be a reference to the object being piped

Here's an example where the use of %>% results in the object not being GC'd. However, if the function is called without the pipe, the object does get GC'd.

library(magrittr)
library(testthat)

f <- function(x, ...) {
  UseMethod("f")
}

f.default <- function(x, ...) {
  # Don't hold onto a reference to x
  rm(x)
  function() {
    123
  }
}

e <- new.env()
finalized <- FALSE
reg.finalizer(e, function(e) { finalized <<- TRUE; message("finalized") })

e1 <- e %>% f()
# If this line is used instead of the one above, then `gc()` collects `e` and finalizer runs.
# e1 <- f(e)

rm(e); gc(); gc()
expect_true(finalized)
#> Error: `finalized` isn't true.

# Clean up
rm(list=ls()); gc()
#> finalized
gaborcsardi commented 3 years ago

Remove all these symbols from e1 and then e will be GC-d:

❯ ls(environment(e1), all.names = TRUE)
[1] "..."             ".Class"          ".Generic"        ".GenericCallEnv"
[5] ".GenericDefEnv"  ".Group"          ".Method"

Actually, scratch that, it won't....

wch commented 3 years ago

Interesting -- if I add this to the inside of f.default, then e does get GC'd.

  rm(., envir = .GenericCallEnv)

Or you can run this outside of the function, after e1 has been created:

rm(., envir = environment(e1)$.GenericCallEnv)

But when I poke at it, I don't see how . holds a reference to e.

Run this after the code above (except for the rm(list=ls()); gc() at the end).

rls(environment(e1)$.GenericCallEnv$., all.names = TRUE)
#> [[1]]
#> character(0)
#>
#> [[2]]
#> [1] ".Random.seed" "e1"           "f"            "f.default"    "finalized"   

rm(., envir = environment(e1)$.GenericCallEnv)
gc()
#> finalized
expect_true(finalized)
gaborcsardi commented 3 years ago

But when I poke at it, I don't see how . holds a reference to e.

. is the piped data in the pipe, so I guess . is e in this case.

kevinushey commented 3 years ago

I concur with @gaborcsardi:

Screen Shot 2020-10-30 at 2 04 19 PM
> identical(environment(list(e1)[[1]])[[".GenericCallEnv"]][["."]], e)
[1] TRUE
wch commented 3 years ago

Oh, duh, that makes sense. I just didn't have any identifying information in e that made it easy to see that . and e were the same object.

lionel- commented 3 years ago

I think this is a duplicate of #146. It's the conjunction of two factors:

This issue will disappear if you use the base pipe in the next version of R because it simply transforms the parse tree to the nested form. If you add this to your reprex, to use the same transformation, the issue goes away:

# Needs dev version
`%>%` <- magrittr::pipe_nested

I think I'd rather avoid making ad hoc cleanups for the specific case of S3 dispatch within the pipe mask because it might cause unintended effects and we're about to release magrittr.

Enchufa2 commented 3 years ago

Yes, it's the same as #146. I have a workaround in place for one of my packages, which cleans up this reference, because this issue was causing OOM to my users.