r-lib / R6

Encapsulated object-oriented programming for R
https://R6.r-lib.org
Other
405 stars 56 forks source link

Finalizers, closes #92 #93

Closed gaborcsardi closed 7 years ago

gaborcsardi commented 7 years ago

Classes can define a public finalize method, and this will be called automatically when objects are garbage collected.

The destructor is called with no arguments, but of course it can refer to self and private, and if the class is not portable, then it can also just refer to members.

gaborcsardi commented 7 years ago

Btw. this PR also help accessing private members from the finalizer, which is quite hacky currently, as I can see you need to extract the private env manually: a$.__enclos_env__$private

gaborcsardi commented 7 years ago

PTAL.

wch commented 7 years ago

I just had a thought: What about inheritance? Say that class A has a finalizer, and class B inherits from A. What should happen then?

gaborcsardi commented 7 years ago

YEah, that's a good one.

gaborcsardi commented 7 years ago

I think the "usual" thing is sensible: the finalizer is still called, and you can override it in the child. It might be working like this already, although even then it is worth adding a test case.

wch commented 7 years ago

If A and B both have finalizers, I can imagine cases where B's finalizer should override A's, but I can also imagine some where both finalizers should run. In the latter case, could the finalizer invoke super$finalizer()?

And some tests of this would be nice :) (including tests with two levels of inheritance)

gaborcsardi commented 7 years ago

If A and B both have finalizers, I can imagine cases where B's finalizer should override A's, but I can also imagine some where both finalizers should run. In the latter case, could the finalizer invoke super$finalizer()?

Yes, I suspect that it already works like this, finalize is just a method like any other, after all.

And some tests of this would be nice :) (including tests with two levels of inheritance)

I'll write them in a minute.

gaborcsardi commented 7 years ago

@wch Btw. what did you have in mind for two levels? There are a lot of combinations to test with two levels....

wch commented 7 years ago

I've been thinking about it some more, and I think I've convinced myself that B's finalizer should have to explicitly call super$finalize() if it wants to invoke A's finalizer.

Here's an example of that illustrates the other side: why it can be awkward for B to have to invoke A's finalizer explicitly. Suppose A is a class that represents a database connection with a finalize method that closes the connection. And B is a subclass that buffers the reads (or something like that). That means that B (and every other subclass of A) will need to have a finalize that calls super$finalize(), even if B's finalizer does nothing else. And it's also possible to create a subclass of A which has broken behavior -- if the subclass has no finalize, it will leave the connection open.

On the other hand, if B's initializer wants to invoke A's initializer, the it needs to call super$initialize(). This is the current behavior. I think it makes sense for finalizers to have behavior that mirrors this.

wch commented 7 years ago

For two levels: suppose C inherits from B, which inherits from A. Then if C's finalize calls super$finalize(), it should invoke B's finalize. And if B's finalize calls super$finalize(), it should invoke A's finalize.

I don't think it makes sense for C's finalize to be able to call A's finalize directly. Did you have any other combinations in mind?

gaborcsardi commented 7 years ago

TBH, I am not sure what you prefer now. :) Sorry.

Right now

I think this is all quite good. It is safe and flexible. In your DB example, B does not have to define a finalizer just to call A's finalizer. OTH if it defines one to do sg extra, it can still call the parent's finalizer via super.

I'm almost done with the test cases, 5 more minutes....

gaborcsardi commented 7 years ago

For two levels: suppose C inherits from B, which inherits from A. Then if C's finalize calls super$finalize(), it should invoke B's finalize. And if B's finalize calls super$finalize(), it should invoke A's finalize.

Got it, so all three have finalizers and we destroy an object from the grandchild.

wch commented 7 years ago

Oh right, I was confused -- I forgot that B would just inherit A's finalizer if it didn't define one. I agree that what you outlined sounds good, and I'm glad it just works that way!

gaborcsardi commented 7 years ago

Never mind, it was good to think this through. And also, to write the test cases, which are done btw.

As for two levels of inheritance, you could test the case when B does not define a finalizer, only A and C, etc. so there are a lot of cases.

Hmmm, this reminds me, what if we call super$finalize() and the superclass does not have a finalize method? I'll check this out in a minute.

gaborcsardi commented 7 years ago

Hmmm, this reminds me, what if we call super$finalize() and the superclass does not have a finalize method? I'll check this out in a minute.

OK, this fails, as expected. On the other hand it also fails for initialize, so maybe we want to fix them both at the same place?

I think it is good defensive programming to just call super$initialize and super$finalize, especially if another package (and person) provides the superclass, so ideally these should always succeed. Although

if (!is.null(super$finalize)) super$finalize()

is a simple workaround, so probably not a big deal.

gaborcsardi commented 7 years ago

Wait a sec, I am fixing some test labels.

gaborcsardi commented 7 years ago

OK, should be all good now.

Maybe we could generate the non-portable test cases from the portable ones, because it is easy to make copy-and-paste errors. :)

wch commented 7 years ago

Looks good!

HenrikBengtsson commented 7 years ago

Just a quick comment sharing my experiences with "automagic" finalizers:

Make sure to test that the finalizers are re-entrant. There's a risk that finalizers depend on code that may not be available when R runs it. For instance, an object may still be around, but all package code may have been unloaded when it's finalizer runs. This becomes especially complicated when they run when R itself exits, cf. from help(".Last"):

Exactly what happens at termination of an R session depends on the platform and GUI interface in use. A typical sequence is to run .Last() and .Last.sys() (unless runLast is false), to save the workspace if requested (and in most cases also to save the session history: see savehistory), then run any finalizers (see reg.finalizer) that have been set to be run on exit, close all open graphics devices, remove the session temporary directory and print any remaining warnings (e.g., from .Last() and device closure).

Then there's the following comment from help("reg.finalizer"):

R's interpreter is not re-entrant and the finalizer could be run in the middle of a computation. So there are many functions which it is potentially unsafe to call from f: one example which caused trouble is options. As from R 3.0.3 finalizers are scheduled at garbage collection but only run at a relatively safe time thereafter.

Problems often shows up as non-reproducible sporadic errors that are quite hard to troubleshoot, because how and when finalizers are run is hard to predict.

I went through the above a while ago with R.oo - required lots of trial-and-error development and very defensive coding.

Just my $.02

gaborcsardi commented 7 years ago

@HenrikBengtsson Thanks, this is very useful! Maybe it could be even in the docs.

gaborcsardi commented 7 years ago

@HenrikBengtsson Reading now the R.oo code, wow!

HenrikBengtsson commented 7 years ago

Don't look to close; that piece of code is ad hoc and patchy - it's an instance where "if it works - don't change it" truly applies.

gaborcsardi commented 7 years ago

@HenrikBengtsson What I am thinking about is that R6 objects are pretty self-contained, and usually do not need anything from the R6 package.

What might happen is that an R6 class needs its (package) environment to work. E.g. my process class needs the processx package environment. If this environment is not around after gc, that indeed restricts what we can run in the finalizer.

I'll experiment with this.

wch commented 7 years ago

@HenrikBengtsson Very interesting, thanks for sharing your experiences. These do sound like really tricky problems to diagnose.

FWIW, when I run this code in an R session:

library(R6)
print(loadedNamespaces())

e <- new.env()
reg.finalizer(e, onexit = TRUE, function(e) {
  cat("==== Finalizer ====\n")
  print(loadedNamespaces())
})

.Last <- function() {
  cat("==== .Last ====\n")
  print(loadedNamespaces())
}
q()

It doesn't unload any of the packages before .Last() or e's finalizer are called. (Tested on R 3.2.3 on a Mac.)

$ R
> library(R6)
> print(loadedNamespaces())
[1] "R6"        "graphics"  "utils"     "grDevices" "stats"     "datasets" 
[7] "methods"   "base"     
> 
> e <- new.env()
> reg.finalizer(e, onexit = TRUE, function(e) {
+   cat("==== Finalizer ====\n")
+   print(loadedNamespaces())
+ })
NULL
> 
> .Last <- function() {
+   cat("==== .Last ====\n")
+   print(loadedNamespaces())
+ }
> q()
==== .Last ====
[1] "R6"        "graphics"  "utils"     "grDevices" "stats"     "datasets" 
[7] "methods"   "base"     
==== Finalizer ====
[1] "R6"        "graphics"  "utils"     "grDevices" "stats"     "datasets" 
[7] "methods"   "base"