markmfredrickson / optmatch

Functions for optimal matching in R
https://markmfredrickson.github.io/optmatch
Other
47 stars 14 forks source link

UseMatch and match.call errors #234

Closed josherrickson closed 10 months ago

josherrickson commented 11 months ago

I received an email from Luke Tierney that reads in part:

UseMethod has since the beginning had the 'feature' that local variables in the generic are added to the environment in which the method body is evaluated. This is documented in ?UseMethod and R-lang.texi, but use of this 'feature' has been explicitly discouraged in R-lang.texi for many years.

This is an unfortunate design decision for a number of reasons (see below), so the plan is to remove this 'feature' in the next major release.

Fortunately only a small number of packages on CRAN (see below) seem to make use of this feature directly; a few more as reverse dependencies. The maintainers of the directly affected packages will be notified separately.

Current R-devel allows you to set the environment variable R_USEMETHOD_FORWARD_LOCALS=none to run R without this feature or R_USEMETHOD_FORWARD_LOCALS=error to signal an error when a forwarded variable's value is used.

Some more details:

An example:

 > foo <- function(x) { yyy <- 77; UseMethod("foo") }
 > foo.bar <- function(x) yyy
 > foo(structure(1, class = "bar"))
 [1] 77

Some reasons the design is a bad idea:

  • You can't determine what a method does without knowing what the generic it will be called from looks like.

  • Code analysis (codetools, the compiler) can't analyze method code reliably.

  • You can't debug a method on its own. For the foo() example,

    foo.bar(structure(1, class = "bar")) Error in foo.bar(structure(1, class = "bar")) : object 'yyy' not found

  • A method relying on these variables won't work when reached via NextMethod:

    foo.baz <- function(x) NextMethod("foo") foo(structure(2, class = c("baz", "bar"))) Error in foo.bar(structure(2, class = c("baz", "bar"))) : object 'yyy' not found

Funnily enough, we're one of 8 packages on CRAN using this "feature".

Our usage is related to the call - for example, in R/match_on.R, we store the call to match_on() as an object prior to calling UseMethod to dispatch on the first argument:

https://github.com/markmfredrickson/optmatch/blob/ea51cf6bbe29ec88909fcf16824adb7ce6bc2e3e/R/match_on.R#L78-L79

which we then refer to later inside e.g. match_on.numeric():

https://github.com/markmfredrickson/optmatch/blob/ea51cf6bbe29ec88909fcf16824adb7ce6bc2e3e/R/match_on.R#L291

so that the call attached to the output is from match_on, not whatever it dispatched to.

I've only spent a few minutes looking into this but I haven't found a really satisfactory solution yet, other than futzing in the call stack to identify the original match_on() call.

  1. Does anyone have an easier solution here?
  2. I have to imagine there's some other R functions which work similarly but I'm drawing a blank at identifying any.
  3. Can we just ditch this? Do we need the call? Is there a reason it can't be the match_on.___() version?

We do a similar maneuver in fullmatch and pairmatch.

josherrickson commented 10 months ago

I should add - this will become a CRAN problem for the next major release (4.4 or 5.0 if they jump), not the point release coming out next week.

markmfredrickson commented 10 months ago

I played around with this a little and didn't find a way to cleanly tell if we were in a "UseMethod" invocation and a direct invocation. We can, however, force the call to use a specific function instead of whatever match.call says is the function:

> f <- function(x, y) { UseMethod("f") }

> f.numeric <- function(x, y) { match.call() }

> f.character <- function(x, y) {
+   cl <- match.call()
+   cl[1] <- str2lang("f()")
+   return(cl)
+ }

> f(12, 1)
f.numeric(x = 12, y = 1)

> f("char", 2)
f(x = "char", y = 2)

This seems a little hacky, but wouldn't be too heavy handed.

josherrickson commented 10 months ago

Doesn't this defeat the purpose of storing the call? As it is now, it should capture the actual call the user passed down. If we use this we lose if the user calls match_on.numeric directly. Do we care? Does it matter if we just saved match_on.____?

Is scores the only reason we store the call? https://github.com/markmfredrickson/optmatch/blob/c92a275ee2afb3578f10fbe21b046b111fc64578/R/scores.R#L75-L78

It looks like we could just stick with the dispatched match_on.____ function and it would work just fine.

benthestatistician commented 10 months ago

From what I can see it seems we'd be fine if we did away with the practice of storing the actual user call, as Josh suggests in point 3 of the OP. The code base suggests we just fell into the habit of handing calls to generics in this way, as a sort of local best practice. In a context with the language architecture that R core is now reversing, I think it is a best practice; but I think we'll be fine without it. When we do need to re-run the call, to me it seems safe enough to invoke whatever method of match_on(), mdist(), pairmatch() or fullmatch() had been dispatched, rather than the original generic. (Those are all the functions where we're saving calls in this way; I don't believe scores() is implicated.)

I may be mistaken about this, in which case it's likely that our existing tests would uncover the problems. We might also hear about problems later from users, of course. Either way, we'd then have Mark's observation in our back pocket, as a starting point for fixes. Perhaps that's the spirit in which he meant it.

markmfredrickson commented 10 months ago

Perhaps that's the spirit in which he meant it.

Yes. I don't have strong feelings that this is a good idea, but in case someone felt strongly that it should be the generic, we could do that. I'm happy to have the specific method in the call.

josherrickson commented 10 months ago

I did find a work-around. We'd like to be able to do this:

f <- function(x) {
  out <- UseMethod("f")
  print("do call stuff here")
  return(out)
}

f.numeric <- function(x) {
  return(1)
}

but this doesn't work as it seems like UseMethod include an implicit return:

> f(1)
[1] 1

So instead we have an internal function:

g <- function(x) {
  .g <- function(x) {
    UseMethod("g")
  }
  out <- .g(x)
  print("do call stuff here")
  return(out)
}

g.numeric <- function(x) {
  return(1)
}

which does work:

> g(1)
[1] "do call stuff here"
[1] 1

That said, I like the idea of simplifying and just returning the dispatched method call instead.

benthestatistician commented 10 months ago

Clever! I share your sense that the best answer for now is just to simplify, but good to have this here when we need it.

josherrickson commented 10 months ago

How do you want to handle pairmatch/fullmatch? Similar procedure?

benthestatistician commented 10 months ago

Yes, I'd say same procedure (let the call now be specific to the method that was dispatched). Good to see that the match_on changes required only minor modifications of the tests.

josherrickson commented 10 months ago

This should be done. I have no way to test it, so just gonna hope for the best - at least we have an easy path forward if I missed something.