ropensci / RNeXML

Implementing semantically rich NeXML I/O in R
https://docs.ropensci.org/RNeXML
Other
13 stars 9 forks source link

Fixes duplicate S4 class name issues without changing class names #205

Closed hlapp closed 5 years ago

hlapp commented 5 years ago

It turns out that S4 classes defined with the same name but in different packages are actually supported in R. Hence, pseudo-namespacing class names as proposed in the #201 change set should in principle not be required. The mechanism for namespacing a class is in essence through the package attribute of the class name string. The function packageSlot() can be used to query and set the package attribute, and the function className() generates a class name namespaced in this way from its two arguments.

Therefore, this would allow us to keep the class names we want, not have to worry about adding and stripping the pseudo-namespace to/from the class name, and maintain full backward compatibility. All we would have to do is use new(className(name, "RNeXML")) instead of new(name).

However, that's unfortunately only the theory. In practice, two things are at least currently broken, preventing this to be sufficient to make it work:

This change set breaks down into three parts:

  1. Create a convenience function New() that ensures the class name is properly namespaced. This solves the problem of making invocation of new() unambiguous.
  2. Create a function that does the work of finding the next method that addNextMethod() should be doing instead, and caching the result in the calling environment in a way that callNextMethod() picks it up and then bypasses the invocation of addNextMethod().
  3. Create a function that finds the right method for an S4 generic, in the way standardGeneric() should be doing but doesn't in certain cases.

To be clear, as a full alternative to #201 this, too, fixes #169 and fixes #183. Unlike #201, it does not eliminate all messages of the Found more than one class "tree" in cache; using the first, from namespace 'cli'. Also defined by ‘RNeXML’ kind. However, so far these messages have not correlated with adverse events or results. It looks like they are being generated in the context of the call to initialize(), an S4 generic, which is at the end of the new() implementation in the methods package.

The other difference to #201 is that the protection from duplicated class names isn't sweeping. I have purposely applied it here only to the tree class, the one that presently happens to cause the issues. The protection could be applied on a sweeping basis, but the reasons I haven't (yet) are the following:

That said, not using namespace class names for invocations of new() could be considered a bug, and hence this (using New() as a convenience method) should arguably made as a sweeping change across the hierarchy.

hlapp commented 5 years ago

@cboettig I think with the additional tests this is pretty solid now. BTW your codecov targets are pretty ambitious 😮

cboettig commented 5 years ago

This is awesome, had no idea such a work-around would be feasible with the package slot. Does seem like a bug in the S4 system though that the core functions do respect this? Do you have a minimal reproducible example handy regarding the behavior of generics and the callNextMethod in this case? Wonder if it's worth filing a bug report in methods?

Otherwise, this looks great, so merge when ready.

I believe the coverage targets are merely codecov defaults, looks like we can adjust codecov.yml file to something more appropriate for the test suite, just haven't tried.

hlapp commented 5 years ago

Do you have a minimal reproducible example handy regarding the behavior of generics and the callNextMethod in this case?

Not really. I tracked it down by stepping through it in the debugging browser(), first under conditions under which the class name was not duplicated, then under conditions in which it was duplicated.

I was wondering how one could write a test case, perhaps through evaluation of a code snippet using two environments created from scratch. I'm not sure when I have time to give that a try.

To be clear, using the specific packages "cli" and "RNeXML", the problem can actually be reliably reproduced for addNextMethod():

library(methods)
library(RNeXML)
library(cli)
pkg <- asNamespace("RNeXML")
f <- getGeneric("toNeXML", where = pkg)
m <- selectMethod(f, signature = c(className("tree", "RNeXML"), "XMLInternalElementNode"))
mn <- addNextMethod(m, f = f, envir = pkg)
#> Found more than one class "tree" in cache; using the first, from namespace 'cli'
#> Also defined by ‘RNeXML’
#> Found more than one class "tree" in cache; using the first, from namespace 'cli'
#> Also defined by ‘RNeXML’
#> Error in validObject(.Object) : 
#>   invalid class “MethodWithNext” object: invalid object for slot "nextMethod" in class "MethodWithNext": got class "NULL", should be or extend class "PossibleMethod"

As you can see, there are no naked (lacking package namespace) "tree" class names in this snippet that could have been blamed.

Wonder if it's worth filing a bug report in methods?

Yes, but I'm not sure when I'll have the time. And because I don't want to expect everyone using the RNeXML package to have to be on the latest R point release, the workaround is good to have anyway, I think.

hlapp commented 5 years ago

OK I'll go ahead and merge ☺️

cboettig commented 5 years ago

👏 yay!

Meanwhile I'll ping some folks that frequently work with S4 methods and see if anything jumps out at them. But regardless this work-around does look like the way to go for our situation.