vegandevs / vegan

R package for community ecologists: popular ordination methods, ecological null models & diversity analysis
https://vegandevs.github.io/vegan/
GNU General Public License v2.0
454 stars 98 forks source link

Conflict with rda object (Regularized Discriminant Analysis) in klaR #277

Closed brendanf closed 6 years ago

brendanf commented 6 years ago

Package klaR (which is loaded by package agricolae) defines a class rda with S3 methods print.rda() and plot.rda(). Even if klaR is not attached, it still registers the S3 methods when loaded. vegan's rda class inherits plot.cca() and print.cca(), but these are non-exported and so cannot be called directly. Furthermore, plot() is called by ordiplot(), and this dispatches incorrectly on objects of type rda, leading to an error.

Suggested fix: create plot.rda and print.rda as aliases for plot.cca and print.rda, and register the S3 methods.

MnWE:

library(agricolae)
library(vegan)
## example from cca/rda help file
data(dune)
data(dune.env)
dune.Manure <- rda(dune ~ Manure, dune.env)
print(dune.Manure)
plot(dune.Manure) 
ordiplot(dune.Manure)
brendanf commented 6 years ago
diff --git a/R/plot.cca.R b/R/plot.cca.R
index 7c7130f..cd17643 100644
--- a/R/plot.cca.R
+++ b/R/plot.cca.R
@@ -132,3 +132,6 @@
     class(g) <- "ordiplot"
     invisible(g)
 }
+
+`plot.rda` <-
+  plot.cca

diff --git a/R/print.cca.R b/R/print.cca.R
index 685ccca..afd6655 100644
--- a/R/print.cca.R
+++ b/R/print.cca.R
@@ -70,3 +70,5 @@
     }
     invisible(x)
 }
+
+`print.rda` <- print.cca

diff --git a/NAMESPACE b/NAMESPACE
index 35683ef..1efaf77 100644
--- a/NAMESPACE
+++ b/NAMESPACE
@@ -319,6 +319,7 @@ S3method(plot, rad)
 S3method(plot, radfit)
 S3method(plot, radfit.frame)
 S3method(plot, radline)
+S3method(plot, rda)
 S3method(plot, renyi)
 S3method(plot, renyiaccum)
 S3method(plot, spantree)
@@ -389,6 +390,7 @@ S3method(print, protest)
 S3method(print, radfit)
 S3method(print, radfit.frame)
 S3method(print, radline)
+S3method(print, rda)
 S3method(print, specaccum)
 S3method(print, simmat)
 S3method(print, simper)
jarioksa commented 6 years ago

Actually, vegan seems to work as documented. Including correctly dispatching the unmodified plotting data in ordiplot (issue #257).

jarioksa commented 6 years ago

issue #94 seems to concern the same name conflict.

jarioksa commented 6 years ago

It seems that klaR defines the following S3 methods for its "rda" class objects: print, plot and predict. All these will conflict with vegan. We can add (redundant) vegan:::print.rda and vegan:::plot.rda that we do not currently have, but vegan:::predict.rda and klaR:::predict.rda will remain.

This is a similar issue as we had with ade4:::cca which was only solved after ade4 dropped its cca function. During the name conflict time, we had function vegan::ade2vegancca function to catch the cases when an ade4::cca object was input to vegan functions. Adding redundant vegan:::print.rda and vegan:::plot.rda functions will solve some problems, but not all -- what it solved depends on the order vegan and klaR are loaded, and which "rda" object is handled, and in no case will it solve the problems with predict.rda.

All we can do is to make vegan functions aware of klaR functions and issue a warning. It seems that vegan::rda and klaR::rda cannot be used together and their result objects cannot exist simultaneously.

brendanf commented 6 years ago

In my particular case, I'm not interested in actually using klaR::rda; the S3 methods only get loaded because I want unrelated functions from agricolae. Adding the redundant functions in vegan would solve that issue (as long as I load vegan after agricolae, as one normally has to do in name conflicts).

jarioksa commented 6 years ago

I think I have now solved the issues as well as they can be solved. There is a genuine name conflict and a completely satisfactory solution cannot be reached. The changes are now in issue-#277 branch ready to be merged.

The changes are:

If nothing strange appears, this will be merged within 24 hours.

jarioksa commented 6 years ago

The situation seems to be more complicated than I assumed. The changes I made solved the problem in my macOS laptop with R 3.4.4 when vegan was loaded after klaR and vegan::rda was preferred. However, they did not help in my Linux desktop at work, but failed both with stock R 3.4.2 and current development version (2018-05-05 r74699). So the situation is unresolved, and I won't merge the changes.

A simple test procedure is:

library(klaR)
library(vegan) # load after klaR
data(dune, dune.env)
rda(dune) # print worked in macOS, failed in Linux
rda(dune ~ ., dune.env) # print worked in macOS, failed in Linux

Here failling means that the function used klaR:::print.rda, working that it used vegan:::print.rda.

brendanf commented 6 years ago

It works correctly for me in Linux, R 3.3.2.

jarioksa commented 6 years ago

Yes indeed. The situation is still more complicated than I assumed: the working depends on the directory used when launching the command, and when started as R --vanilla (i.e., without reading Workspace), the print commands worked like I assumed. Anybody knows what is happening?

However, this means that the changes may mitigate the problem at least occasionally. I'll ponder that for a while before deciding what to do with the merge.

brendanf commented 6 years ago

It appears that the change introduces a problem in ordiplot.

library(vegan)
data(dune, dune.env)
r <- rda(dune ~ ., dune.env)
ordiplot(r, display = "species")
#Error in list(1, slam, sqrt(slam))[[abs(scaling)]] : 
#  subscript out of bounds
#In addition: Warning message:
#In if (scaling) { :
#  the condition has length > 1 and only the first element will be used

It looks like this is because ordiplot calls plot() with choices as an unnamed argument, and after passing through plot.rda, it is plugged into scaling instead of choices, where it is inappropriate.

This seems to fix it for me:

diff --git a/R/ordiplot.R b/R/ordiplot.R
index f3dda10..36e663b 100644
--- a/R/ordiplot.R
+++ b/R/ordiplot.R
@@ -10,9 +10,9 @@
     if (!is.null(attr(ord, "class")) && (class(ord) == "decorana" ||
                                          any(class(ord) == "cca"))) {
         if (missing(display))
-            out <- plot(ord, choices, type = type, xlim = xlim,
+            out <- plot(ord, choices = choices, type = type, xlim = xlim,
                         ylim = ylim, ...)
-        else out <- plot(ord, choices, type = type, display = display,
+        else out <- plot(ord, choices = choices, type = type, display = display,
                          xlim = xlim, ylim = ylim, ...)
     }
     else {
jarioksa commented 6 years ago

I think I have figured out what was going on with my failure to fix klaR::rda conflicts: vegan Namespace was loaded when starting the session and reading the Workspace. Obvious and reproducible reason was that the saved Worspace had a vegan::nullmodel environment and this triggered loading vegan Namespace at the start-up. When vegan was attached with library(), its Namespace was not re-loaded and so it was factually loaded before klaR Namespace and klaR methods took the preference.

To reproduce this, start with empty Worskpace. Mere R --vanilla won't help, but the directory must be empty (no .RData file to be read), because we want to see what happens after reloading the Workspace. With this empty environment, the following works:

sessionInfo()
## gives among other items:
# loaded via a namespace (and not attached):
# [1] compiler_3.6.0
library(klaR)
library(vegan) # load after klaR
data(dune)
rda(dune) ## printed with vegan::print.rda
## Add an ecological nullmodel
nm <- nullmodel(dune, "curveball")
## quit saving the workspace
q("yes")

Now start a new session that loads the previously saved Workspace (.RData file) and repeat:

sessionInfo()
## gives among other items:
# loaded via a namespace (and not attached):
# [1] compiler_3.6.0  MASS_7.3-49     Matrix_1.2-14   parallel_3.6.0 
# [5] tools_3.6.0     mgcv_1.8-23     nlme_3.1-137    grid_3.6.0     
# [9] permute_0.9-4   vegan_2.6-0     cluster_2.0.7-1 lattice_0.20-35
library(klaR)
library(vegan) # load after klaR
data(dune)
rda(dune) ## printed with klaR::print.rda
#Call: 
#rda(X = dune)
#
#Regularization parameters: 
#[1] "this is a vegan::rda result object"
#
#Prior probabilities of groups: 
#NULL
#
#Misclassification rate: 
#       apparent:  %

In the list of attached packages vegan comes before klaR (like it should), but it seems that its Namespace was loaded before klaR Namespace, and attaching a package does not change this.

I think that the design decision to make nullmodel() return an environment leads to loading vegan Namespace (and Namespaces of a whole lot of other packages). What do you think @psolymos ?

jarioksa commented 6 years ago

@brendanf : I confirm the ordiplot issue and that the fix works. To set the author of the commit, I need either your ID in the form accepted by git merge --author which now gives me error

fatal: --author 'brendanf' is not 'Name <email>' and matches no existing author

or alternative a Pull Request in github (against issue-#277 branch) or your confirmation that you don't want to be marked as the author of the commit.

jarioksa commented 6 years ago

@brendanf thanks for the PR (#278). Now merged. I think I'm ready to merge this back to master, but this is one issue that has surprised me earlier, and therefore I'll wait (again) for 24 hours before doing anything.

psolymos commented 6 years ago

@jarioksa : quite interesting consequences. Other functions reference nullmodel objects as lists via the $ accessor. I wonder whether changing to

...
#    out <- list2env(out, parent=emptyenv())
#    class(out) <- c("nullmodel", "environment")
    class(out) <- "nullmodel"
...

would solve the issue. I will check/test.

psolymos commented 6 years ago

I looked at changing the nullmodel object to list instead of environment in my local copy (and in the meantime forgot to rebase my local repo and created an ugly looking merge...). Most things work fine. There is one fundamental issue though. The following code from the ?nullmodel example runs fine based on environment, but fails with list:

nm <- nullmodel(x, "swap")
nm <- update(nm, nsim=10)
sm1 <- simulate(nm, nsim=10, thin=5)
sm2 <- simulate(nm, nsim=20, thin=5)
sm3 <- simulate(nm, nsim=10, thin=5)
smbind(sm3, sm2, sm1, MARGIN=3)

That is because we keep updating the state of the objects inside of the environment ($state and $iter), which means that the input object nm reflect those different states for each simulate call. This feature is important for sequential models. I can't see an easy way around this, because simulate returns the simmat object, whereas the nullmodel object represents the latest state (and the original input).

It seems that for this reason, the environment was the right choice. Thoughts?

psolymos commented 6 years ago

See changes in branch issue-#277-nullmodel-as-list.

jarioksa commented 6 years ago

@psolymos : yes, that it is. I was sure that we did not have it as a list just to be fancy, and that was the point. I remember we also inspected more radical solutions (like S4-classes), but updating sequential models was the key point. In that time we had a lot of trouble in simpler updates (anova.cca and its ilk), and there the solution has been to find other tools, and we have get rid of most of update. However, I'll have a look at your new branch. Tomorrow probably (which is a public holiday here in Finland).

jarioksa commented 6 years ago

Having nullmodel objects as environments brings along surprising features in R, and we may reconsider the design at some stage. Currently I find it very difficult to think about a clean alternative design (superassignment <<- or assign(..., envir=...) could perhaps be hacked to work, mostly). However, this needs more thinking and klaR name clash is not as important an issue that we should therefore re-design nullmodel (the namespace loading may be, but that is an entirely different issue with obscure side effects). Thanks for providing the test case in a separate branch, @psolymos . This again nicely demonstrates that most things are not simple and isolated.

Very little can be done with vegan::rda and klaR::rda name clash. I don't think I'm going to change the name of rda function name which has been around since 2003 and is widely used, and I don't think Uwe Ligges is going to change the name of his rda. It is also very logical and consistent: there are already functions lda (linear discriminant analysis) and qda (quadratic DA), and his rda (regularized DA) unifies these two plus more. All we can do is to mitigate some problems when these two are unintentionally attached to the same session, but radical redesign of completely unrelated nullmodel is not a thing we will do for this issue. I was just baffled for contrasting results that I first assumed were caused by different platforms.

jarioksa commented 6 years ago

Closed with commit 8f02d8bff3c3ccb2661f35e2e3121391cb402945