vegandevs / vegan

R package for community ecologists: popular ordination methods, ecological null models & diversity analysis
https://vegandevs.github.io/vegan/
GNU General Public License v2.0
456 stars 98 forks source link

rarify breaks between version 2.6-2 and 2.6-4 #579

Closed jkeuskamp closed 1 year ago

jkeuskamp commented 1 year ago

Possible regression found:

when using vegan::rarify (v2.6-4) on an phyloiseq@otu-table object , it returns 'Error in as(x, "matrix")[i, j, drop = FALSE] : (subscript) logical subscript too long', while vegan::rarify (v2.6-2) ran without problems

gavinsimpson commented 1 year ago

Have you got a reproducible example? Neither Jari or I typically work with the sorts of data that would involve phyloseq.

gavinsimpson commented 1 year ago

The only change of note is this addition as that's the only thing that changed between 2.6-2 and 2.6-4 from what I can tell, but that looks fine, and the error you quote seems to implicate phyloseq not vegan as the error relates to S4 methods, which we don't use and clearly phyloseq does. Perhaps try without directly accessing the slot in the phyloseq object and instead use an extractor?

jkeuskamp commented 1 year ago

when importing the attached RDS as x:

vegan::rarefy(x,2) did work in v2.6-2 but doesn't in v2.6-4.

However, following your suggestion and using

vegan::rarefy(as.data.frame(x),2)

does also work in v2.6-4, which solves the problem for me. mydata.RDS.zip

jarioksa commented 1 year ago

This looks as the same as problem as reported in stackoverflow for rarecurve. This was due to changes in phyloseq which uses non-standard structures that cannot be handled with as.matrix. I see this as a problem in phyloseq, and I wish they fix their internal structures so that they can be handled as matrices.

jkeuskamp commented 1 year ago

While phyloseq using non-standard structures may be the underlying cause, I have confirmed that it did work in 2.6-2, and stopped working after only updating vegan to updating to 2.6-4.

From: Jari Oksanen @.> Date: Wednesday, 21 June 2023 at 18:36 To: vegandevs/vegan @.> Cc: Keuskamp, J.A. (Joost) @.>, Author @.> Subject: Re: [vegandevs/vegan] rarify breaks between version 2.6-2 and 2.6-4 (Issue #579)

This looks as the same as problem as reported in stackoverflow for rarecurvehttps://stackoverflow.com/questions/74530039/error-vegan-rarecurve-error-in-asx-matrixi-j-drop-false-subscript. This was due to changes in phyloseq which uses non-standard structures that cannot be handled with as.matrix.

— Reply to this email directly, view it on GitHubhttps://github.com/vegandevs/vegan/issues/579#issuecomment-1601297874, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AAHKBHDOGGEEEP7SP2CEP7DXMMWILANCNFSM6AAAAAAZOPH3KY. You are receiving this because you authored the thread.Message ID: @.***>

jarioksa commented 1 year ago

I guess it stopped working after parliamentary elections in Finland, but that is also unrelated to vegan. The only change in the function rarefy is this:

diff --git a/R/rarefy.R b/R/rarefy.R
index 3c5fdf5..56fcb44 100644
--- a/R/rarefy.R
+++ b/R/rarefy.R
@@ -7,6 +7,9 @@
         x <- t(x)
     if (!identical(all.equal(x, round(x)), TRUE))
         stop("function accepts only integers (counts)")
+    minobs <- min(x[x > 0])
+    if (minobs > 1)
+        warning(gettextf("most observed count data have counts 1, but smallest count is %d", minobs))
     minsample <- min(apply(x, MARGIN, sum))
     if (missing(sample)) {
         stop(

and this is completely unrelated to your case. Something else has changed, and that something else no longer works with vegan.

gavinsimpson commented 1 year ago

The error certainly happens with the line

minobs <- min(x[x > 0])

which is why we see the difference between the versions. But there is nothing wrong with this if x is a matrix and something about a phyloseq object or it's methods for coercing said class to matrix or subsetting it (via [ methods) means your object isn't working correctly. But that is a problem with phyloseq, not vegan. My guess right now is that the phyloseq developers defined methods for [ for their "otu_table" class and it doesn't understand the vector subsetting variant of [ for matrices.

If the issue is that you just accessed the slot containing the data, fix your code to not do that; you're supposed to use sample_data(foo) to extract the data from the phyloseq object IIRC.

Either way this isn't a problem in vegan, even if the change we made exposed a previous issue with your/their code.

gavinsimpson commented 1 year ago

Thanks for letting us know about this issue @jkeuskamp but I suggest you follow up with the phyloseq devs if the problem persists when you use their approved way of accessing the data from their object.

jarioksa commented 1 year ago

There seem be two ways of circumventing the new phyloseq feature:

  1. Force otu_table to be a matrix. Here base::as.matrix does not work, but you must forcibly change the class to matrix like suggested in the Stackoverflow discussion linked to above.
  2. First change otu_table to a standard data frame with base::as.data.frame like you suggested in your message. After this base::as.matrix works and then the indexing works in the standard way.

Naturally we could add any of these tricks to vegan, but I'm reluctant to do that just to circumvent design deficiencies in non-CRAN packages. For most users x <- as.matrix(as.data.frame(x)) would look strange – in particular when x was a matrix originally.

gavinsimpson commented 1 year ago

@jarioksa I agree that we shouldn't handle this problem for vegan — regardless whether it is a CRAN package or in another package repository. Apart from the extra developer effort required and ongoing maintenance if phyloseq changes again, there could be plenty of other places in vegan where such things fail, and if they fail in vegan, which uses standard R code, then there are surely problems with other R packages.

This should be fixed by the phyloseq developers so that the [ method understands vector indexing.

jarioksa commented 1 year ago

@gavinsimpson Actually it is not vector indexing: x > 0 for 2-dim matrix produces a logical matrix of same dimensions as x. It is only dropped to a vector after TRUE items are returned. My understanding in the Stackoverflow case was that phyloseq does not understand logical indexing (TRUE, FALSE), but it was a long ago, and I won't make a post mortem analysis now.

That said, quite a few users of rarefy & friends use phyloseq data, and the check was added there just to warn these people of common wrong usage of rarefaction methods. It was quite common to have user data where the smallest positive entry was in thousands because they had multiplied their data by huge numbers before getting it to rarefaction, and then these once-occurring OTUs in their arbitrary thousands could never be rarefied out. Another group with wrong data were mislead old-time community ecologists who believed the wrong and dangerous piece of advice that you should remove rare taxa that only occur once or twice – also making the data unsuitable for rarefaction (the task of rarefaction is to remove most of those rare species among some other species). I think the OTU people commit both sins: rare cases are first pruned out as read errors, and then data are multiplied by arbitrary numbers.

I also think that often the OTU tables are transposed (OTUs as rows, observations as columns) and then the results are even less meaningful. There is little we can do against this.

jarioksa commented 1 year ago

It seems that adding this simple function would solve some of the problems with phyloseq, and at least the problems with rarefaction functions:

as.matrix.otu_table <- function(x, ...) x@.Data

Transposed data are not handled, though.

gavinsimpson commented 1 year ago

Physloseq has an internal function veganifyOTU() which handles the transposition too, but I suspect that also uses further internal functions?

I don't think we should be including in vegan methods for classes that we don't control.

I also think that fixing one problem will inevitably result in further problems; what happens if the unwitting user uses the method you describe on otherwise fine transposed data? I suspect in many cases it would run without error resulting in garbage results that an unwitting user would be unaware of.

While trying to fix issues with phyloseq (or with their users' use of the package) might save use some bug reports or StackOverflow posts, I really don't think we should be responsible for problems (even if infelecities or user error) with other packages.

jarioksa commented 1 year ago

Fine! Agreed. I didn't know about phyloseq::veganifyOTU. So the standard answer should be "use phyloseq::veganifyOTU if you want to analyse phyloseq OTU tables in vegan".

gavinsimpson commented 1 year ago

They'd need phyloseq:::veganifyOTU as it isn't exported.