joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/

Conflict between vegan and phyloseq? #918

okayama1 closed this issue 6 years ago

okayama1 commented 6 years ago

Hi there,

I have just updated my R version and packages, and I am running into an error that I am not sure how to fix, or why it is appearing. There seems to be a problem using "phyloseq::ordinate" with distance = "bray" and method = "PCoA" (I did not run into this problem with earlier versions of the packages below).

Below are a couple of reproducible examples of the issue:

1)
library(phyloseq)
data("GlobalPatterns")
gp <- GlobalPatterns
bray_dist <- phyloseq::distance(gp, "bray")

Error in .C("veg_distance", x = as.double(x), nr = N, nc = ncol(x), d = double(N * : "veg_distance" not available for .C() for package "vegan"

2)
class2.ino_pcoa <- ordinate(physeq = gp, method = "PCoA", distance = "bray")

Error in .C("veg_distance", x = as.double(x), nr = N, nc = ncol(x), d = double(N * : "veg_distance" not available for .C() for package "vegan"

3) All is fine if I run:
class2.ino_pcoa <- ordinate(physeq = gp, method = "DPCoA", distance = "bray")

Any help or suggestions would be highly appreciated! Many thanks.

okayama1 commented 6 years ago

One possible solution to the problem above:

I kept the latest R version that I am using (3.4.4), but installed vegan version 2.4-5 (the older version I was using before my code broke) as follows:

library(devtools)
install_version("vegan", version = "2.4-5", repos = "http://cran.us.r-project.org")

joey711 commented 6 years ago

I'm not aware of a function/method called veg_distance.

It might be new, or it might be an internal function in vegan-package that I was not previously aware of. In any case, this bug appears to be caused by some change in the vegan package implementation, and the error itself appears to trace back to a call to veg_distance...

@jarioksa

jarioksa commented 6 years ago

I don't have phyloseq on my laptop (I cannot install it because it depends on too many packages that I cannot install). However, there is no .C() call to veg_distance in vegan 2.5-1, which is the version used in these cases (this issue, vegan issue vegandevs/vegan#272 and a StackOverflow issue). Still, phyloseq seems to find this call. It looks like phyloseq uses old vegan 2.4-x R code together with the new vegan 2.5-1 compiled library. (Either it is phyloseq, or the user has a stray local copy of vegdist() that is used instead of the one in vegan 2.5-1.)

Somehow phyloseq keeps the old vegan R code even after vegan is upgraded. Does it help to re-install phyloseq to refresh its view of vegan? There is a discrepancy between the R code and the compiled library when phyloseq issues its distance() call, and the R code should be refreshed.

okayama1 commented 6 years ago

Thanks for the suggestion; I will give it a try and let you know how it goes.

jarioksa commented 6 years ago

@joey711: it seems that phyloseq gets and saves the contents of vegan::vegdist(), and keeps using them even when vegan::vegdist() changes.

I noticed this on my desktop, where these examples worked smoothly with vegan_2.5-1 but failed when I downgraded to 2.4-6: I had installed phyloseq with vegan_2.5-0, and its internals were used. Looking at the point of failure, it really seems that in the first example of @okayama1 the error was triggered from do.call(dfun, fun.args), so for this case the saving happened in https://github.com/joey711/phyloseq/blob/master/R/distance-methods.R#L110

I don't know if this happens at installation or at build time. If it is at installation, re-installing phyloseq will help. If it is at build time, binary packages for Windows and macOS probably should be built separately for vegan 2.5-x and for earlier versions.

joey711 commented 6 years ago

@jarioksa first of all, thanks for following up on this and reporting here as well as vegan issue tracker.

Other than phyloseq users being vocal about the problem, I'm having trouble understanding how this is a phyloseq issue, or specific to phyloseq. It seems to be an issue with an update in vegan, and perhaps with how various R systems are handling that update.

The implication that phyloseq is encapsulating copies of vegan low-level C or compiled C is a new one for me, if true. AFAIK, phyloseq in this instance is simply pointing to vegan methods as an explicit dependency, and the users' R installation has some legacy vegan methods getting dragged into it during execution.

I guess one might fairly ask how this is a vegan problem, and not simply an issue with how R package management system is handling updates?

As for pre-compiled builds, this is the kind of error that would be triggered in phyloseq build tests. The windows and mac binaries appear to be building fine on the Bioconductor build servers:

https://www.bioconductor.org/checkResults/

jarioksa commented 6 years ago

@joey711 "problem" is a social concept: users can have problems with software, but software actually has no problems (it can have bugs). This case is a temporary nuisance to users: it will crop up for phyloseq users whenever they upgrade only phyloseq or only vegan, but disappears as soon as both packages are back in sync. However, it only appears with phyloseq commands: vegan is internally consistent. It could be socially useful to have a dependency on vegan (>= 2.5-1). This is not technically necessary, as phyloseq would not change, but it would guarantee that users always get consistent versions.
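As a minimal sketch of what a tightened constraint such as vegan (>= 2.5-1) checks, base R's version comparison can be called directly. meets_minimum() is a hypothetical helper, and the version strings are the ones discussed in this thread.

```r
# Hypothetical helper: does an installed version satisfy a ">=" constraint?
# package_version() parses "2.4-6" as 2.4.6, so "-" and "." compare alike.
meets_minimum <- function(installed, minimum) {
  package_version(installed) >= package_version(minimum)
}

meets_minimum("2.4-6", "2.5-1")  # FALSE: an older vegan would be rejected
meets_minimum("2.5-1", "2.5-1")  # TRUE
# On a live system the equivalent check is packageVersion("vegan") >= "2.5-1"
```

The constraint only sets a lower bound; the consistency benefit described here comes from Bioconductor rebuilding the phyloseq binary and users upgrading vegan along with it.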

Please note that "some legacy vegan methods" are not getting dragged into execution; rather, the legacy function is included in and bound to phyloseq: there is no other place where it can hide. I don't know where it is or how it is bound to phyloseq, but it seems to be there, and it seems to stay there until phyloseq is rebuilt with vegan 2.5-0 present in the build environment. Instead of calling the actual vegan::vegdist() present in the user environment, phyloseq will use its own private and static copy of vegdist. Whether that is because of do.call(dfun, ...) or because of wrapping vegan functions into S4 in phyloseq, I don't know. However, this static nature means that subtler inconsistencies can appear in the future, and probably have appeared in the past.

The automatic build systems should compile fine, and both my tests and CRAN tests worked fine because there phyloseq was built with the same version of vegan that was used in testing. If that version is vegan 2.5-0, that is modern, but the results are just as fine and unchanged if it was vegan 2.4-x. It is the consistency of these two packages that matters, not the vegan version per se.

I stumbled on this problem yesterday on StackOverflow, where it was reported as a vegan problem. While inspecting it, I got a notification of a GitHub report in the vegan issue tracker, and phyloseq was the common factor between these two. Then I found that the same problem was also reported in phyloseq, and with some searching there was also another StackOverflow report that looks like the same issue. So it is an issue. Anyway, I'm closing this in vegan, because there is nothing we can do about it in vegan: vegan works as it was intended to work.

joey711 commented 6 years ago

"legacy function is included and bound to phyloseq: there is no other place where it can hide"

Really? You've tracked down how the R package management system relates to method dispatch, and determined that phyloseq in fact has its own copies of vegan... I don't know very much about these details, but that would be an unusual choice for the package management architecture, requiring every package with dependencies to maintain copies of its dependencies' code... seems unlikely.

It sounds much more plausible that the namespace pointer from the installed phyloseq version is corrupted by the vegan update, pointing to an old version of the vegan function that is now missing, broken, or retained in the user's R installation in some fashion. This would not be a bug of phyloseq or vegan. A case in point would be that any other package depending on vegan::vegdist would probably reproduce the same bug. Hence, not specific to phyloseq.

In any case, I think we agree that this is not an issue of either package, but an odd bug with how the package management system is handling package updates in the context of package-to-package dependencies.

I will make sure to "up-tick" the required version of vegan in phyloseq DESCRIPTION file, to help guide users around this problem, and also close this issue for some of the reasons you stated.

jarioksa commented 6 years ago

@joey711 : it seems very much that the old vegdist code lurks here:

library(phyloseq)
getMethod("vegdist")
## which may be different from
vegan::vegdist
## and was different when these issues were reported

The phyloseq instance of vegdist was created in R/extend-vegan.R lines 112 to 134 with setGeneric("vegdist") and two cases of setMethod("vegdist"). These seem to be created when phyloseq is built and the binary package includes the then current vegan::vegdist code, but do not change when vegan changes.
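The snapshot behaviour can be reproduced without phyloseq or vegan. In the sketch below, upstream_vegdist and myVegdist are hypothetical stand-ins: registering a function as the "ANY" method of an S4 generic stores that function object, so rebinding the original name later does not change what the generic dispatches to. This is only an analogy for the package case, where (as described in this thread) the stored method is additionally serialized into phyloseq's lazy-load database when the package is built.

```r
library(methods)

# Stand-in for vegan::vegdist as it existed when the downstream package was built
upstream_vegdist <- function(x, ...) "vegan 2.4 code path"

# Downstream code defines a generic and registers the upstream function as its
# default ("ANY") method; the function object itself is stored in the method table
setGeneric("myVegdist", function(x, ...) standardGeneric("myVegdist"))
setMethod("myVegdist", "ANY", upstream_vegdist)

# Later the upstream function is replaced (stand-in for upgrading vegan)
upstream_vegdist <- function(x, ...) "vegan 2.5 code path"

old_path <- myVegdist(1)          # dispatches to the stored copy
new_path <- upstream_vegdist(1)   # calls the current definition
print(c(generic = old_path, direct = new_path))
```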

joey711 commented 6 years ago

Fair enough. But those are merely phyloseq wrappers to accomplish S4 dispatch within phyloseq internals, transforming data structures into a proper call to vegan::vegdist(); the guts of vegdist are not copied into phyloseq source.

Perhaps I could refactor these internal wrapper names or that dispatch to avoid confusion, but I'm guessing the error would still arise on a user's system when the vegan::vegdist() low-level has changed due to package update, in the absence of a clean build. Right? Or is there some best-practice I'm failing at?

jarioksa commented 6 years ago

They are not copied to phyloseq sources, but they are copied to phyloseq binary builds. That means that they are copied to the binary packages of Windows and macOS users. Those who install from source packages will get the version of their build time, but that copy does not change when you upgrade vegan to 2.5-x, and that will give you an error.

I checked a couple of hours ago, and the Bioconductor macOS binary (.tgz) still contained the 2.4-6 generic. In the binary build, it is included in R/phyloseq.rdb as the environment .__T__vegdist:vegan, which contains the vegan 2.4-6 version of vegdist that will trigger an error after vegan is upgraded to 2.5-1. These instances are generated when a binary build is made, using the version of vegan that was installed during that build, and they are permanent. If you are installing from source, it is OK at build time, but will break after you upgrade vegan. Please see for yourself.

Here is one way of checking this:

Start R with an empty environment. From the command line (on Unix) this can be done with R --vanilla. Then you should have:

ls(all.names=TRUE)
## should be character(0)

## Read the contents of package data base without loading anything -- no vegan, no phyloseq
lazyLoad(file.path(find.package("phyloseq"), "R", "phyloseq"))
ls(patt="vegdist", all.names=TRUE)
## contains:
## [1] ".__T__vegdist:vegan" "vegdist" 
## vegdist is your defined generic function, and .__T__vegdist:vegan contains vegan::vegdist
## of build time. It is an environment, and to see its contents you need
as.list(`.__T__vegdist:vegan`)
## The function is in element $ANY

This function in .__T__vegdist:vegan will be used as a generic instead of vegan::vegdist, and this will trigger an error after vegan is upgraded.
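To compare the stored copy with the installed vegan::vegdist in the same session as the lazyLoad() check above, one could compare the two function objects directly. same_code() is a hypothetical helper, and the final block only runs when the lazy-loaded `.__T__vegdist:vegan` environment and vegan are both available; it assumes the layout described above (the function in element $ANY).

```r
# Hypothetical helper: TRUE when two functions have the same arguments and body
same_code <- function(f, g) {
  identical(formals(f), formals(g)) && identical(body(f), body(g))
}

# Toy check with base R functions
same_code(mean, mean)     # TRUE
same_code(mean, median)   # FALSE: median() has an extra na.rm argument

# Only meaningful after the lazyLoad() step shown above
if (exists(".__T__vegdist:vegan") && requireNamespace("vegan", quietly = TRUE)) {
  stored <- get(".__T__vegdist:vegan")$ANY
  # FALSE would suggest the stored build-time copy differs from installed vegan
  print(same_code(stored, vegan::vegdist))
}
```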

I think this is even documented, although in such dense prose that it is really hard to understand unless you know what to expect. From ?setGeneric:

A generic version of the function will be created in the current package. The existing function becomes the default method, and the package slot of the new generic function is set to the location of the original function

I have never used S4 objects because they make my head ache. Now I know more about them than I ever cared to know. I did not catch this because tools::check_packages_in_dir() installs and checks the dependent packages with the new version of vegan, and then the packages are synchronized. Had I seen this, I certainly would have tried to fix the issue before the vegan release. Now the solution is to "uptick" (is that the correct word?) the vegan dependency so that Bioconductor will have new binary builds with vegan_2.5-1, and to take care that vegan is updated together with phyloseq.

joey711 commented 6 years ago

Thanks for that detail. Very helpful. Not a bug I would have guessed. It doesn't seem that either one of us broke the implicit "contract" between exported vegan methods and packages that rely on them.

I'll make sure to uptick the vegan version now.

jarioksa commented 6 years ago

Agreed. We both worked in good faith as decent R citizens. This was a surprise to me, and I had a hard time understanding what happened.

The most robust solution would be to make vegdist an S4 generic in vegan, even if we define no S4 methods there.

sfeds commented 6 years ago

Hi all, wonderful to read your discussion, even if my knowledge of this is very limited. So, in summary, what is the solution for the average user? I still get this error message.

As a recap:

library("vegan"); packageVersion("vegan")
[1] ‘2.5.1’
library("phyloseq"); packageVersion("phyloseq")
[1] ‘1.22.3’

GPdist <- phyloseq::distance(pseq_ra, "bray")
Error in .C("veg_distance", x = as.double(x), nr = N, nc = ncol(x), d = double(N * : "veg_distance" not available for .C() for package "vegan"

I believe both packages are updated to the latest version. Any tips? Thanks!
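A recap like this is easier to compare across machines when it also includes the Built field of each package's DESCRIPTION, which records the R version, platform and date of the build (not the vegan version a package was built against). pkg_report() is a hypothetical base-R helper sketched for that purpose.

```r
# Hypothetical helper: installed version and "Built:" field per package,
# NA when the package is not installed. Nothing is loaded or attached.
pkg_report <- function(pkgs) {
  rows <- lapply(pkgs, function(p) {
    if (length(find.package(p, quiet = TRUE)) == 0) {
      return(data.frame(package = p, version = NA_character_,
                        built = NA_character_, stringsAsFactors = FALSE))
    }
    desc <- packageDescription(p)
    data.frame(package = p,
               version = as.character(packageVersion(p)),
               built = if (is.null(desc$Built)) NA_character_ else desc$Built,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

pkg_report(c("vegan", "phyloseq"))
```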

jarioksa commented 6 years ago

Your phyloseq is still based on vegan 2.4. You must either downgrade vegan to 2.4-6 or install phyloseq from the source package. It seems that the Bioconductor binary of phyloseq is still today (Apr 23, 2018) based on vegan 2.4. You can see this by looking at the Imports entry under Details on the Bioconductor phyloseq page (https://www.bioconductor.org/packages/release/bioc/html/phyloseq.html). At the time of writing, it still had vegan (>= 2.4).
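The same Imports entry can be read from a locally installed copy with packageDescription("phyloseq")$Imports. As a sketch, the hypothetical helper below extracts the constraint declared for one package from such a string; the example string is illustrative, not copied from a real DESCRIPTION file.

```r
# Hypothetical helper: return the version constraint declared for `pkg` in a
# comma-separated Imports/Depends string ("" if none, NA if pkg is absent)
dep_constraint <- function(imports, pkg) {
  entries <- trimws(strsplit(imports, ",")[[1]])
  names_only <- sub("\\s*\\(.*$", "", entries)
  hit <- entries[names_only == pkg]
  if (length(hit) == 0) return(NA_character_)
  m <- regmatches(hit[1], regexpr("\\(.*\\)", hit[1]))
  if (length(m) == 0) return("")
  gsub("^\\(\\s*|\\s*\\)$", "", m)
}

dep_constraint("methods, vegan (>= 2.4)", "vegan")    # ">= 2.4"
dep_constraint("methods, vegan (>= 2.4)", "methods")  # ""
# Local check, if phyloseq is installed:
# dep_constraint(packageDescription("phyloseq")$Imports, "vegan")
```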

okayama1 commented 6 years ago

Thanks for all the discussions and comments about the issue. That has been working for me.

jflater commented 6 years ago

This is what I did to get around this issue; I was having extreme difficulty trying to revert my vegan installation and to install phyloseq from source. My phyloseq object is called inc.not.na.

First, get the otu_table and transpose it (this assumes taxa are rows, so that samples end up as rows):
dist.matrix <- t(data.frame(otu_table(inc.not.na)))

Then use vegdist from vegan to generate a Bray-Curtis distance object:
bray.not.na <- vegan::vegdist(dist.matrix, method = "bray")
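If vegan itself cannot be brought into a working state, a minimal base-R sketch of the Bray-Curtis dissimilarity avoids the vegan code path entirely. It uses the same formula as vegdist(method = "bray"): the sum of absolute differences divided by the sum of both samples' totals. bray_curtis() is a hypothetical helper; like the workaround above, it expects samples in rows.

```r
# Hypothetical helper: Bray-Curtis dissimilarity between the rows (samples)
# of a non-negative abundance matrix, returned as a "dist" object.
bray_curtis <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  d <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      # sum |x_i - x_j| / sum (x_i + x_j); NaN if both samples are all zero
      d[i, j] <- d[j, i] <- sum(abs(x[i, ] - x[j, ])) / sum(x[i, ] + x[j, ])
    }
  }
  as.dist(d)
}

counts <- rbind(a = c(1, 2, 3), b = c(3, 2, 1), c = c(0, 0, 6))
bray_curtis(counts)
```

With phyloseq loaded, the input would be the transposed OTU table from the workaround above, e.g. bray_curtis(dist.matrix). The double loop is written for clarity rather than speed.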

jarioksa commented 6 years ago

The solution has been given in many places, but let us repeat it here: you must either (1) install phyloseq from sources, or (2) downgrade vegan.

To install phyloseq from source, you must give the argument type = "source" in either the biocLite() or the install.packages() call (depending on which you use). Since phyloseq does not have any compiled code, you can do this on most systems without any special tools. You do not need to upgrade phyloseq; the release version works OK. You only need to re-install phyloseq after you have installed vegan 2.5-1.

To downgrade vegan, you can use devtools::install_version() function from the devtools package.
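The two options can be sketched as functions, assuming a current setup where BiocManager::install() has replaced biocLite() and the remotes package provides install_version() (devtools::install_version() is the same function). The function names are hypothetical, and neither function is run here because both need network access.

```r
# Option 1 (hypothetical helper): update vegan, then rebuild phyloseq from source
# so that its stored copy of vegdist matches the installed vegan.
reinstall_phyloseq_from_source <- function() {
  install.packages("vegan")
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  # type = "source" is passed on to install.packages(); force = TRUE reinstalls
  # phyloseq even though a (binary) copy is already installed
  BiocManager::install("phyloseq", type = "source", force = TRUE, update = FALSE)
}

# Option 2 (hypothetical helper): pin vegan to the version the phyloseq binary
# was built against, e.g. "2.4-6" as discussed in this thread.
downgrade_vegan <- function(version = "2.4-6") {
  if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
  }
  remotes::install_version("vegan", version = version)
}
```

Option 2 builds vegan from source, which needs compilers (Rtools on Windows), as one commenter later discovered.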

Issues #921 and vegandevs/vegan#272 are duplicates of this issue and have the same answer.

jasonzhao0307 commented 6 years ago

What jflater said is really helpful :)

raw937 commented 6 years ago

I am having a similar issue. It worked with no problem less than a month ago.

library("phyloseq"); packageVersion("phyloseq")
[1] ‘1.22.3’
(vegan version 2.5.2)

Error in .C("veg_distance", x = as.double(x), nr = N, nc = ncol(x), d = double(N * : "veg_distance" not available for .C() for package "vegan"

df_all = as(sample_data(norm_all), "data.frame")

Fails here: d_rare = phyloseq::distance(norm_all, "bray")

mikemc commented 6 years ago

Did you recently upgrade R, or Bioconductor and phyloseq? If so, you will need to reinstall vegan and possibly other packages (at least this is what I have to do on Arch Linux when R or phyloseq gets updated).

raw937 commented 6 years ago

Nope, neither; it just stopped working - wth

raw937 commented 6 years ago

I have tried to re-install vegan. No luck: it works on one dataset but not on another. WTH man!

raw937 commented 6 years ago

What is this error? Error in .C("veg_distance", x = as.double(x), nr = N, nc = ncol(x), d = double(N * : "veg_distance" not available for .C() for package "vegan". It makes no sense.

mikemc commented 6 years ago

What operating system and type of R installation are you using? I remember having similar issues when my Linux OS upgraded R to 3.4.4, and I had to reinstall all of the previously installed R packages that required compilation during the install, which I did with:

library(tidyverse)

tb <- installed.packages() %>% as_tibble
avail <- available.packages() %>% as_tibble
tb0 <- semi_join(tb, avail, by = "Package")
tb1 <- tb0 %>% filter(!(NeedsCompilation %in% "no"), Built != "3.4.4")
install.packages(tb1$Package)

where "3.4.4" should be replaced with whatever your current R version is.
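A base-R version of the same filter, without the tidyverse dependency and without the check against available packages, could look like the sketch below; needs_rebuild() is a hypothetical helper that reads the current R version with getRversion() instead of a hard-coded string. Base R's update.packages(checkBuilt = TRUE) covers a related need, though it compares only the major.minor R version.

```r
# Hypothetical helper: names of installed packages that need compilation and
# were built under a different R version than the one currently running.
needs_rebuild <- function(ip = installed.packages(),
                          r_version = as.character(getRversion())) {
  ip <- as.data.frame(ip, stringsAsFactors = FALSE)
  compiled <- !(ip$NeedsCompilation %in% "no")   # "yes" or unknown (NA)
  stale <- !(ip$Built %in% r_version)
  unique(ip$Package[compiled & stale])
}

# Usage (reinstalls from the configured repositories):
# install.packages(needs_rebuild())
```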

raw937 commented 6 years ago

R version 3.4.3 Ubuntu 16.04

spholmes commented 6 years ago

This version is not the "current" release, and that creates problems with Bioconductor packages in particular. My guess is that another Bioconductor package has a distance function. You might first try phyloseq::distance instead of just distance; otherwise you may have to update to the current version of R, which was released in April.

-- Susan Holmes John Henry Samter Fellow in Undergraduate Education Professor, Statistics 2017-2018 CASBS Fellow, Sequoia Hall, 390 Serra Mall Stanford, CA 94305 http://www-stat.stanford.edu/~susan/

raw937 commented 6 years ago

Error in (function (classes, fdef, mtable) :
  unable to find an inherited method for function ‘distance’ for signature ‘"phyloseq", "character"’
3. stop(gettextf("unable to find an inherited method for function %s for signature %s", sQuote(fdef@generic), sQuote(cnames)), domain = NA)
2. (function (classes, fdef, mtable) { methods <- .findInheritedMethods(classes, fdef, mtable) if (length(methods) == 1L) ...
1. distance(filtered16s, "unifrac", weighted = F)

raw937 commented 6 years ago

Wow, that worked!!

don't use this: d = distance(filtered16s, "unifrac", weighted = F)

USE THIS: d = phyloseq::distance(filtered16s, "unifrac", weighted = F)
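The error above is consistent with another package's distance() masking phyloseq's, as suggested in the thread. A self-contained analogy uses two hypothetical attached environments, pkgA and pkgB, standing in for phyloseq and another package exporting distance(): the unqualified name resolves to whichever was attached last, while find() lists every attached definition and an explicit lookup always reaches the intended one.

```r
# pkgA and pkgB are hypothetical stand-ins for two packages exporting distance()
pkgA <- new.env()
pkgA$distance <- function(x, method) paste("pkgA distance:", method)
pkgB <- new.env()
pkgB$distance <- function(x, method) paste("pkgB distance:", method)

attach(pkgA, name = "pkgA")
attach(pkgB, name = "pkgB")   # attached later, so it masks pkgA's distance()

print(find("distance"))       # both attached definitions, most recent first

unqualified <- distance(NULL, "bray")   # resolves to pkgB's version
# Explicit lookup in pkgA, the analogue of writing phyloseq::distance()
qualified <- get("distance", envir = as.environment("pkgA"))(NULL, "bray")
print(c(unqualified, qualified))

detach("pkgB")
detach("pkgA")
```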

spholmes commented 6 years ago

That's not so polite, but we aim to please 😃 Glad it worked for you.

On Fri, Jun 1, 2018, 15:04 Richard Allen White III notifications@github.com wrote:

Crap that worked.

anniewest commented 6 years ago

Hello, I tried re-installing phyloseq and downgrading vegan but neither worked. With trying to downgrade vegan, I hit a snag where I needed to install Rtools but there was no compatible version of this package for R 3.4.4

Not sure where to go from here??

spholmes commented 6 years ago

We have managed this for all the other people, but you really need to delete vegan and reinstall phyloseq and the most recent R, and not with Rtools.

anniewest commented 6 years ago

I have done this multiple times, including re-installing R but it still won't work for me. I managed to get the last version of vegan installed, but that didn't work either.

jarioksa commented 6 years ago

@anniewest : See also vegandevs/vegan#272 which explains the issue and gives the solution.

Briefly: phyloseq requires the same version of vegan that was used when the phyloseq package was built into a binary package. This gives two alternative ways of handling the issue:

  1. First install vegan and then install phyloseq from the source package. This will work with any version of vegan and phyloseq: you need the correct pairing, and this finds the match (and when one changes, you must change the other).
  2. First install the binary package of phyloseq and then find and install the version of vegan that was hardcoded in that binary version of phyloseq. You can find the needed vegan version in the DESCRIPTION file of phyloseq; you cannot use a more modern vegan, but must use exactly the one given in the phyloseq DESCRIPTION.

snowcastle1 commented 6 years ago

I'm pretty new at this, but how do I install phyloseq from the source package? Is it just source("https://bioconductor.org/biocLite.R"); biocLite("phyloseq")?

When I install phyloseq this way it still does not work for me. I keep getting this error: Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘distance’ for signature ‘"phyloseq", "missing"’.

I have also tried to install phyloseq this way:

library("devtools")
install_github("phyloseq/joey711")
Downloading GitHub repo phyloseq/joey711@master
from URL https://api.github.com/repos/phyloseq/joey711/zipball/master
Installation failed: Not Found (404)

lkoest12 commented 5 years ago

Hey guys,

I know this is closed, but I thought I'd post about R version 3.6. I'm pretty much a novice with R, but this worked for me~

I just recently upgraded my R version from 3.5.3 to 3.6 within RStudio (they suggest not doing this in RStudio, but it worked after a few prompt-box decisions) using the code below. Also, I didn't keep any of my previously downloaded packages, for a fresh install.

install.packages("installr")
library(installr)
updateR()

From there, I used the new install method:

if (!requireNamespace("BiocManager"))
install.packages("BiocManager")
BiocManager::install("phyloseq")
library(phyloseq)

Now I can get through most of my old script no problem. I upgraded mainly for generating CCA ordination, for which I was getting the above error. Now things work fine. lmk if anyone would like my ordination code!

sessionInfo(): phyloseq 1.28.0, vegan 2.5-5