nextgenusfs / amptk

AMPtk: Amplicon ToolKit for NGS data (formerly UFITS)
http://amptk.readthedocs.io/
BSD 2-Clause "Simplified" License

install question #42

Closed devonorourke closed 5 years ago

nextgenusfs commented 5 years ago

It should be python-edlib. But you shouldn't need to specify it; just install with conda install amptk.

Jon

On Sep 26, 2018, at 6:48 AM, devonorourke notifications@github.com wrote:

Closed #42.


devonorourke commented 5 years ago

I think this was because I set up my virtual environment with python3, not python2. I'm getting a similar error with python2 now though... so ...?


-- Devon O'Rourke Graduate student in Molecular and Evolutionary Systems Biology University of New Hampshire

nextgenusfs commented 5 years ago

It should run on py2 or py3, but this might be related to the compiler used. Try this: conda create -n amptk python=3.6 gcc amptk

Then you can activate the env with conda activate amptk. Sometimes the compiler libraries are missing; in this case edlib may have been built with the conda gcc, which may be missing from your current env.


devonorourke commented 5 years ago

Okay, so following that install:

conda create -n amptk python=3.6 gcc amptk

I get a host of compiler related clobber issues once the installation is almost done...

SafetyError: The package for amptk located at /mnt/lustre/macmaneslab/devon/.conda/pkgs/amptk-1.2.4-py36r3.4.1_0
appears to be corrupted. The path 'opt/amptk-1.2.4/lib/amptklib.py'
has a sha256 mismatch.
  reported sha256: d77dcaf93d7fe79599d734f74c05424591ab9f1be2dc5e023cb0b929346c3875
  actual sha256: 831fe2ea3ec4dabbf0740ffa4c97008f3f63d52df41efce2ae44945a415930b5

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge::libgcc-ng-7.2.0-hdf63c60_3, defaults::gcc-4.8.5-7
  path: 'lib/libasan.so'

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge::libgcc-ng-7.2.0-hdf63c60_3, defaults::gcc-4.8.5-7
  path: 'lib/libatomic.so'

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge::libgcc-ng-7.2.0-hdf63c60_3, defaults::gcc-4.8.5-7
  path: 'lib/libatomic.so.1'

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge::libgcc-ng-7.2.0-hdf63c60_3, defaults::gcc-4.8.5-7
  path: 'lib/libgcc_s.so'

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge::libgcc-ng-7.2.0-hdf63c60_3, defaults::gcc-4.8.5-7
  path: 'lib/libgcc_s.so.1'

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge::libgcc-ng-7.2.0-hdf63c60_3, defaults::gcc-4.8.5-7
  path: 'lib/libgomp.so'

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge::libgcc-ng-7.2.0-hdf63c60_3, defaults::gcc-4.8.5-7
  path: 'lib/libgomp.so.1'

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge::libgcc-ng-7.2.0-hdf63c60_3, defaults::gcc-4.8.5-7
  path: 'lib/libgomp.so.1.0.0'

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge::libgcc-ng-7.2.0-hdf63c60_3, defaults::gcc-4.8.5-7
  path: 'lib/libitm.so'

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge::libgcc-ng-7.2.0-hdf63c60_3, defaults::gcc-4.8.5-7
  path: 'lib/libitm.so.1'

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge::libgcc-ng-7.2.0-hdf63c60_3, defaults::gcc-4.8.5-7
  path: 'lib/libitm.so.1.0.0'

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge::libgcc-ng-7.2.0-hdf63c60_3, defaults::gcc-4.8.5-7
  path: 'lib/libquadmath.so'

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge::libgcc-ng-7.2.0-hdf63c60_3, defaults::gcc-4.8.5-7
  path: 'lib/libquadmath.so.0'

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge::libgcc-ng-7.2.0-hdf63c60_3, defaults::gcc-4.8.5-7
  path: 'lib/libquadmath.so.0.0.0'

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge::libgcc-ng-7.2.0-hdf63c60_3, defaults::gcc-4.8.5-7
  path: 'lib/libtsan.so'

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge::libgcc-ng-7.2.0-hdf63c60_3, defaults::gcc-4.8.5-7
  path: 'lib/libtsan.so.0'

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge::libgcc-ng-7.2.0-hdf63c60_3, defaults::gcc-4.8.5-7
  path: 'lib/libtsan.so.0.0.0'

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge::libgcc-ng-7.2.0-hdf63c60_3, defaults::gcc-4.8.5-7
  path: 'share/info/libgomp.info'

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge::libgcc-ng-7.2.0-hdf63c60_3, defaults::gcc-4.8.5-7
  path: 'share/info/libquadmath.info'

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge::libgfortran-3.0.0-1, conda-forge::libgcc-7.2.0-h69d50b8_2, defaults::gcc-4.8.5-7
  path: 'lib/libgfortran.so.3'

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge::libgfortran-3.0.0-1, conda-forge::libgcc-7.2.0-h69d50b8_2, defaults::gcc-4.8.5-7
  path: 'lib/libgfortran.so.3.0.0'

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge::libgfortran-ng-7.2.0-hdf63c60_3, defaults::gcc-4.8.5-7
  path: 'lib/libgfortran.so'

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge::libstdcxx-ng-7.2.0-hdf63c60_3, defaults::gcc-4.8.5-7
  path: 'lib/libstdc++.so'

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge::libstdcxx-ng-7.2.0-hdf63c60_3, defaults::gcc-4.8.5-7
  path: 'lib/libstdc++.so.6'

And then if I run amptk --version:

(amptk) [devon@premise ~]$ amptk --version
/mnt/lustre/macmaneslab/devon/.conda/envs/amptk/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/mnt/lustre/macmaneslab/devon/.conda/envs/amptk/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
Traceback (most recent call last):
  File "/mnt/lustre/macmaneslab/devon/.conda/envs/amptk/bin/amptk", line 2, in <module>
    import amptk
  File "/mnt/lustre/macmaneslab/devon/.conda/envs/amptk/opt/amptk-1.2.4/bin/amptk.py", line 15, in <module>
    import lib.amptklib as amptklib
  File "/mnt/lustre/macmaneslab/devon/.conda/envs/amptk/opt/amptk-1.2.4/lib/amptklib.py", line 14
    import edlib-aligner as edlib

devonorourke commented 5 years ago

I think there was some weird lingering edit that remained in the $HOME//.conda/envs/amptk/opt/amptk-1.2.4/lib/amptklib.py file where I had substituted:

import edlib-aligner as edlib

I noticed also that edlib wasn't installed with:

conda create -n amptk python=3.6 gcc amptk

So I did:

conda install edlib

And because of that weird numpy error:

conda update numpy

And now I don't have any issue with running amptk --version
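
As a side note on why the edited line failed: a hyphen is not valid in a Python module name, so `import edlib-aligner as edlib` is rejected by the parser before any import is attempted. A minimal sketch that checks this with the built-in `compile()`; the helper name `is_valid_import` is hypothetical and not part of amptk.

```python
def is_valid_import(statement):
    """Return True if the statement parses as Python; nothing is executed or imported."""
    try:
        compile(statement, "<import-check>", "exec")
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    # The first statement is the edited line from the traceback above.
    for stmt in ("import edlib-aligner as edlib", "import edlib"):
        print(stmt, "->", "parses" if is_valid_import(stmt) else "SyntaxError")
```

Because `compile()` only parses, this says nothing about whether edlib is actually installed; it only shows that the hyphenated form can never work.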

nextgenusfs commented 5 years ago

So does that mean that python-edlib is not installing the edlib backend?


devonorourke commented 5 years ago

I don't know. I'm just looking at what was installed in the virtual environment bin, and there wasn't any edlib-aligner program initially until I manually installed it.

I don't see any python-edlib at the moment, but I could also install it manually.

devonorourke commented 5 years ago

For what it's worth, amptk illumina appears to be running now, once I've made those modifications (installing edlib and updating numpy)

nextgenusfs commented 5 years ago

Right, python-edlib installs the Python bindings for edlib, i.e. it is what allows you to “import edlib” in the script. conda install edlib will only install the C code backend. When I get some free time I need to cut a new release, as I made some enhancements to amptk stats a while ago and there might be a small bug in taxonomy. After I get a new release out I will experiment with the bioconda packaging.
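
A quick way to tell the two cases apart from inside the activated environment is to ask Python whether an `edlib` module can be found at all. This is only a sketch: the helper name `has_python_module` is made up for illustration, and the package name in the hint is taken from the comment above rather than verified against any channel.

```python
import importlib.util


def has_python_module(name):
    """Return True if a top-level module called `name` can be located (it is not imported)."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if has_python_module("edlib"):
        print("Python bindings for edlib found")
    else:
        # Assumption: python-edlib is the conda package that provides the bindings.
        print("edlib bindings missing; try: conda install python-edlib")
```

If this reports the bindings as missing while the edlib C library is installed, that matches the situation described in this thread.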


devonorourke commented 5 years ago

Managed to get amptk illumina to work but now getting stuck on the Dada2 clustering step. Recall that my virtual environment was created with:

conda create -n amptk python=3.6 gcc amptk

The error in the Rscript.log file is a cluster of things, 1) It can't seem to detect a mirror, and 2) it can't figure out how to load the libraries. A little snippet of that error at the beginning:

Warning: failed to download mirrors file (cannot open URL 'https://cran.r-project.org/CRAN_mirrors.csv'); using local file '/mnt/lustre/macmaneslab/devon/.conda/envs/amptk/lib/R/doc/CRAN_mirrors.csv'
Warning message:
In download.file(url, destfile = f, quiet = TRUE) :
  URL 'https://cran.r-project.org/CRAN_mirrors.csv': status was 'Couldn't connect to server'
Loading required package: ShortRead
Loading required package: BiocGenerics
Loading required package: methods
Loading required package: parallel
...
...
Loading required package: BiocParallel
Error: package or namespace load failed for ‘BiocParallel’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/mnt/lustre/macmaneslab/devon/R/x86_64-pc-linux-gnu-library/3.4/BiocParallel/libs/BiocParallel.so':
  /mnt/lustre/macmaneslab/devon/.conda/envs/amptk/lib/R/bin/exec/../../lib/../.././libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /mnt/lustre/macmaneslab/devon/R/x86_64-pc-linux-gnu-library/3.4/BiocParallel/libs/BiocParallel.so)
Failed with error:  ‘package ‘BiocParallel’ could not be loaded’

I figured if the program can't download them, then it can't install them. So I manually went into the virtual environment and tried to do that:

source("https://bioconductor.org/biocLite.R")
biocLite("BiocGenerics")

I also tried things from CRAN:

install.packages("Rcpp", repos='http://cran.us.r-project.org')

The error that I receive in each of these cases is the library loading message:

Error: package or namespace load failed for ‘Rcpp’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/mnt/lustre/macmaneslab/devon/R/x86_64-pc-linux-gnu-library/3.4/Rcpp/libs/Rcpp.so':
  /mnt/lustre/macmaneslab/devon/.conda/envs/amptk/lib/R/bin/exec/../../lib/../.././libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /mnt/lustre/macmaneslab/devon/R/x86_64-pc-linux-gnu-library/3.4/Rcpp/libs/Rcpp.so)
Error: package ‘Rcpp’ could not be loaded

I tried:

conda install libgcc

... but that didn't work any better

All I can find in the forums is something along the lines of sudo apt-get install libstdc++6, but I don't have sudo privileges on my compute cluster.

Likewise, I thought maybe updating Conda from the current v4.5.4 to the newest v4.5.11 might help, but I again ran into sudo problems.

Any thoughts? Maybe I should ditch DADA2 and just use UNoise? What's that cryptic line in your docs: "Likewise, the output data from UNOISE3 is the same as DADA2 and UNOISE2, although “it works better”…."? Which clustering algorithm do you trust?

devonorourke commented 5 years ago

One other thing: I also tried specifying the Conda-specific .libPath during the install, and it didn't work either:

install.packages("tidyverse", repos='http://cran.us.r-project.org', lib='/mnt/lustre/macmaneslab/devon/.conda/envs/amptk/lib/R/library')

Gives the same error

Error in dyn.load(file, DLLpath = DLLpath, ...) : 
  unable to load shared object '/mnt/lustre/macmaneslab/devon/R/x86_64-pc-linux-gnu-library/3.4/bindrcpp/libs/bindrcpp.so':
  /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /mnt/lustre/macmaneslab/devon/R/x86_64-pc-linux-gnu-library/3.4/bindrcpp/libs/bindrcpp.so)
ERROR: lazy loading failed for package ‘dbplyr’

nextgenusfs commented 5 years ago

So it sounds like the R packages aren't installed correctly in the conda package. There is an R channel for conda, and if you go to Anaconda Cloud you can search for all available packages. You need to find the conda packages for those missing R dependencies. Usually they look like r-package, or for dada2 they are listed as bioconductor-dada2. Note: I've had problems with the R packages on conda as well, but haven't tried in a while. These errors seem to be related to missing compiler libraries, i.e. the package was compiled against a library that is not on your system. Conda should theoretically be managing this, but they switched compilers a few months back, which is perhaps still causing some issues.
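
To see which `GLIBCXX` versions a given `libstdc++.so.6` actually provides, one option is to scan the library file for its version strings, much like running `strings` on it and filtering for `GLIBCXX`. The sketch below does that in Python; the function name `glibcxx_versions` is hypothetical, and which `libstdc++.so.6` gets loaded at run time still depends on the library search path.

```python
import re

# Matches version tags such as GLIBCXX_3.4 or GLIBCXX_3.4.21 embedded in the binary.
GLIBCXX_RE = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)+)")


def glibcxx_versions(lib_path):
    """Return the GLIBCXX version strings found in a shared library, sorted numerically."""
    with open(lib_path, "rb") as handle:
        data = handle.read()
    found = {match.group(1).decode("ascii") for match in GLIBCXX_RE.finditer(data)}
    return sorted(found, key=lambda v: tuple(int(part) for part in v.split(".")))
```

Calling it on the `libstdc++.so.6` inside the conda env and on the system copy, then checking whether `"3.4.21"` is in each list, would show which of the two is too old for the compiled R packages.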

devonorourke commented 5 years ago

Right. That's what I was manually trying to do. The missing R packages are both within CRAN and within Bioconductor. My issue is that I can't install them within R itself. If I'm hearing you right, the suggestion is to install those packages through anaconda, rather than manually doing it with R? I'll give that a shot.

devonorourke commented 5 years ago

I'm still curious though - could I just use UNoise instead? What's the upside/downside of DADA2 vs. UNoise?

nextgenusfs commented 5 years ago

Sure. In practice dada2 and unoise3 are very similar. Dada2 was a little more aggressive in chimera filtering the last time I compared, but that will depend on the versions of each software. Unoise3 is much, much faster than dada2. Both ESV pipelines in amptk run an additional 97% clustering step, so the user can use either data set downstream. Whether you should use ESVs (exact sequence variants) versus clustering (uparse) depends on the amplicon; there are many where I don't think the ESVs make biological sense.

devonorourke commented 5 years ago

I think the bulk of these issues were because the compute nodes on my cluster don't have internet access, so when the R script wants to download/upgrade packages, the program dies. It was only after going back into R and manually downloading ggplot2 that an error message became obvious: ggplot2 install required another R library, and only after that program was installed could ggplot2 be installed, and only after that could dada2 run.

But at long last, after that manual install, it's working just fine. Weird.

nextgenusfs commented 5 years ago

Okay, well, good to know. I added those automatic download scripts as a convenience option to download and install missing packages, but it sounds like I should just delete that so the errors are simpler, i.e. "ggplot2 not installed".
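
One way to get that kind of plain error is to check for the required R packages up front and report any that are missing, without trying to download anything. The following is a rough sketch under stated assumptions, not how amptk does it: all function names are hypothetical, `Rscript` is assumed to be on the PATH, and the check relies on R's `requireNamespace()`.

```python
import subprocess


def r_package_installed(name):
    """Ask Rscript whether the package namespace loads. Assumes Rscript is on PATH."""
    expr = 'cat(requireNamespace("%s", quietly = TRUE))' % name
    try:
        result = subprocess.run(
            ["Rscript", "-e", expr], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        # No Rscript executable found, so nothing can be confirmed as installed.
        return False
    return result.stdout.strip().endswith("TRUE")


def missing_packages(required, is_installed=r_package_installed):
    """Return the required packages for which the check fails, in the given order."""
    return [pkg for pkg in required if not is_installed(pkg)]


def format_missing(missing):
    """Build a single plain error line, or an empty string if nothing is missing."""
    if not missing:
        return ""
    return "Missing R packages: " + ", ".join(missing)
```

For example, `format_missing(missing_packages(["ggplot2", "dada2"]))` would yield a one-line message naming whichever of the two cannot be loaded, which is easy to act on from a compute node without internet access.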

nextgenusfs commented 5 years ago

The "auto-install" in the R scripts was removed. Re-open if this is still an issue.