sneumann / xcms

This is the git repository matching the Bioconductor package xcms: LC/MS and GC/MS Data Analysis

xcms on conda #775

Open cbroeckl opened 1 month ago

cbroeckl commented 1 month ago

I am trying to set up xcms on an HPC system using conda. I have tried R 4.4.1 as well as R 4.3.1, and in each case I cannot get xcms to load, usually because of one compatibility issue or another. Is there a guide somewhere to get started down this path?

I am running conda 24.7.1 via miniconda3.

https://anaconda.org/bioconda/bioconductor-xcms - this site lists multiple installation commands, which makes me think I am not the only one struggling with this.

Below is the command I used to install xcms, followed by its output:

(r441) bash-4.4$ conda install bioconda::bioconductor-xcms
Collecting package metadata (current_repodata.json): done
Solving environment: unsuccessful initial attempt using frozen solve. Retrying with flexible solve.
Solving environment: unsuccessful attempt using repodata from current_repodata.json, retrying with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: unsuccessful initial attempt using frozen solve. Retrying with flexible solve.
Solving environment: -
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

Package libgcc-ng conflicts for:
python=3.12 -> libgcc-ng[version='>=11.2.0']
python=3.12 -> bzip2[version='>=1.0.8,<2.0a0'] -> libgcc-ng[version='>=7.3.0|>=7.5.0']

Package zlib conflicts for:
python=3.12 -> zlib[version='>=1.2.13,<1.3.0a0']
python=3.12 -> sqlite[version='>=3.45.3,<4.0a0'] -> zlib[version='>=1.2.12,<1.3.0a0|>=1.2.13,<2.0a0']

Package xz conflicts for:
python=3.12 -> xz[version='>=5.4.2,<6.0a0|>=5.4.5,<6.0a0|>=5.4.6,<6.0a0']
bioconda::bioconductor-xcms -> r-base[version='>=4.3,<4.4.0a0'] -> xz[version='5.2.*|>=5.2.4,<6.0a0|>=5.2.5,<6.0a0|>=5.4.2,<6.0a0']

Package ncurses conflicts for:
python=3.12 -> ncurses[version='>=6.4,<7.0a0']
python=3.12 -> readline[version='>=8.1.2,<9.0a0'] -> ncurses[version='>=6.1,<7.0a0|>=6.2,<7.0a0|>=6.3,<7.0a0']

The following specifications were found to be incompatible with your system:

Your installed version is: 2.28

R version 4.3.1 (2023-06-16) -- "Beagle Scouts"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-conda-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R.

cbroeckl commented 1 month ago

Installing bioconductor-xcms into a fresh environment:

conda create -n xcms bioconductor-xcms

This works, but this installs a version of R from before COVID!
R version 3.6.1 (2019-07-05) -- "Action of the Toes"

And I can't actually load xcms:

library(xcms)

Loading required package: mzR
Loading required package: Rcpp
Error: package or namespace load failed for ‘mzR’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/nfs/home/cbroec/miniconda3/envs/xcms/lib/R/library/ncdf4/libs/ncdf4.so':
  libnetcdf.so.13: cannot open shared object file: No such file or directory
Error: package ‘mzR’ could not be loaded

library(mzR)

Error: package or namespace load failed for ‘mzR’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/nfs/home/cbroec/miniconda3/envs/xcms/lib/R/library/ncdf4/libs/ncdf4.so':
  libnetcdf.so.13: cannot open shared object file: No such file or directory

library(ncdf4)
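
Editor's note: the error above says that ncdf4.so was linked against libnetcdf.so.13, which the dynamic linker cannot find in this environment. A minimal sketch for listing such unresolved libraries follows, assuming a Linux system where ldd is available. The helper name missing_libs is made up for this example and is not part of conda, R, or xcms.

```shell
# Print the names of shared libraries that a compiled object needs
# but the dynamic linker cannot resolve.
# missing_libs is a hypothetical helper written for this example.
missing_libs() {
    # ldd marks unresolved dependencies as "<name> => not found".
    ldd "$1" 2>/dev/null | awk '/not found/ { print $1 }'
}

# Usage with the path from the error message above (left commented out
# because the file only exists on the reporter's system):
#   missing_libs /nfs/home/cbroec/miniconda3/envs/xcms/lib/R/library/ncdf4/libs/ncdf4.so
```

An empty result means every dependency was resolved. A library name in the output typically points to packages in the environment that were built against different versions of a shared library.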

sneumann commented 1 month ago

Hi, thanks for reporting. Can you link to the conda recipe that needs fixing? Please also open an issue there and cross-link the two. I am not familiar with the update policy for BioC packages in conda, so we might need external help. Yours, Steffen

cbroeckl commented 1 month ago

I will do my best to report appropriately. Here is the link to the recipe:

https://github.com/bioconda/bioconda-recipes/tree/master/recipes/bioconductor-xcms

I didn't look closely at this prior to your reply, but it appears the recipe is actually calling for R 4.0.0.

{% set version = "4.0.0" %}
{% set name = "xcms" %}
{% set bioc = "3.18" %}

sneumann commented 1 month ago

The recipe is not asking for R 4.0.0; it installs xcms 4.0.0 from BioC 3.18: https://bioconductor.org/packages/3.18/bioc/html/xcms.html So it would indeed be due for a refresh. I vaguely recall that Bioconda and Bioconductor have some procedure for updating, but I don't know the details. The best place to ask would be the Bioconda chat channels. I pinged Matthias Bernt in another Bioconda issue. Yours, Steffen

bernt-matthias commented 1 month ago

Try creating a clean conda env:

conda create --strict-channel-priority --override-channels --channel conda-forge --channel bioconda --name xcms bioconductor-xcms

This installs R 4.x for me, and library(xcms) works.
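
Editor's note: the flags in the command above apply only to that one invocation. A minimal sketch of the same channel setup as a conda configuration file is shown below. It writes to a scratch file instead of ~/.condarc so that nothing existing is overwritten, and the file name is a placeholder chosen for this example.

```shell
# Write a candidate conda configuration to a scratch file for review.
# The file name is a placeholder chosen for this example.
candidate="${TMPDIR:-/tmp}/condarc.xcms-example"

# Same channel order as the command above: conda-forge first, then bioconda,
# with strict channel priority.
cat > "$candidate" <<'EOF'
channels:
  - conda-forge
  - bioconda
channel_priority: strict
EOF

# After reviewing the file, its contents can be merged into ~/.condarc by hand.
# Running `conda config --show channels` afterwards shows the effective list.
```

Note that other conda configuration files on the system are merged with this one, so the effective channel list is worth checking before relying on it.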

cbroeckl commented 1 month ago

That was magical. It worked on the first try. So it was pulling an older bioconductor-xcms from one of the other channels? Thanks @bernt-matthias.

bernt-matthias commented 1 month ago

You can check which channels were used by running conda list in the other env.

Just in case: if you work for an institution with more than 200 people, you might want to avoid the defaults channel and switch to miniforge. See https://www.fz-juelich.de/en/rse/the_latest/the-anaconda-is-squeezing-us
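
Editor's note: a minimal sketch of that check, assuming the channel name appears in the fourth column of the conda list output when --show-channel-urls is passed. The helper name count_channels is made up for this example.

```shell
# Count installed packages per source channel.
# Reads the output of `conda list --show-channel-urls` on stdin.
# count_channels is a hypothetical helper written for this example.
count_channels() {
    # Skip comment/header lines, keep the channel column, then tally.
    awk '!/^#/ && NF >= 4 { print $4 }' | sort | uniq -c | sort -rn
}

# Usage, with the environment name taken from the prompt in the first post:
#   conda list --name r441 --show-channel-urls | count_channels
```

An environment that mixes several channels is a likely candidate for the kind of conflicts and missing shared libraries reported earlier in this thread.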

JohanLassen commented 1 month ago

Hi, I implemented an xcms-based HPC workflow that does parallel peak picking and parallel integration of missing values. The latter optimizes memory usage at the heavy price of saving everything to disk. It has been tested with up to 20k untargeted LC-MS samples. Check it out: https://johanlassen.github.io/cXCMS/articles/HPC_workflow.html

cbroeckl commented 1 month ago

Thanks @JohanLassen - I will certainly take a look at this and learn from it!

jorainer commented 1 month ago

Just a quick heads-up on current development: we are also working on an xcms result object that stores all data in an HDF5 file. This will reduce the amount of memory required (at the cost of more I/O) and is tailored to very large-scale experiments.