nf-core / ampliseq

Amplicon sequencing analysis workflow using DADA2 and QIIME2
https://nf-co.re/ampliseq
MIT License

Running pipeline with conda profile fails with "Error in library(digest) : there is no package called 'digest'" #512

Closed lbiernot closed 1 year ago

lbiernot commented 1 year ago

Description of the bug

The pipeline fails at the DADA2_MERGE step, regardless of which input files I provide. It worked literally until yesterday evening. We always run it in a clean environment where no Nextflow pipeline has been run before, which implies that Nextflow downloads nf-core/ampliseq fresh every time.

It seems that the R package `digest` is missing from the DADA2 conda environment.

Command used and terminal output

nextflow run nf-core/ampliseq -r 2.4.1 -profile conda --input /tmp/input_files --FW_primer AGAGTTTGATCCTGGCTCAG --RV_primer ATTACCGCGGCTGCTGG --sample_inference pseudo --outdir /tmp/results --trunc_qmin 25 --trunc_rmin 0 --max_ee 4
--min-frequency 1 --picrust --skip_qiime --max_memory 100.GB --max_cpus 12

Staging foreign file: https://zenodo.org/record/4587955/files/silva_nr99_v138.1_wSpecies_train_set.fa.gz
Staging foreign file: https://zenodo.org/record/4587955/files/silva_species_assignment_v138.1.fa.gz
[53/3360e0] Submitted process > NFCORE_AMPLISEQ:AMPLISEQ:RENAME_RAW_DATA_FILES (OBFUSCATED1)
[55/34d94a] Submitted process > NFCORE_AMPLISEQ:AMPLISEQ:RENAME_RAW_DATA_FILES (OBFUSCATED2)
[29/07db0d] Submitted process > NFCORE_AMPLISEQ:AMPLISEQ:FORMAT_TAXONOMY
Creating env using conda: bioconda::fastqc=0.11.9 [cache /work/conda/env-e92705f017a3fab1fca3da28471361dd]
Creating env using conda: bioconda::cutadapt=3.4 [cache /work/conda/env-e4e69d60f36f70dad64c8ac007ee2906]
[50/c4d90c] Submitted process > NFCORE_AMPLISEQ:AMPLISEQ:FASTQC (OBFUSCATED1)
[b2/79c624] Submitted process > NFCORE_AMPLISEQ:AMPLISEQ:FASTQC (OBFUSCATED2)
[10/89f775] Submitted process > NFCORE_AMPLISEQ:AMPLISEQ:CUTADAPT_WORKFLOW:CUTADAPT_BASIC (OBFUSCATED2)
[f2/d3da16] Submitted process > NFCORE_AMPLISEQ:AMPLISEQ:CUTADAPT_WORKFLOW:CUTADAPT_BASIC (OBFUSCATED1)
Creating env using conda: bioconductor-dada2=1.22.0 [cache /work/conda/env-a2130a2a40ea37f6112e25fb23580576]
Creating env using conda: conda-forge::python=3.8.3 [cache /work/conda/env-9f7c67c7ed50ac9b5d28463bb0039738]
[23/743d93] Submitted process > NFCORE_AMPLISEQ:AMPLISEQ:CUTADAPT_WORKFLOW:CUTADAPT_SUMMARY (cutadapt_standard)
[04/6709f7] Submitted process > NFCORE_AMPLISEQ:AMPLISEQ:DADA2_PREPROCESSING:DADA2_QUALITY (FW)
[0d/1563e4] Submitted process > NFCORE_AMPLISEQ:AMPLISEQ:CUTADAPT_WORKFLOW:CUTADAPT_SUMMARY_MERGE (cutadapt_standard_summary.tsv)
[b3/bfe012] Submitted process > NFCORE_AMPLISEQ:AMPLISEQ:DADA2_PREPROCESSING:DADA2_QUALITY (RV)
Creating env using conda: pandas=1.1.5 [cache /work/conda/env-cd9222461d3bebb23c73e743c6238a09]
[32/804ca4] Submitted process > NFCORE_AMPLISEQ:AMPLISEQ:DADA2_PREPROCESSING:TRUNCLEN (RV)
[a1/67cd42] Submitted process > NFCORE_AMPLISEQ:AMPLISEQ:DADA2_PREPROCESSING:TRUNCLEN (FW)
WARN: Probably everything is fine, but this is a reminder that `--trunclenf` was set automatically to 254 and `--trunclenr` to 248. If this doesnt seem reasonable, then please change `--trunc_qmin` (and `--trunc_rmin`), or set `--trunclenf` and `--trunclenr` directly.
[53/0d2d80] Submitted process > NFCORE_AMPLISEQ:AMPLISEQ:DADA2_PREPROCESSING:DADA2_FILTNTRIM (OBFUSCATED2)
[02/34d7ca] Submitted process > NFCORE_AMPLISEQ:AMPLISEQ:DADA2_PREPROCESSING:DADA2_FILTNTRIM (OBFUSCATED1)
[29/603a1f] Submitted process > NFCORE_AMPLISEQ:AMPLISEQ:DADA2_PREPROCESSING:DADA2_QUALITY2 (RV)
[26/c90b5c] Submitted process > NFCORE_AMPLISEQ:AMPLISEQ:DADA2_ERR (1)
[31/6d878a] Submitted process > NFCORE_AMPLISEQ:AMPLISEQ:DADA2_PREPROCESSING:DADA2_QUALITY2 (FW)
[5d/58a14f] Submitted process > NFCORE_AMPLISEQ:AMPLISEQ:DADA2_DENOISING (1)
[05/322d62] Submitted process > NFCORE_AMPLISEQ:AMPLISEQ:DADA2_RMCHIMERA (1)
[8f/ab8e65] Submitted process > NFCORE_AMPLISEQ:AMPLISEQ:DADA2_STATS (1)
[3f/698c79] Submitted process > NFCORE_AMPLISEQ:AMPLISEQ:DADA2_MERGE
Error executing process > 'NFCORE_AMPLISEQ:AMPLISEQ:DADA2_MERGE'

Caused by:
  Process `NFCORE_AMPLISEQ:AMPLISEQ:DADA2_MERGE` terminated with an error exit status (1)

Command executed:

  #!/usr/bin/env Rscript
      suppressPackageStartupMessages(library(dada2))
      suppressPackageStartupMessages(library(digest))

      #combine stats files
      for (data in sort(list.files(".", pattern = ".stats.tsv", full.names = TRUE))) {
          if (!exists("stats")){ stats <- read.csv(data, header=TRUE, sep="\t") }
          if (exists("stats")){
              temp <-read.csv(data, header=TRUE, sep="\t")
              stats <-unique(rbind(stats, temp))
              rm(temp)
          }
      }
      write.table( stats, file = "DADA2_stats.tsv", sep = "\t", row.names = FALSE, col.names = TRUE, quote = FALSE, na = '')

      #combine dada-class objects
      files <- sort(list.files(".", pattern = ".ASVtable.rds", full.names = TRUE))
      if ( length(files) == 1 ) {
          ASVtab = readRDS(files[1])
      } else {
          ASVtab <- mergeSequenceTables(tables=files, repeats = "error", orderBy = "abundance", tryRC = FALSE)
      }
      saveRDS(ASVtab, "DADA2_table.rds")

      df <- t(ASVtab)
      colnames(df) <- gsub('_1.filt.fastq.gz', '', colnames(df))
      colnames(df) <- gsub('.filt.fastq.gz', '', colnames(df))
      df <- data.frame(sequence = rownames(df), df, check.names=FALSE)
      # Create an md5 sum of the sequences as ASV_ID and rearrange columns
      df$ASV_ID <- sapply(df$sequence, digest, algo='md5', serialize = FALSE)
      df <- df[,c(ncol(df),3:ncol(df)-1,1)]

      # file to publish
      write.table(df, file = "DADA2_table.tsv", sep = "\t", row.names = FALSE, quote = FALSE, na = '')

      # Write fasta file with ASV sequences to file
      write.table(data.frame(s = sprintf(">%s
  %s", df$ASV_ID, df$sequence)), 'ASV_seqs.fasta', col.names = FALSE, row.names = FALSE, quote = FALSE, na = '')

      # Write ASV file with ASV abundances to file
      df$sequence <- NULL
      write.table(df, file = "ASV_table.tsv", sep="\t", row.names = FALSE, quote = FALSE, na = '')

      writeLines(c("\"NFCORE_AMPLISEQ:AMPLISEQ:DADA2_MERGE\":", paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),paste0("    dada2: ", packageVersion("dada2")) ), "versions.yml")

Command exit status:
  1

Command output:
  (empty)

Command error:
  Error in library(digest) : there is no package called 'digest'
  Calls: suppressPackageStartupMessages -> withCallingHandlers -> library
  Execution halted

Work dir:
  /work/3f/698c79dbd72b153e34911f2fcc5bb3

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

Execution cancelled -- Finishing pending tasks before exit

Relevant files

No response

System information

Nextflow: 21.10.6
Hardware: desktop
Executor: local
Container engine: we've tested it both inside a Docker container and on a non-virtualized OS; the profile used to run the ampliseq pipeline is always conda
OS: Debian Buster
nf-core/ampliseq version: 2.4.0 and 2.4.1 (same issue with both)

d4straub commented 1 year ago

Hi there, I assume the conda package lacks r-digest, while it is available in the container. That might have slipped through all tests. Could you please test appending `-c env.config -resume` to your nextflow command, where env.config contains:

process {
    withName: DADA2_MERGE {
        conda 'bioconda::bioconductor-dada2=1.22.0 conda-forge::r-digest=0.6.30'
    }
}

disclaimer: I did not test this, and I use singularity, so I'm not sure it works, but I hope so.

edit: according to https://nf-co.re/ampliseq/2.4.1/usage#updating-containers it seems the directive should rather be assigned, i.e. `conda = '...'`
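
For reference, a minimal env.config sketch using that assignment syntax (untested; the package pins are the same ones suggested above) could look like:

process {
    withName: DADA2_MERGE {
        // add r-digest alongside dada2; note the `conda = '...'` assignment form
        conda = 'bioconda::bioconductor-dada2=1.22.0 conda-forge::r-digest=0.6.30'
    }
}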

lbiernot commented 1 year ago

@d4straub thank you. What I actually did was create conda envs with the proper dependencies beforehand and point the ampliseq Nextflow processes at them, but you've shown me the right direction.
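
For reference, pointing a process at a pre-created environment can be done with the same kind of per-process override, since Nextflow's conda directive also accepts the path of an existing environment instead of package specs (the path below is only a hypothetical example):

process {
    withName: DADA2_MERGE {
        // hypothetical path to a pre-created env containing dada2 plus r-digest
        conda = '/path/to/envs/bioconductor-dada2'
    }
}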

d4straub commented 1 year ago

Glad it worked! I aim to make that work out of the box, so could you help me out and let me know what those proper dependencies were? Was it just as above, or were there any other components?

lbiernot commented 1 year ago

@d4straub for the moment, adding r-digest=0.6.30 to DADA2_MERGE seemed to be enough, but I'm actually pre-creating conda environments:

channels:
- conda-forge
- bioconda
dependencies:
- bioconda::barrnap 0.9
name: barrnap

channels:
- conda-forge
- bioconda
dependencies:
- bioconductor-dada2 1.22.0
- r-digest 0.6.30
name: bioconductor-dada2

channels:
- conda-forge
- bioconda
dependencies:
- bioconductor-biostrings 2.58.0
name: biostrings

channels:
- conda-forge
- bioconda
dependencies:
- bioconda::cutadapt 3.4
name: cutadapt

channels:
- conda-forge
- bioconda
dependencies:
- bioconda::fastqc 0.11.9
name: fastqc

channels:
- conda-forge
- bioconda
dependencies:
- bioconda::itsx 1.1.3
name: itsx

channels:
- conda-forge
- bioconda
dependencies:
- bioconda::multiqc 1.13
name: multiqc

channels:
- conda-forge
- bioconda
dependencies:
- pandas 1.1.5
name: pandas

channels:
- conda-forge
- bioconda
dependencies:
- bioconda::picrust2 2.5.0
name: picrust2

channels:
- conda-forge
- bioconda
dependencies:
- conda-forge::python 3.8.3
name: python

channels:
- conda-forge
- bioconda
dependencies:
- bioconda::r-tidyverse 1.2.1
name: r-tidyverse

channels:
- conda-forge
- bioconda
dependencies:
- conda-forge::sed 4.7
name: sed

channels:
- conda-forge
- bioconda
dependencies:
- bioconda::vsearch 2.21.1
name: vsearch

I'm planning to update these environment yaml files to include all packages as resolved and installed by conda, but I haven't had time to do it yet.

d4straub commented 1 year ago

Thanks!

Managing conda environments like that seems like a considerable amount of time invested in something that was supposed to be solved by the pipeline. Instead, I would recommend trying containers such as Singularity or Docker, which provide fixed environments; see https://nf-co.re/docs/usage/installation#pipeline-software

d4straub commented 1 year ago

Hi again, would you have time to test with

nextflow pull nf-core/ampliseq
nextflow pull nf-core/ampliseq -r dev
nextflow run nf-core/ampliseq -r dev -profile conda <your params>

whether that is resolved?

d4straub commented 1 year ago

I think I fixed all issues with conda in dev; it will be in the next release. Let me know if you come across any other issues (I hope not, and that it works smoothly for you)!