nf-core / ampliseq

Amplicon sequencing analysis workflow using DADA2 and QIIME2
https://nf-co.re/ampliseq
MIT License

ERROR ~ Error executing process > 'NFCORE_AMPLISEQ:AMPLISEQ:DADA2_MERGE .........has been broken #727

Closed JayalalKJ closed 6 months ago

JayalalKJ commented 7 months ago

Description of the bug

The error indicates that the files ./A.ASVtable.rds and ./B.ASVtable.rds were rejected as invalid because they are not recognized as matrices.

Pipeline completed with errors- ERROR ~ Error executing process > 'NFCORE_AMPLISEQ:AMPLISEQ:DADA2_MERGE'

Caused by: Process NFCORE_AMPLISEQ:AMPLISEQ:DADA2_MERGE terminated with an error exit status (1)

Command executed:

#!/usr/bin/env Rscript

  suppressPackageStartupMessages(library(dada2))
  suppressPackageStartupMessages(library(digest))

  #combine stats files
  for (data in sort(list.files(".", pattern = ".stats.tsv", full.names = TRUE))) {
      if (!exists("stats")){ stats <- read.csv(data, header=TRUE, sep="\t") }
      if (exists("stats")){
          temp <-read.csv(data, header=TRUE, sep="\t")
          stats <-unique(rbind(stats, temp))
          rm(temp)
      }
  }
  write.table( stats, file = "DADA2_stats.tsv", sep = "\t", row.names = FALSE, col.names = TRUE, quote = FALSE, na = '')

  #combine dada-class objects
  files <- sort(list.files(".", pattern = ".ASVtable.rds", full.names = TRUE))
  if ( length(files) == 1 ) {
      ASVtab = readRDS(files[1])
  } else {
      ASVtab <- mergeSequenceTables(tables=files, repeats = "error", orderBy = "abundance", tryRC = FALSE)
  }
  saveRDS(ASVtab, "DADA2_table.rds")

  df <- t(ASVtab)
  colnames(df) <- gsub('_1.filt.fastq.gz', '', colnames(df))
  colnames(df) <- gsub('.filt.fastq.gz', '', colnames(df))
  df <- data.frame(sequence = rownames(df), df, check.names=FALSE)
  # Create an md5 sum of the sequences as ASV_ID and rearrange columns
  df$ASV_ID <- sapply(df$sequence, digest, algo='md5', serialize = FALSE)
  df <- df[,c(ncol(df),3:ncol(df)-1,1)]

  # file to publish
  write.table(df, file = "DADA2_table.tsv", sep = "\t", row.names = FALSE, quote = FALSE, na = '')

  # Write fasta file with ASV sequences to file
  write.table(data.frame(s = sprintf(">%s\n%s", df$ASV_ID, df$sequence)), 'ASV_seqs.fasta', col.names = FALSE, row.names = FALSE, quote = FALSE, na = '')

  # Write ASV file with ASV abundances to file
  df$sequence <- NULL
  write.table(df, file = "ASV_table.tsv", sep="\t", row.names = FALSE, quote = FALSE, na = '')

  writeLines(c("\"NFCORE_AMPLISEQ:AMPLISEQ:DADA2_MERGE\":", paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),paste0("    dada2: ", packageVersion("dada2")) ), "versions.yml")

Command exit status: 1

Command output: (empty)

Command error:

  INFO: Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
  INFO: Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
  INFO: Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
  Error in mergeSequenceTables(tables = files, repeats = "error", orderBy = "abundance", :
    Some sequence tables found invalid: ./A.ASVtable.rds, ./B.ASVtable.rds
  In addition: Warning messages:
  1: In FUN(X[[i]], ...) : Not a matrix.
  2: In FUN(X[[i]], ...) : Not a matrix.
  Execution halted
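The "Not a matrix." warnings suggest that the objects stored in the two .rds files are not matrices at all. A minimal sketch for checking this with base R only is shown below. The helper name inspect_rds and the toy files are hypothetical and created purely for illustration. In the real case one would point list.files() at the work dir reported by Nextflow instead of a temporary folder.

```r
# Summarise what an .rds file actually contains, using base R only.
# inspect_rds is a hypothetical helper, not part of dada2 or the pipeline.
inspect_rds <- function(path) {
  obj <- readRDS(path)
  data.frame(
    file      = basename(path),
    class     = paste(class(obj), collapse = "/"),
    is_matrix = is.matrix(obj),
    n_rows    = if (is.matrix(obj)) nrow(obj) else NA_integer_,
    n_cols    = if (is.matrix(obj)) ncol(obj) else NA_integer_,
    stringsAsFactors = FALSE
  )
}

# Toy demonstration: one matrix-like table and one empty vector,
# written to a temporary directory (both files are made up).
demo_dir <- tempfile("rds_demo")
dir.create(demo_dir)
good <- matrix(c(10L, 5L), nrow = 1,
               dimnames = list("sample1", c("ACGT", "TTGA")))
bad <- integer(0)
saveRDS(good, file.path(demo_dir, "good.ASVtable.rds"))
saveRDS(bad, file.path(demo_dir, "bad.ASVtable.rds"))

# Same file-listing idea as in the merge script, with an anchored pattern.
files <- sort(list.files(demo_dir, pattern = "\\.ASVtable\\.rds$", full.names = TRUE))
report <- do.call(rbind, lapply(files, inspect_rds))
print(report)
```

A row with is_matrix equal to FALSE would be consistent with the warning in the log, although the exact validity checks performed by mergeSequenceTables are not shown in this thread.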

Work dir: /cluster/projects/nn8999k/Jayalal/test_nf_core/work/4b/f5f69c5788e0dc1f904f2b360b0ba5

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

-- Check '.nextflow.log' file for details

Note1: samplesheet.tsv

sampleID	forwardReads	reverseReads	run
sample1	/data/sample1_1_L001_R1_001_1000reads.fastq.gz	/data/sample1_1_L001_R2_001_1000reads.fastq.gz	A
sample2	/data/sample2_1_L001_R1_001_1000reads.fastq.gz	/data/sample2_1_L001_R2_001_1000reads.fastq.gz	B

Note_2: metadata.tsv

sample-id	barcode-sequence	body-sites	habitat	fish-id	exbatches
q2:types	categorical	categorical	categorical	categorical	categorical
JAY01sample001	AACAAGCC:AACAAGCC	SM	Tank1	fish001	Batch1
JAY01sample103	TTACGCCA:TCATAGCG	HF	Tank1	fish001	Batch1
JAY01sample017	AATTGCCG:AATTGCCG	HM	Tank2	fish007	Batch3
JAY01sample117	AATTGCCG:AACAAGCC	HM	Tank2	fish007	Batch3

Command used and terminal output

nextflow run nf-core/ampliseq/ -profile singularity --input samplesheet.tsv --FW_primer GTGCCAGCMGCCGCGGTAA --RV_primer GGACTACHVGGGTWTCTAAT --metadata smallmetadata.tsv --outdir ./RREsults

Relevant files

No response

System information

No response

d4straub commented 7 months ago

Hi there,

this error means that the files are not in the expected format or do not have the expected content. I assume you have 2 samples, originating from different sequencing runs. Maybe one sample got lost during preprocessing (e.g. it had too few reads and therefore produced an empty table). Please check the file DADA2_stats.tsv in the folder /cluster/projects/nn8999k/Jayalal/test_nf_core/work/4b/f5f69c5788e0dc1f904f2b360b0ba5. Alternatively, could you check the contents of ./A.ASVtable.rds and ./B.ASVtable.rds in the aforementioned folder?
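As a rough illustration of the first check, the sketch below flags samples whose read counts drop to zero in a stats table. The function name flag_lost_samples and the column names in the toy table are assumptions made for this example. The real DADA2_stats.tsv may use different columns, so the helper simply treats every numeric column as a read count.

```r
# Flag samples for which any read-count column is zero or missing.
# flag_lost_samples is a hypothetical helper; id_col names the sample column.
flag_lost_samples <- function(stats, id_col = "sample") {
  numeric_cols <- names(stats)[vapply(stats, is.numeric, logical(1))]
  count_cols <- setdiff(numeric_cols, id_col)
  counts <- as.matrix(stats[, count_cols, drop = FALSE])
  lost <- apply(counts, 1, function(x) any(is.na(x) | x == 0))
  stats[[id_col]][lost]
}

# Toy table with made-up column names and values.
toy <- data.frame(
  sample   = c("sample1", "sample2"),
  filtered = c(950L, 12L),
  merged   = c(800L, 0L),
  stringsAsFactors = FALSE
)
lost_samples <- flag_lost_samples(toy)
print(lost_samples)

# With the real file, the table could be read the same way the merge script does:
# stats <- read.csv("DADA2_stats.tsv", header = TRUE, sep = "\t")
```

In the toy table only sample2 is reported, because its merged count is zero. Whether a zero count actually explains the invalid tables in this issue is not confirmed in the thread.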

d4straub commented 7 months ago

This issue occurred with non-demultiplexed data, so it might be irrelevant. I will keep it open a bit longer, but if it does not occur with the type of data that this pipeline is made for, I propose to ignore it.

d4straub commented 6 months ago

I am closing this because there appear to be no similar reports. Please feel free to open another issue or re-open this one if you encounter the problem again.