nf-core / ampliseq

Amplicon sequencing analysis workflow using DADA2 and QIIME2
https://nf-co.re/ampliseq
MIT License

Fasta input isn't compatible with most filters #629

Closed: d4straub closed this issue 1 year ago

d4straub commented 1 year ago

Description of the bug

Running

nextflow run nf-core/ampliseq -r dev -profile test_fasta,singularity --outdir results_fasta -resume --max_len_asv 265

fails with:

ERROR ~ Error executing process > 'NFCORE_AMPLISEQ:AMPLISEQ:PHYLOSEQ_WORKFLOW:PHYLOSEQ (dada2)'

Caused by:
  Process `NFCORE_AMPLISEQ:AMPLISEQ:PHYLOSEQ_WORKFLOW:PHYLOSEQ (dada2)` terminated with an error exit status (1)

Command executed:

  #!/usr/bin/env Rscript

  suppressPackageStartupMessages(library(phyloseq))

  otu_df  <- read.table("ASV_table.len.tsv", sep="\t", header=TRUE, row.names=1)
  tax_df  <- read.table("ASV_tax_species.rdp_18.tsv", sep="\t", header=TRUE, row.names=1)
  otu_mat <- as.matrix(otu_df)
  tax_mat <- as.matrix(tax_df)

  OTU     <- otu_table(otu_mat, taxa_are_rows=TRUE)
  TAX     <- tax_table(tax_mat)
  phy_obj <- phyloseq(OTU, TAX)

  if (file.exists("")) {
      sam_df  <- read.table("", sep="\t", header=TRUE, row.names=1)
      SAM     <- sample_data(sam_df)
      phy_obj <- merge_phyloseq(phy_obj, SAM)
  }

  if (file.exists("")) {
      TREE    <- read_tree("")
      phy_obj <- merge_phyloseq(phy_obj, TREE)
  }

  saveRDS(phy_obj, file = paste0("dada2", "_phyloseq.rds"))

  # Version information
  writeLines(c("\"NFCORE_AMPLISEQ:AMPLISEQ:PHYLOSEQ_WORKFLOW:PHYLOSEQ\":",
      paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),
      paste0("    phyloseq: ", packageVersion("phyloseq"))),
      "versions.yml"
  )

Command exit status:
  1

Command output:
  (empty)

Command error:
  Error in validObject(.Object) : invalid class “otu_table” object: 
   OTU abundance data must have non-zero dimensions.
  Calls: otu_table ... .nextMethod -> callNextMethod -> .nextMethod -> validObject
  Execution halted

Additionally, when running

nextflow run nf-core/ampliseq -r dev -profile test_fasta,singularity --outdir results_fasta -resume --vsearch_cluster

VSEARCH does not cluster anything: the process simply does not run, and no error is raised.

Also, with either of the following settings the pipeline does not run the classification step (again, without any error):

nextflow run nf-core/ampliseq -r dev -profile test_fasta,singularity --outdir results_fasta -resume --filter_ssu bac
nextflow run nf-core/ampliseq -r dev -profile test_fasta,singularity --outdir results_fasta -resume --filter_codons

Command used and terminal output

No response

Relevant files

No response

System information

Current dev. The --filter_ssu bac and --filter_codons runs above are expected to misbehave in version 2.6.1 as well (not tested).

erikrikarddaniel commented 1 year ago

To address the first error, one could use Nextflow as a macro language: instead of relying on R's file.exists("") checks, place the conditional logic in the Nextflow script block that generates the R script.

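Example pseudocode, assuming a Nextflow variable otu_table that holds the path to an OTU table:

script:

// Build the R line in Nextflow; emit an empty string if the file is absent
def read_otu_table = otu_table.exists() ? "otu_df <- read.table('${otu_table}', sep='\\t', header=TRUE, row.names=1)" : ""

...

"""
#!/usr/bin/env Rscript
$read_otu_table
"""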
d4straub commented 1 year ago

Thanks for the idea. I fixed it in the draft PR above at the Nextflow level by making the output optional, so that an empty channel is emitted when the input is a FASTA file. Works so far.
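A minimal sketch of the optional-output approach described above, assuming Nextflow DSL2 (the default in recent releases). The process names FILTER_LEN and PHYLOSEQ, the file names, and the params.asv_table parameter are hypothetical stand-ins, not the code from the ampliseq PR. When FILTER_LEN writes no file, optional: true lets the task succeed with nothing emitted, and PHYLOSEQ is then never started instead of failing on a zero-dimension table.

```nextflow
// Hypothetical sketch, not the ampliseq implementation.
params.asv_table = "ASV_table.tsv"   // hypothetical input path

process FILTER_LEN {
    input:
    path asv_tsv

    output:
    // optional: true: the task succeeds even if this file is never written,
    // and the output channel then receives nothing for this task.
    path "asv_filtered.tsv", optional: true

    script:
    """
    # Write the output only if the table has data rows beyond the header line
    if [ \$(wc -l < ${asv_tsv}) -gt 1 ]; then
        cp ${asv_tsv} asv_filtered.tsv
    fi
    """
}

process PHYLOSEQ {
    input:
    path otu_tsv

    output:
    path "otu_table.rds"

    script:
    """
    #!/usr/bin/env Rscript
    suppressPackageStartupMessages(library(phyloseq))
    otu_mat <- as.matrix(read.table("${otu_tsv}", sep="\\t", header=TRUE, row.names=1))
    saveRDS(otu_table(otu_mat, taxa_are_rows=TRUE), file = "otu_table.rds")
    """
}

workflow {
    ch_filtered = FILTER_LEN(Channel.fromPath(params.asv_table))
    // If FILTER_LEN emitted nothing, PHYLOSEQ receives an empty channel and is skipped.
    PHYLOSEQ(ch_filtered)
}
```

With --asv_table pointing to a header-only table, this sketch should finish without ever launching PHYLOSEQ. The trade-off is the one reported above for VSEARCH and classification: a skipped step produces no error, so silent skips need to be intended.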