FASTA input isn't compatible with most filters

d4straub commented 1 year ago

Description of the bug

nextflow run nf-core/ampliseq -r dev -profile test_fasta,singularity --outdir results_fasta -resume --max_len_asv 265 fails with:

ERROR ~ Error executing process > 'NFCORE_AMPLISEQ:AMPLISEQ:PHYLOSEQ_WORKFLOW:PHYLOSEQ (dada2)'

Caused by:
  Process `NFCORE_AMPLISEQ:AMPLISEQ:PHYLOSEQ_WORKFLOW:PHYLOSEQ (dada2)` terminated with an error exit status (1)

Command executed:

  #!/usr/bin/env Rscript

  suppressPackageStartupMessages(library(phyloseq))

  otu_df  <- read.table("ASV_table.len.tsv", sep="\t", header=TRUE, row.names=1)
  tax_df  <- read.table("ASV_tax_species.rdp_18.tsv", sep="\t", header=TRUE, row.names=1)
  otu_mat <- as.matrix(otu_df)
  tax_mat <- as.matrix(tax_df)

  OTU     <- otu_table(otu_mat, taxa_are_rows=TRUE)
  TAX     <- tax_table(tax_mat)
  phy_obj <- phyloseq(OTU, TAX)

  if (file.exists("")) {
      sam_df  <- read.table("", sep="\t", header=TRUE, row.names=1)
      SAM     <- sample_data(sam_df)
      phy_obj <- merge_phyloseq(phy_obj, SAM)
  }

  if (file.exists("")) {
      TREE    <- read_tree("")
      phy_obj <- merge_phyloseq(phy_obj, TREE)
  }

  saveRDS(phy_obj, file = paste0("dada2", "_phyloseq.rds"))

  # Version information
  writeLines(c("\"NFCORE_AMPLISEQ:AMPLISEQ:PHYLOSEQ_WORKFLOW:PHYLOSEQ\":",
      paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),
      paste0("    phyloseq: ", packageVersion("phyloseq"))),
      "versions.yml"
  )

Command exit status:
  1

Command output:
  (empty)

Command error:
  Error in validObject(.Object) : invalid class “otu_table” object: 
   OTU abundance data must have non-zero dimensions.
  Calls: otu_table ... .nextMethod -> callNextMethod -> .nextMethod -> validObject
  Execution halted

Additionally, when using nextflow run nf-core/ampliseq -r dev -profile test_fasta,singularity --outdir results_fasta -resume --vsearch_cluster, VSEARCH doesn't cluster anything: the process simply doesn't run, and no error is reported.

Also, with either of the following settings the pipeline doesn't classify (no error!):

nextflow run nf-core/ampliseq -r dev -profile test_fasta,singularity --outdir results_fasta -resume --filter_ssu bac
nextflow run nf-core/ampliseq -r dev -profile test_fasta,singularity --outdir results_fasta -resume --filter_codons

Command used and terminal output

No response

Relevant files

No response

System information

Current dev. The --filter_ssu bac and --filter_codons commands above are also expected not to work properly with version 2.6.1, but this was not tested.

erikrikarddaniel commented 1 year ago

To address the first error, one could use Nextflow as a macro language instead of using R's file.exists("") and place the conditional processing in a Nextflow block before the R script.

Example pseudocode, assuming a Nextflow variable otu_table that contains the path to an otu table:

script:

def read_otu_table = otu_table.exists() ? "otu_table <- read_otu_table('${otu_table}')" : ""

...

"""
$read_otu_table
"""

d4straub commented 1 year ago

Thanks for the idea. I fixed it in the draft PR above at the Nextflow level by making the output optional, so that an empty channel is emitted when the input is a FASTA file. Works so far.
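
For readers unfamiliar with the pattern, here is a minimal sketch of what "making the output optional" can look like in Nextflow DSL2. It is not the code from the PR: the process name FILTER_TABLE, the file name filtered.tsv and the cp command are all hypothetical placeholders. The sketch assumes the count table is passed as an empty list when the pipeline starts from FASTA, so no table file is staged.

```nextflow
// Illustrative only: FILTER_TABLE, filtered.tsv and the cp command are
// hypothetical and not taken from nf-core/ampliseq.
process FILTER_TABLE {
    input:
    path table   // an empty list [] when there is no count table (FASTA input)

    output:
    // optional: true -> the task does not fail if the file is missing;
    // nothing is emitted on this channel in that case
    path "filtered.tsv", optional: true, emit: tsv

    script:
    // An empty list is falsy in Groovy, so the copy only runs with a real table
    def cmd = table ? "cp ${table} filtered.tsv" : "echo 'no table, nothing to filter'"
    """
    ${cmd}
    """
}

workflow {
    // Simulate FASTA input: no count table available
    FILTER_TABLE( [] )

    // Prints nothing here, because the output channel stays empty
    FILTER_TABLE.out.tsv.view()
}
```

Because Nextflow never launches a process whose input channel is empty, anything downstream that consumes such an output (for example a step that builds a phyloseq object) is skipped rather than failing on a table with zero dimensions. The same mechanism can make a step skip silently, without any error, when an upstream channel is unexpectedly empty.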
