nf-core / ampliseq

Amplicon sequencing analysis workflow using DADA2 and QIIME2
https://nf-co.re/ampliseq
MIT License

Ampliseq fails during reannotation of ASVs #552

Closed: pragermh closed this issue 1 year ago

pragermh commented 1 year ago

Description of the bug

Ampliseq fails when I try to re-annotate a set of ASVs in a fasta file (output.fasta) on UPPMAX/Rackham, using the latest ampliseq release: 2.5.0.

# params.yaml:
project: "[my-project]"
dada_ref_taxonomy: "sbdi-gtdb=R07-RS207-1"
input: "output.fasta"
outdir: "results"
skip_qiime: true
sbdiexport: true
FW_primer: "GTGCCAGCMGCCGCGGTAA"
RV_primer: "GGACTACHVGGGTWTCTAAT"

# output.fasta structure
>f7cdc7b06b9415c277fc6ee5d1d848a8
AAACCAGCACCTCAAGTGGTCAGGATGATTATTGGGCCTAAAGCATCCGTAGCCGGCTCTGTAAGTTTTCGGTTAAATCTGTACGCTCAA
>49241161aea308dd9d2eda85ec1dab42
AAACCAGCTCTTCAAGTGGTCGGGAATATTATTGGGCTTAAAGTGTCCGTAGCCGGTTTAGTAAGTTCCTGGTTAAATCTGGCAGCTTAA
>b886789773f06e06e310ba6a7c2832b9
AAACCAGCTCTTCAAGTGGTCGGGAATATTATTGGGCTTAAAGTGTCCGTAGCCGGTTTGATAAGTTCCTGGTTAAATCTGGCAGCTCAA

Command used and terminal output

nextflow run nf-core/ampliseq -r 2.5.0 -profile uppmax -params-file params.yaml

Caused by:
  Process `NFCORE_AMPLISEQ:AMPLISEQ:SBDIEXPORTREANNOTATE (ASV_tax_species.tsv)` terminated with an error exit status (1)

Command executed:

  if [[ 2.5.0 == *dev ]]; then
      ampliseq_version="v2.5.0, revision: be36b18b01"
  else
      ampliseq_version="v2.5.0"
  fi

  sbdiexportreannotate.R "SBDI-GTDB-R07-RS207-1 (https://scilifelab.figshare.com/articles/dataset/SBDI_Sativa_curated_16S_GTDB_database/14869077/4)" ASV_tax_species.tsv "$ampliseq_version" summary.tsv

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_AMPLISEQ:AMPLISEQ:SBDIEXPORTREANNOTATE":
      R: $(R --version 2>&1 | sed -n 1p | sed 's/R version //' | sed 's/ (.*//')
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  INFO:    Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
  INFO:    Environment variable SINGULARITYENV_SNIC_TMP is set, but APPTAINERENV_SNIC_TMP is preferred
  WARNING: Skipping mount /var/apptainer/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
  Error: Can't join on `x$ASV_ID` x `y$ASV_ID` because of incompatible types.
  i `x$ASV_ID` is of type <factor<f2172>>>.
  i `y$ASV_ID` is of type <logical>>.
  Backtrace:
       x
    1. \-`%>%`(...)
    2.   +-base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
    3.   \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
    4.     \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
    5.       \-`_fseq`(`_lhs`)
    6.         \-magrittr::freduce(value, `_function_list`)
    7.           \-function_list[[i]](value)
    8.             +-dplyr::left_join(., predictions, by = "ASV_ID")
    9.             \-dplyr:::left_join.data.frame(., predictions, by = "ASV_ID")
   10.               \-dplyr:::join_mutate(...)
   11.                 \-dplyr:::join_rows(x_key, y_key, type = type, na_equal = na_equal)
   12.                   \-base::tryCatch(...)
   13.                     \-base:::tryCatchList(expr, classes, parentenv, handlers)
   14.                       \-base:::tryCatchOne(expr, names, parentenv, handlers[[1L]])
   15. 
  Execution halted

Relevant files

nextflow.log

System information

2.5.0 Run on UPPMAX/Rackham using UPPMAX profile

jtangrot commented 1 year ago

The problem seems to be that barrnap is not able to predict anything for these sequences, leading to a summary.tsv file that is empty except for the header. This makes bin/sbdiexportreannotate.R crash. I can fix bin/sbdiexportreannotate.R so it can handle an empty barrnap file, but was wondering if it's better to not create a summary file at all if the barrnap gff files are all empty? Or is there a better way to handle this?
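
The header-only condition described above can be detected before the R script is called. The following is a minimal sketch, not the fix that was applied to the pipeline: `has_predictions` is a hypothetical helper name, and it assumes only that summary.tsv is a tab-separated file whose first line is the header.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of nf-core/ampliseq): succeed only if the
# given TSV file has at least one non-empty line after its header row.
has_predictions() {
    local file="$1"
    local rows
    # Skip the header (line 1) and count the remaining non-empty lines.
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    rows=$(tail -n +2 "$file" | grep -c . || true)
    [ "$rows" -gt 0 ]
}

# Example use: decide what to do with summary.tsv before joining on ASV_ID.
# if has_predictions summary.tsv; then
#     echo "barrnap predictions found"
# else
#     echo "summary.tsv contains only a header" >&2
# fi
```

A check like this only detects the empty case. Whether to skip the summary file entirely or to make bin/sbdiexportreannotate.R tolerate an empty one is the design question raised in this comment.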

pragermh commented 1 year ago

Re-running analysis with skip_barrnap: true now, to help confirm your conclusion.

pragermh commented 1 year ago

Run with skip_barrnap: true finished:

Mar-07 12:29:44.043 [main] INFO nextflow.Nextflow - -[nf-core/ampliseq] Pipeline completed successfully-

but I still don't get any SBDI export files.

jtangrot commented 1 year ago

> Run with skip_barrnap: true finished Mar-07 12:29:44.043 [main] INFO nextflow.Nextflow - -[nf-core/ampliseq] Pipeline completed successfully- but I still don't get any SBDI export files.

The SBDI-export process is not run if barrnap is skipped - the two options should probably not be allowed at the same time...
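
Until the pipeline rejects this combination itself, a user could catch it before launching. This is a rough sketch under stated assumptions: `check_sbdi_params` is a hypothetical function name, and the grep patterns only handle a flat params file with top-level `key: true` entries like the params.yaml shown in this issue.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of nf-core/ampliseq): fail if a
# flat params file sets both skip_barrnap and sbdiexport to true.
check_sbdi_params() {
    local params="$1"
    if grep -Eq '^skip_barrnap:[[:space:]]*true' "$params" &&
       grep -Eq '^sbdiexport:[[:space:]]*true' "$params"; then
        echo "ERROR: sbdiexport is not run when skip_barrnap is true" >&2
        return 1
    fi
}

# Example use before starting the run:
# check_sbdi_params params.yaml && \
#     nextflow run nf-core/ampliseq -r 2.5.0 -profile uppmax -params-file params.yaml
```

Inside the pipeline, the equivalent check would belong in its parameter validation rather than in a wrapper script.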

jtangrot commented 1 year ago

> The problem seems to be that barrnap is not able to predict anything for these sequences, leading to a summary.tsv file that is empty except for the header. This makes bin/sbdiexportreannotate.R crash. I can fix bin/sbdiexportreannotate.R so it can handle an empty barrnap file, but was wondering if it's better to not create a summary file at all if the barrnap gff files are all empty? Or is there a better way to handle this?

@d4straub , @erikrikarddaniel Do you have any comments on this?

d4straub commented 1 year ago

My opinion:

erikrikarddaniel commented 1 year ago

I agree with what @d4straub says, and I think the SBDI export should work in line with that, i.e. not fail when no matches are found. It will always fail when the amplicon is not SSU rRNA.

d4straub commented 1 year ago

So that seems fixed in dev?

jtangrot commented 1 year ago

> So that seems fixed in dev?

Yes, fixed in PR #553