nf-core / ampliseq

Amplicon sequencing analysis workflow using DADA2 and QIIME2
https://nf-co.re/ampliseq
MIT License

Error executing process > 'make_SILVA_132_16S_classifier (1) #73

Closed alneberg closed 5 years ago

alneberg commented 5 years ago

Hello!

I posted this on the Slack channel a few days ago but got no response, so I'll try my luck here instead.

I'm having trouble with the make_SILVA_132_16S_classifier process:

ERROR ~ Error executing process > 'make_SILVA_132_16S_classifier (1)'

Caused by:
  Process `make_SILVA_132_16S_classifier (1)` terminated with an error exit status (1)

Command executed:

  unzip -qq Silva_132_release.zip

          fasta="SILVA_132_QIIME_release/rep_set/rep_set_16S_only/99/silva_132_99_16S.fna"
          taxonomy="SILVA_132_QIIME_release/taxonomy/16S_only/99/consensus_taxonomy_7_levels.txt"

          if [ "false" = "true" ]; then
                    sed 's/#//g' $taxonomy >taxonomy-99_removeHash.txt
                    taxonomy="taxonomy-99_removeHash.txt"
                    echo "
  ######## WARNING! The taxonomy file was altered by removing all hash signs!"
          fi

            ### Import
            qiime tools import --type 'FeatureData[Sequence]'           --input-path $fasta             --output-path ref-seq-99.qza
            qiime tools import --type 'FeatureData[Taxonomy]'           --source-format HeaderlessTSVTaxonomyFormat             --input-path $taxonomy          --output-path ref-taxonomy-99.qza

            #Extract sequences based on primers
            qiime feature-classifier extract-reads              --i-sequences ref-seq-99.qza            --p-f-primer ACACTCTTTCCCTACACGACGCTCTTCCGATCTCCTACGGGNGGCWGCAG                 --p-r-primer AGACGTGTGCTCTTCCGATCTGACTACHVGGGTATCTAATCC                 --o-reads ACACTCTTTCCCTACACGACGCTCTTCCGATCTCCTACGGGNGGCWGCAG-AGACGTGTGCTCTTCCGATCTGACTACHVGGGTATCTAATCC-99-ref-seq.qza         --quiet

            #Train classifier
            qiime feature-classifier fit-classifier-naive-bayes                 --i-reference-reads ACACTCTTTCCCTACACGACGCTCTTCCGATCTCCTACGGGNGGCWGCAG-AGACGTGTGCTCTTCCGATCTGACTACHVGGGTATCTAATCC-99-ref-seq.qza                --i-reference-taxonomy ref-taxonomy-99.qza              --o-classifier ACACTCTTTCCCTACACGACGCTCTTCCGATCTCCTACGGGNGGCWGCAG-AGACGTGTGCTCTTCCGATCTGACTACHVGGGTATCTAATCC-99-classifier.qza         --quiet

Command exit status:
  1

Command output:
  (empty)

Command error:
  QIIME is caching your current deployment for improved performance. This may take a few moments and should only happen once per deployment.
  Plugin error from feature-classifier:

    No matches found

  Debug info has been saved to /scratch/8019909/qiime2-q2cli-err-ovnpz7tu.log

Work dir:
  /crex/proj/sllstore2017079/private/johannes/user_analysis/ampliseq/A.Andersson_18_03/work/74/46a9a6edf7078cadb233bf933efff3

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

I guess this could be due to my data, but I am not able to rerun this step in the work dir. When I try, I get this error:

$ bash .command.run
replace SILVA_132_QIIME_release/core_alignment/80_core_alignment.fna? [y]es, [n]o, [A]ll, [N]one, [r]ename:  NULL
(EOF or read error, treating as "[N]one" ...)

Any assistance on how to move forward is greatly appreciated.

d4straub commented 5 years ago

Hi,

the error message means that there are no matches in the database to the primer sequences you provided. The sequences you provided with --FW_primer and --RV_primer are part of the library preparation and contain additional (adapter) sequences that aren't part of the database, which contains 16S rRNA gene sequences.

The primers that were initially used for PCR might be 341 (5′-CCTACGGGNGGCWGCAG-3′) and 805 (5′-GACTACHVGGGTATCTAATCC-3′), but you might want to confirm that.

In general, primer sequences longer than 25 bp are likely not the original PCR primers; the usual length is 19-21 bp.
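
The length rule of thumb above can be turned into a quick sanity check before starting a run. This is a minimal sketch, not part of nf-core/ampliseq: the function names are hypothetical, the 25 bp threshold is taken from the comment above, and the two candidate primers are the ones suggested in this thread (still to be confirmed against the lab protocol).

```python
# Sketch only: names are hypothetical, threshold is the >25 bp rule of thumb
# from this thread, not a value enforced by the pipeline.
MAX_PRIMER_LEN = 25


def looks_like_adapter_plus_primer(seq, max_len=MAX_PRIMER_LEN):
    """Return True when a sequence is longer than the rule-of-thumb limit."""
    return len(seq) > max_len


def ends_with_primer(seq, primer):
    """Return True when `seq` ends with `primer` (literal string comparison,
    so IUPAC codes such as N or W must be written identically in both)."""
    return seq.upper().endswith(primer.upper())


# Sequences passed to the pipeline in the failing run (from the error log).
given_fw = "ACACTCTTTCCCTACACGACGCTCTTCCGATCTCCTACGGGNGGCWGCAG"
given_rv = "AGACGTGTGCTCTTCCGATCTGACTACHVGGGTATCTAATCC"

# Candidate PCR primers suggested in this thread (to be confirmed).
candidate_fw = "CCTACGGGNGGCWGCAG"
candidate_rv = "GACTACHVGGGTATCTAATCC"

for name, given, candidate in [
    ("forward", given_fw, candidate_fw),
    ("reverse", given_rv, candidate_rv),
]:
    if looks_like_adapter_plus_primer(given):
        print(f"{name}: {len(given)} bp, probably adapter + primer")
        if ends_with_primer(given, candidate):
            print(f"  the 3' end matches the candidate primer {candidate}")
```

For the sequences in this issue, both given sequences exceed the threshold and both end with the suggested primer, which is consistent with the explanation that the extra 5' bases are library-prep adapter sequence.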

Best wishes

alneberg commented 5 years ago

Yes, that's most likely the case! Thank you! Do you think it would be useful to clarify this further in the documentation? I guess I was confused by the documentation, since the adapters also need to be trimmed off the reads, no?

d4straub commented 5 years ago

All adapter sequences that might precede the primer sequence are removed automatically, since the pipeline uses cutadapt's -g / -G parameters, and all untrimmed sequences (meaning those that do not contain the primer sequence) are discarded by default (this is absolutely recommended!).

Since I am originally a wet-lab researcher, the description in the docs is unambiguous to me. PCR on total DNA to produce the 16S rRNA gene amplicons and additional PCR during library prep are very different things to me. But if you can come up with a better description for the docs, then I'll incorporate it.
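
To illustrate the trimming behaviour described above, here is a minimal Python sketch of what a 5' adapter search in the style of cutadapt's -g does to a single read. It is not cutadapt's implementation: real cutadapt tolerates a configurable error rate and partial matches, whereas this sketch requires an exact IUPAC-aware match, and the function names and the example reads are made up for illustration.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer):
    """Compile a primer containing IUPAC codes into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def trim_5prime(read, primer):
    """Remove the primer and everything before it from the read.

    Returns None when the primer is not found, which mirrors discarding
    untrimmed reads. Simplification: no mismatches are tolerated here.
    """
    match = primer_regex(primer).search(read.upper())
    if match is None:
        return None
    return read[match.end():]


primer = "CCTACGGGNGGCWGCAG"

# Hypothetical read: adapter, then the primer (N=A, W=A), then the amplicon.
read_with_primer = (
    "ACACTCTTTCCCTACACGACGCTCTTCCGATCT" + "CCTACGGGAGGCAGCAG" + "TTGACC"
)
read_without_primer = "ACGTACGTACGTACGT"

print(trim_5prime(read_with_primer, primer))     # amplicon part only
print(trim_5prime(read_without_primer, primer))  # None, read is discarded
```

The point of the sketch is that only the primer itself has to be given: anything upstream of it, including adapter sequence, is removed along with it, and reads lacking the primer are dropped.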

alneberg commented 5 years ago

You're probably right, it might be only me who has this problem. Let's leave it as it is and see if others end up with the same problem.

Thanks!

Vikash84 commented 4 years ago

Error executing process > 'make_SILVA_132_16S_classifier (1)'