nf-core / ampliseq

Amplicon sequencing analysis workflow using DADA2 and QIIME2
https://nf-co.re/ampliseq
MIT License

An issue when using --Q2imported #218

Closed xkcococo closed 3 years ago

xkcococo commented 3 years ago

Hi, I have trouble when using --Q2imported. The value for --Q2imported is the demux.qza that was generated with --untilQ2import.

The error message is Invalid characters in sequence: ['a', 'e', 'f', 'l', 's'].
    Valid characters: ['B', 'D', 'N', 'T', 'A', 'K', 'C', 'Y', 'V', 'W', 'S', '-', 'G', '.', 'H', 'M', 'R']
    Note: Use `lowercase` if your sequence contains lowercase characters not in the sequence's alphabet.

How could I fix this issue?

Thanks in advance.

xkcococo commented 3 years ago
Error executing process > 'make_SILVA_132_16S_classifier (1)'

Caused by:
  Process `make_SILVA_132_16S_classifier (1)` terminated with an error exit status (1)

Command executed:

  export HOME="${PWD}/HOME"

        unzip -qq Silva_132_release.zip

        fasta="SILVA_132_QIIME_release/rep_set/rep_set_16S_only/99/silva_132_99_16S.fna"
        taxonomy="SILVA_132_QIIME_release/taxonomy/16S_only/99/consensus_taxonomy_7_levels.txt"

        if [ "false" = "true" ]; then
            sed 's/#//g' $taxonomy >taxonomy-99_removeHash.txt
            taxonomy="taxonomy-99_removeHash.txt"
            echo "
  ######## WARNING! The taxonomy file was altered by removing all hash signs!"
        fi

        ### Import
        qiime tools import --type 'FeatureData[Sequence]'           --input-path $fasta             --output-path ref-seq-99.qza
        qiime tools import --type 'FeatureData[Taxonomy]'           --input-format HeaderlessTSVTaxonomyFormat          --input-path $taxonomy          --output-path ref-taxonomy-99.qza

        #Extract sequences based on primers
        qiime feature-classifier extract-reads          --i-sequences ref-seq-99.qza            --p-f-primer false          --p-r-primer false          --o-reads false-false-99-ref-seq.qza            --quiet

        #Train classifier
        qiime feature-classifier fit-classifier-naive-bayes             --i-reference-reads false-false-99-ref-seq.qza          --i-reference-taxonomy ref-taxonomy-99.qza          --o-classifier false-false-99-classifier.qza            --quiet

Command exit status:
  1

Command output:
  Imported SILVA_132_QIIME_release/rep_set/rep_set_16S_only/99/silva_132_99_16S.fna as DNASequencesDirectoryFormat to ref-seq-99.qza
  Imported SILVA_132_QIIME_release/taxonomy/16S_only/99/consensus_taxonomy_7_levels.txt as HeaderlessTSVTaxonomyFormat to ref-taxonomy-99.qza

Command error:
  QIIME is caching your current deployment for improved performance. This may take a few moments and should only happen once per deployment.
  Plugin error from feature-classifier:

    Invalid characters in sequence: ['a', 'e', 'f', 'l', 's']. 
    Valid characters: ['B', 'D', 'N', 'T', 'A', 'K', 'C', 'Y', 'V', 'W', 'S', '-', 'G', '.', 'H', 'M', 'R']
    Note: Use `lowercase` if your sequence contains lowercase characters not in the sequence's alphabet.

  Debug info has been saved to /scratch/local/63726991/qiime2-q2cli-err-grfhun63.log
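
A note on the reported characters: ['a', 'e', 'f', 'l', 's'] are exactly the letters of the word "false", and the executed command above indeed contains `--p-f-primer false`, i.e. an unset primer parameter was passed through literally. A minimal sketch of such an IUPAC-alphabet check (illustrative only, not QIIME 2's actual implementation):

```python
# Illustrative sketch of an IUPAC-alphabet check like the one behind the
# "Invalid characters in sequence" message (not QIIME 2's actual code).
IUPAC_DNA = set("ACGTRYSWKMBDHVN.-")

def invalid_chars(primer: str) -> list:
    """Return the sorted set of characters outside the IUPAC DNA alphabet."""
    return sorted(set(primer) - IUPAC_DNA)

# The literal string "false" (an unset parameter) yields exactly the
# characters reported in the error above:
print(invalid_chars("false"))              # ['a', 'e', 'f', 'l', 's']
print(invalid_chars("CCTACGGGNGGCWGCAG"))  # []
```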
d4straub commented 3 years ago

Hi there, this is a strange error message.

> Hi, I have trouble when using --Q2imported. The value for --Q2imported is demux.qza that generated from --untilQ2import.

After you have run with --Q2imported, please continue with running the same command but omitting --Q2imported and adding -resume, as described here for --multipleSequencingRun. But the problem might be somewhere else.

About a similar issue with process make_SILVA_132_16S_classifier, that is surprising. What command did you run? I could imagine that either the database was not the expected default one or probably there is something off with the software management (conda/singularity/...).

xkcococo commented 3 years ago

> Hi there, this is a strange error message.
>
> > Hi, I have trouble when using --Q2imported. The value for --Q2imported is demux.qza that generated from --untilQ2import.
>
> After you have run with --Q2imported, please continue with running the same command but omitting --Q2imported and adding -resume, as described here for --multipleSequencingRun. But the problem might be somewhere else.
>
> About a similar issue with process make_SILVA_132_16S_classifier, that is surprising. What command did you run? I could imagine that either the database was not the expected default one or probably there is something off with the software management (conda/singularity/...).

Thanks for the help! The previous command I ran was:

nextflow run ampliseq -profile singularity --input seq --FW_primer TCGTCGGCAGCGTCAGATGTGTATAAGAGACA --RV_primer GTCTCGTGGGCTCGGAGATC --metadata "Metadata.tsv" --r 1.1.3 --untilQ2import

And then I ran:

nextflow run ampliseq -profile singularity --r 1.1.3 --Q2imported demux.qza

d4straub commented 3 years ago

Alright. There are some oddities. First, use nf-core/ampliseq instead of ampliseq; that way the pipeline is automatically managed by Nextflow, while with only ampliseq you are using a local copy that has to be downloaded in advance (which isn't ideal if you can avoid it; also, you could have modified it). Second, your second command cannot work, because it has no primer sequences, no metadata, etc. Additionally, in the second command you are not supplying the pipeline with the cutoffs that you are supposed to choose with the first command. Also, your --r does nothing, because the option is -r (one less -).

I expect this machine has an internet connection. Please use only these two commands; you neither need to download anything beforehand nor change any parameters. Please also mind the additional quotation marks (").

nextflow pull nf-core/ampliseq -r 1.1.3
nextflow run nf-core/ampliseq -profile singularity --input "seq" --FW_primer "TCGTCGGCAGCGTCAGATGTGTATAAGAGACA" --RV_primer "GTCTCGTGGGCTCGGAGATC" --metadata "Metadata.tsv" -r 1.1.3 --trunc_qmin 30

--trunc_qmin 30 will make the pipeline choose required cutoffs automatically. Please make sure to read the description and the help message.
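
For intuition only: a qmin-style cutoff can be pictured as truncating reads at the first sequencing cycle whose median quality drops below the threshold. A toy sketch (assumed behavior for a --trunc_qmin-style option, not the pipeline's exact implementation):

```python
# Toy sketch: pick a truncation length from per-cycle median qualities
# (assumed behavior for a --trunc_qmin-style option, not ampliseq's code).
def choose_truncation(median_quality_per_cycle, qmin=30):
    """Return the number of leading cycles whose median quality >= qmin."""
    for pos, quality in enumerate(median_quality_per_cycle):
        if quality < qmin:
            return pos  # truncate before the first low-quality cycle
    return len(median_quality_per_cycle)  # keep the full read

print(choose_truncation([38, 37, 35, 31, 29, 25]))  # 4
```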

The error message might be due to an attempt to use SILVA v138 instead of v132? v138 isn't supported yet; an update for this is in the making.

xkcococo commented 3 years ago

@d4straub Thanks!

The error message was shown when I only used the default classifier.

I noticed you did not use --Q2imported demux.qza in the second command. If I want to use the qza file that has already been generated by my first command, do I only need to use

nextflow run nf-core/ampliseq -profile singularity --input "seq" --FW_primer "TCGTCGGCAGCGTCAGATGTGTATAAGAGACA" --RV_primer "GTCTCGTGGGCTCGGAGATC" --metadata "Metadata.tsv" -r 1.1.3 --trunc_qmin 30

or

nextflow run nf-core/ampliseq -profile singularity --input "seq" --FW_primer "TCGTCGGCAGCGTCAGATGTGTATAAGAGACA" --RV_primer "GTCTCGTGGGCTCGGAGATC" --metadata "Metadata.tsv" -r 1.1.3 --trunc_qmin 30 --Q2imported demux.qza

Thanks!

d4straub commented 3 years ago

If you do nextflow run nf-core/ampliseq -profile singularity --input "seq" --FW_primer "TCGTCGGCAGCGTCAGATGTGTATAAGAGACA" --RV_primer "GTCTCGTGGGCTCGGAGATC" --metadata "Metadata.tsv" -r 1.1.3 --trunc_qmin 30, then no further action/command is needed; it will analyze the data from start to finish.

If you do want to use an already produced demux.qza, then yes, adding --Q2imported "demux.qza" (note the quotation marks; they are not always needed but sometimes prevent errors) will continue from that point. But it is totally unnecessary to stop and continue at that point. You would only pause at the demux.qza step if you want to determine --trunclenf and --trunclenr by visually inspecting results/demux/index.html (read about these two parameters here), but you are not doing that; you are using the automatic determination of these values at the moment (which is most likely fine). So there is no need for you to use --untilQ2import and --Q2imported!

xkcococo commented 3 years ago

@d4straub Thanks! The command line I used this time is

nextflow pull nf-core/ampliseq -r 1.1.3
nextflow run nf-core/ampliseq -profile singularity -r 1.1.3 --input "seq_momstool" --FW_primer "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG" --RV_primer "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC" --metadata "Metadata_mom_stool.tsv" --trunc-qmin 30 --max_memory '48.GB' --max_cpus 4 

Now I used this command and got an error:

Error executing process > 'make_SILVA_132_16S_classifier (1)'

Caused by:
  Process `make_SILVA_132_16S_classifier (1)` terminated with an error exit status (1)

Command executed:

  export HOME="${PWD}/HOME"

        unzip -qq Silva_132_release.zip

        fasta="SILVA_132_QIIME_release/rep_set/rep_set_16S_only/99/silva_132_99_16S.fna"
        taxonomy="SILVA_132_QIIME_release/taxonomy/16S_only/99/consensus_taxonomy_7_levels.txt"

        if [ "false" = "true" ]; then
            sed 's/#//g' $taxonomy >taxonomy-99_removeHash.txt
            taxonomy="taxonomy-99_removeHash.txt"
            echo "
  ######## WARNING! The taxonomy file was altered by removing all hash signs!"
        fi

        ### Import
        qiime tools import --type 'FeatureData[Sequence]'           --input-path $fasta             --output-path ref-seq-99.qza
        qiime tools import --type 'FeatureData[Taxonomy]'           --input-format HeaderlessTSVTaxonomyFormat          --input-path $taxonomy          --output-path ref-taxonomy-99.qza

        #Extract sequences based on primers
        qiime feature-classifier extract-reads          --i-sequences ref-seq-99.qza            --p-f-primer TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG             --p-r-primer GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC            --o-reads TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC-99-ref-seq.qza             --quiet

        #Train classifier
        qiime feature-classifier fit-classifier-naive-bayes             --i-reference-reads TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC-99-ref-seq.qza           --i-reference-taxonomy ref-taxonomy-99.qza          --o-classifier TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC-99-classifier.qza             --quiet

Command exit status:
  1

Command output:
  Imported SILVA_132_QIIME_release/rep_set/rep_set_16S_only/99/silva_132_99_16S.fna as DNASequencesDirectoryFormat to ref-seq-99.qza
  Imported SILVA_132_QIIME_release/taxonomy/16S_only/99/consensus_taxonomy_7_levels.txt as HeaderlessTSVTaxonomyFormat to ref-taxonomy-99.qza

Command error:
  QIIME is caching your current deployment for improved performance. This may take a few moments and should only happen once per deployment.
  Plugin error from feature-classifier:

    No matches found

  Debug info has been saved to /scratch/local/63778170/qiime2-q2cli-err-9fdp9pcb.log

Work dir:
  /blue/djlemas/share/data/MySequences/microbiome-seq/afog16s-mom-stool/work/73/ec44c6d41e8e0ca8e27dbd3c3395f6

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

It stopped at make_SILVA_132_16S_classifier. Could you help me fix it?

Thank you very much!!

d4straub commented 3 years ago

--FW_primer "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG" --RV_primer "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC" are not correct. Primer sequences are used (1) to trim raw reads and (2) to extract sequences from the classifier. The primer sequences of the 1st-step PCR (typically 17-25 bp long) are supposed to be given here, not the primers of the 2nd-step PCR (which contain Illumina adapters and are >30 bp long). You provided primer sequences of the 2nd-step PCR, and therefore the workflow fails because these sequences (Illumina adapters) are not present in the classifier sequences.

Solution: use the 1st-step PCR primers; these might be primers 341f CCTACGGGNGGCWGCAG and 805r GACTACHVGGGTATCTAATCC --> --FW_primer "CCTACGGGNGGCWGCAG" --RV_primer "GACTACHVGGGTATCTAATCC". Please make sure that this is correct.
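
To make the structure of the rejected primers visible: each fusion primer above is an Illumina Nextera overhang followed by the locus-specific primer. A small sketch that strips the overhangs (overhang sequences assumed from Illumina's 16S library-prep protocol, not taken from the pipeline):

```python
# Sketch: split the 2nd-step fusion primers into Illumina overhang and
# locus-specific primer (overhang sequences assumed from Illumina's
# 16S metagenomic library-prep documentation).
NEXTERA_R1_OVERHANG = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
NEXTERA_R2_OVERHANG = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG"

def strip_overhang(fusion_primer: str, overhang: str) -> str:
    """Return the locus-specific part of a fusion primer."""
    if fusion_primer.startswith(overhang):
        return fusion_primer[len(overhang):]
    return fusion_primer  # already locus-specific

fw = strip_overhang("TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG",
                    NEXTERA_R1_OVERHANG)
rv = strip_overhang("GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC",
                    NEXTERA_R2_OVERHANG)
print(fw)  # CCTACGGGNGGCWGCAG      (341f)
print(rv)  # GACTACHVGGGTATCTAATCC  (805r)
```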

This region is quite long (805 - 341 = 464 bp) and is only just covered by a MiSeq PE 250 bp run, but well covered by a PE 300 run. Still, such a long region might be incompatible with --trunc-qmin 30. So please check the DADA2 output file results/abundance-table/unfiltered/dada_stats.tsv, and if you lose more than 30% of reads between the columns denoised and merged, then instead of --trunc-qmin 30 use the options --trunclenf 247 --trunclenr 238 -resume to increase the chance that forward and reverse reads can be merged (and resume the old run to minimize new computations).
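
The denoised-vs-merged check described above can be scripted; a minimal sketch (the column names sample/denoised/merged are assumptions, adjust them to the actual dada_stats.tsv header):

```python
# Sketch: flag samples losing more than 30% of reads between the
# "denoised" and "merged" columns of a DADA2 stats table (column names
# are assumed; adjust to the actual dada_stats.tsv header).
import csv
import io

def merge_loss_flags(tsv_text: str, max_loss: float = 0.30) -> list:
    """Return sample IDs whose denoised->merged read loss exceeds max_loss."""
    flagged = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        denoised = float(row["denoised"])
        merged = float(row["merged"])
        if denoised > 0 and (denoised - merged) / denoised > max_loss:
            flagged.append(row["sample"])
    return flagged

demo = "sample\tdenoised\tmerged\nS1\t1000\t900\nS2\t1000\t500\n"
print(merge_loss_flags(demo))  # ['S2']
```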

xkcococo commented 3 years ago

> --FW_primer "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG" --RV_primer "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC" are not correct. Primer sequences are used (1) to trim raw reads and (2) to extract sequences from the classifier. The primer sequences from the 1st step PCR (are typically 17-25bp long) are supposed to be given here, not the primers of the 2nd step PCR (contain Illumina adapters, >30bp long). You provided primer sequences of the 2nd step PCR and therefore the workflow fails because these sequences (Illumina adapters) are not present in the classifier sequences.
>
> Solution: use 1st step PCR primers, that might be primers 341f CCTACGGGNGGCWGCAG and 805r GACTACHVGGGTATCTAATCC --> --FW_primer "CCTACGGGNGGCWGCAG" --RV_primer "GACTACHVGGGTATCTAATCC" Please make sure that this is correct.
>
> This region is quite long (805-341 = 464bp) and is just so covered by a Miseq PE 250bp run, but well covered by a PE300 run. Still, such a long region might be incompatible with --trunc-qmin 30. So please check DADA2 output file results/abundance-table/unfiltered/dada_stats.tsv and if you loose between columns denoised and merged more than 30% of reads, than probably use instead of --trunc-qmin 30 the options --trunclenf 247 --trunclenr 238 -resume to increase the change the forward and reverse reads can be merged (and resume the old run so to minimize new calculations).

Thanks! I will try to figure out what my 1st-step PCR primers are and then run again.

d4straub commented 3 years ago

This seems solved; otherwise just reopen it.