marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Workflow and Indels in amplicon contigs #1322

Closed: chytrids closed this issue 5 years ago

chytrids commented 5 years ago

Hello, I am using Canu v1.8 on the Flux Linux-based operating environment at the University of Michigan to assemble contigs of demultiplexed amplicons generated from fungal genomic DNA using a Nanopore MinION. Depending on the species of fungus, the amplicon can range from 4.5 kbp to 6 kbp. After filtering and demultiplexing, I have been following this workflow:

  1. Attempting default settings:

    canu \
    -p $myprefix -d $myoutputdirectory \
    genomeSize=5000 -nanopore-raw $myfastafile \
    useGrid=false "canuIteration=1" canuIteration=1
  2. If (1) failed, adding:

    corOutCoverage=200 correctedErrorRate=0.15
  3. If (2) failed, adding:

    corMhapSensitivity=normal
  4. If (3) failed, modifying (3) to:

    corMhapSensitivity=high corMinCoverage=0

Would this be a recommended strategy? When moving to steps (3) or (4), should I be removing the parameters added in (2)?

The only problem that I can see in the contigs.fasta files produced at different steps of the workflow is that a handful of indels differ between them. Indel differences are also noticeable when comparing our contigs to reference sequences. Is there a recommended way to obtain the best-quality sequences with respect to indels?

Thank you for your assistance and attention.

skoren commented 5 years ago

I think this is a reasonable approach. You don't need to remove the parameters added in step (2) when moving to steps (3) and (4); they should be compatible.
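For readers who want to script the escalation, here is a minimal sketch of the cumulative workflow discussed in this thread. The function names, the `CANU` override variable, and the per-step output directories are illustrative assumptions, not part of Canu. "Failed" is taken here to mean a non-zero exit status or a missing or empty `<prefix>.contigs.fasta`, which is also an assumption.

```shell
#!/bin/sh
# Hypothetical helper: print the extra Canu options for a given workflow step.
# Step (2) options are kept in steps (3) and (4); step (4) replaces the
# corMhapSensitivity value used in step (3).
canu_step_args() {
  case "$1" in
    1) ;;  # defaults only, no extra options
    2) printf '%s\n' "corOutCoverage=200 correctedErrorRate=0.15" ;;
    3) printf '%s\n' "corOutCoverage=200 correctedErrorRate=0.15 corMhapSensitivity=normal" ;;
    4) printf '%s\n' "corOutCoverage=200 correctedErrorRate=0.15 corMhapSensitivity=high corMinCoverage=0" ;;
    *) echo "unknown step: $1" >&2; return 1 ;;
  esac
}

# Hypothetical driver: try steps 1..4 in order and stop at the first success.
# $1 = prefix, $2 = base output directory, $3 = reads file
# canuIteration=1 from the original command is left out of this sketch.
run_canu_escalating() {
  for step in 1 2 3 4; do
    extra=$(canu_step_args "$step") || return 1
    # $extra is intentionally unquoted so it splits into separate options.
    # Each step gets its own directory (an assumption) so that earlier
    # intermediate results are not reused with different parameters.
    if "${CANU:-canu}" -p "$1" -d "$2/step$step" \
         genomeSize=5000 -nanopore-raw "$3" useGrid=false $extra \
       && [ -s "$2/step$step/$1.contigs.fasta" ]; then
      echo "step $step produced contigs"
      return 0
    fi
  done
  return 1
}

# Example call (not executed here):
#   run_canu_escalating "$myprefix" "$myoutputdirectory" "$myfastafile"
```
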

As for consensus, you need to run a polishing tool to improve the indel rate. Either medaka or nanopolish should do a reasonable job.
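As a rough illustration of the polishing step, the sketch below wraps medaka's `medaka_consensus` script. The wrapper name, the `MEDAKA_CONSENSUS` and `THREADS` variables, and the file names in the example call are hypothetical. The model name has to match the flowcell and basecaller that produced the reads, so it is left as a required argument rather than guessed.

```shell
#!/bin/sh
# Hypothetical wrapper around medaka_consensus.
# $1 = basecalled reads, $2 = draft contigs from Canu,
# $3 = output directory, $4 = medaka model name
polish_with_medaka() {
  "${MEDAKA_CONSENSUS:-medaka_consensus}" \
    -i "$1" -d "$2" -o "$3" -m "$4" -t "${THREADS:-4}"
}

# Example call (not executed here; file and model names are placeholders):
#   polish_with_medaka reads.fastq "$myprefix.contigs.fasta" medaka_out "$model"
```

The polished sequence is normally written to a consensus FASTA file inside the output directory. Comparing it with the reference sequences is a simple way to check whether the indel differences have been reduced.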