lh3 / miniasm

Ultrafast de novo assembly for long noisy reads (though having no consensus step)
MIT License

max divergence optimization #54

Open EarlyEvol opened 5 years ago


Thanks for the super fast aligner/assembler. I know miniasm is not your focus, and I appreciate any time you spend on support and questions.

I have been experimenting with miniasm parameters and found that the default max divergence is 0.05. Since raw PacBio/Nanopore reads are only about 85% accurate, wouldn't pairwise alignments between reads have only about 70% identity (30% divergence)? Is the 0.05 default just tuned to a peculiarity of minimap2's divergence estimator?
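To make the arithmetic behind that question explicit, here is a rough sketch of two simple ways to estimate read-to-read identity from per-read accuracy. Both models are assumptions for illustration only (errors that never coincide, or errors that are independent per position), and the function name is hypothetical; neither reflects how minimap2 or miniasm actually estimates divergence.

```python
def expected_pairwise_identity(read_accuracy: float) -> dict:
    """Estimate identity between two noisy reads of the same locus.

    Assumes both reads share the same per-base accuracy. Illustrative
    models only, not the estimator used by minimap2 or miniasm.
    """
    if not 0.0 <= read_accuracy <= 1.0:
        raise ValueError("read_accuracy must be between 0 and 1")
    error_rate = 1.0 - read_accuracy
    return {
        # Errors in the two reads never fall on the same position,
        # so the error rates simply add up.
        "additive": 1.0 - 2.0 * error_rate,
        # Errors are independent: a position matches only when both
        # reads are correct there (ignoring matching errors).
        "independent": read_accuracy ** 2,
    }


if __name__ == "__main__":
    for model, identity in expected_pairwise_identity(0.85).items():
        print(f"{model}: identity ~{identity:.2f}, divergence ~{1 - identity:.2f}")
```

Under either assumption, 85% accurate reads give roughly 70% to 72% pairwise identity, which is far from a 0.05 divergence threshold if that threshold were measured per base.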

Here is some info from a couple of assemblies I have done with miniasm. For a 460 Mb genome with 10 chromosomes, miniasm with default parameters generated an assembly with an N50 of 2.0 Mb from 75X PacBio data (read N50 ~20 kb). After seeing the max divergence default of 0.05, I tried "-s 5000 -m 1000 -i 0.10" and the N50 jumped to 3.5 Mb. I thought that even if 0.10 divergence allowed spurious alignments to be counted, larger -s and -m values would help prevent false joins. Interestingly, changing -s to 7500 increased the N50 to 3.68 Mb.
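For anyone comparing runs the same way, this is a minimal sketch of the N50 statistic quoted above, assuming the contig lengths have already been extracted into a list. The function name is hypothetical and this is not part of miniasm.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Toy contig lengths, not data from the assemblies discussed here.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

N50 only measures contiguity, so a higher value from looser parameters does not by itself rule out misjoins.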

Have you experimented with these parameters and checked the effect on assembly accuracy? I have a Canu assembly with an N50 of 9 Mb, which I will compare these miniasm contigs to after polishing.

Best, Earl