marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Heterozygosity parameters running time #660

Closed: uceleste closed this issue 7 years ago

uceleste commented 7 years ago

Dear All,

I would like to run two test assemblies with Canu, one with each of the two heterozygosity parameter sets, "Smash haplotypes together" and "Avoid collapsing the genome".

If I understand correctly, the error rate used for "Smash haplotypes together" is three times higher (15%) than the default (5%). Does this mean the assembly will likely take three times longer? And what about the running time with "Avoid collapsing the genome"?

I'm not complaining about the running time of this fantastic assembler. I only need these "time estimates" for some future planning. Many thanks!

skoren commented 7 years ago

The smash haplotypes settings will definitely be slower; I would expect 8-10-fold, not 3-fold. Honestly, I rarely use these parameters, both because of the runtime and because they still won't collapse large structural variations (basically anything like an inversion or similar, where there is no reasonable way to call a consensus of the variants). There is one genome I can remember where we used these settings, one where the heterozygosity was localized and consisted of simple differences.

The avoid collapsing runtime should be comparable to or faster than the defaults.
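For planning purposes, the rough factors quoted in this reply can be turned into a back-of-envelope helper. This is a hypothetical sketch, not part of Canu: the setting labels and the `estimate_runtime` function are invented for illustration. The multipliers are only the expectations stated above (8-10x slower for smashing haplotypes, at most 1x for avoiding collapse), so it assumes you have first measured a default-parameter run on your own data.

```python
from typing import Dict, Optional, Tuple

# Hypothetical labels and rough (lower, upper) runtime multipliers relative to
# a default-parameter Canu run, taken from the maintainer's reply.
# None means the thread gives no bound on that side.
RUNTIME_FACTORS: Dict[str, Tuple[Optional[float], float]] = {
    "default": (1.0, 1.0),
    "smash_haplotypes": (8.0, 10.0),  # "I would expect 8-10-fold"
    "avoid_collapsing": (None, 1.0),  # "comparable to or faster than the defaults"
}


def estimate_runtime(default_hours: float, setting: str) -> Tuple[Optional[float], float]:
    """Scale a measured default runtime by the rough factors above (rule of thumb only)."""
    if default_hours < 0:
        raise ValueError("default_hours must be non-negative")
    low, high = RUNTIME_FACTORS[setting]
    low_hours = None if low is None else default_hours * low
    return low_hours, default_hours * high


if __name__ == "__main__":
    # Example: a default run that took 24 hours on the same data.
    for name in RUNTIME_FACTORS:
        low, high = estimate_runtime(24.0, name)
        low_txt = "?" if low is None else f"{low:g}"
        print(f"{name}: {low_txt}-{high:g} h")
```

With a 24-hour default run, this gives 192-240 hours for the smash haplotypes settings, versus the 72 hours a naive 3-fold scaling would predict, and at most 24 hours for avoiding collapse.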

uceleste commented 7 years ago

Perfect, thank you very much for the answer. I will definitely take this information into account. Anyway, Canu is really a great assembler. Keep it up!