lh3 / miniasm

Ultrafast de novo assembly for long noisy reads (though having no consensus step)
MIT License

using a fragmented genome as an input #68

Open dcopetti opened 5 years ago

dcopetti commented 5 years ago

Hello,

I am working with a complex plant genome (highly heterozygous, haploid genome size 2.5 Gb). I have a diploid assembly of 5 Gb (N50 200 kb, N90 5 kb) and I would like to improve its contiguity with ONT reads. My idea is to use the assembly (it contains scaffolds, but only 1.5% Ns) as another input alongside the ONT data at the minimap2 stage, so that the reads create extensions of the scaffolds. The ONT data (20x coverage of the 5 Gb assembly, N50 9 kb) could be filtered to keep only reads above 2 kb, for example.
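To make the read-length filter concrete, here is a minimal sketch of what I have in mind. It is only an illustration: the function name `filter_fastq_by_length` is made up for this post, the 2000-base default is just the example cutoff mentioned above, and it assumes a plain FASTQ file with exactly four lines per record (no wrapped sequence lines).

```python
from itertools import islice


def filter_fastq_by_length(lines, min_len=2000):
    """Yield 4-line FASTQ records whose sequence has at least min_len bases.

    `lines` is any iterable of text lines (for example an open file handle).
    Assumption: every record occupies exactly four lines.
    """
    it = iter(lines)
    while True:
        record = list(islice(it, 4))  # header, sequence, '+', qualities
        if len(record) < 4:
            break  # end of input, or a truncated final record that is skipped
        sequence = record[1].rstrip("\r\n")
        if len(sequence) >= min_len:
            yield record
```

It could be used by opening the input and output files and calling `out.writelines(record)` for each record the generator yields; the filtered reads would then be combined with the scaffolds before the overlap step.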

My question is whether the minimap2 alignment or the graph construction step will be affected by having two types of data: highly accurate scaffolds and ~85% accurate ONT reads. Do you think this would be a good strategy?

Basically, the scaffolds (highly accurate at the nucleotide level, but present at only 1x coverage) would be a baseline set of sequences to be extended by the longer ONT reads, which do come with coverage depth. Do you think it is worth a try? Thanks,

Dario