tseemann closed 8 years ago
I've added a bit of information on the read threading wiki page.
If you don't merge the paired-end reads when they overlap, you'll see very few read pairs have their insert gaps filled. This means you may lose a lot of the long-distance connectivity information that is in the reads. In some cases it may also increase the rate of errors in your graph links.
If you have a lot of overlapping read pairs and you can't merge them, I recommend using only single-ended reads in the threading stage. This will reduce your contig N50, but you'll make fewer assembly mistakes.
It's always possible to merge them using pear or FLASH etc, and end up with some unmerged PE and the rest in merged SE reads. I guess I worry about PE merging with respect to short exact tandem repeats (e.g. CRISPR style).
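To make the tandem-repeat worry concrete, here is a toy Python sketch. It is not how pear or FLASH score overlaps; the function names, the `min_overlap` parameter, and the example sequences are all made up for illustration. It only lists every exact suffix/prefix overlap between a read pair, to show that a pair falling inside a short exact repeat can have several equally valid merge lengths.

```python
# Toy illustration only: exact-match overlaps, no quality scores, no mismatches.
# Real mergers such as pear and FLASH use their own scoring; nothing here
# reflects their actual algorithms.

def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def exact_overlaps(read1, read2, min_overlap=3):
    """List every overlap length n where the last n bases of read1 equal
    the first n bases of the reverse complement of read2."""
    mate = reverse_complement(read2)
    longest = min(len(read1), len(mate))
    return [n for n in range(min_overlap, longest + 1)
            if read1[-n:] == mate[:n]]


# Pair from a non-repetitive fragment: one possible merge.
unique_r1 = "ACGTTGCAAG"
unique_r2 = reverse_complement("CAAGGCTTAC")
print(exact_overlaps(unique_r1, unique_r2))   # [4]

# Pair whose overlap lies in a (GAT)n tandem repeat: four possible merges,
# each implying a different repeat copy number in the merged read.
repeat_r1 = "TTACGATGATGATGAT"
repeat_r2 = reverse_complement("GATGATGATGATCCTA")
print(exact_overlaps(repeat_r1, repeat_r2))   # [3, 6, 9, 12]
```

In the second case a merger has to pick one of the candidate lengths, and a wrong pick would collapse or expand the repeat in the merged SE read.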
Thanks for updating the wiki! Your docs are very thorough and they are helping me a lot to understand how to make use of mccortex.
Piggybacking on this issue, do you have any intuition or benchmark data for how mccortex might perform as a single genome assembler, for example in comparison with spades? I looked in the benchmark folder and it appears to be focused on mixed samples, unless I have misinterpreted it.
I'm impressed by how cleanly mccortex installed and runs! I got these warnings in thread.
We get lots of overlapping PE reads from NextSeq and MiSeq due to suboptimal Nextera XT library prep.
Should I be concerned? Will this affect the results?