Should unpolished contigs be used in further analysis?

isovic / racon

Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads. http://genome.cshlp.org/content/early/2017/01/18/gr.214270.116 Note: This was the original repository which will no longer be officially maintained. Please use the new official repository here:

https://github.com/lbcb-sci/racon

MIT License


Should unpolished contigs be used in further analysis? #104

Open aleksandrabliznina opened 5 years ago

aleksandrabliznina commented 5 years ago

Hi!

Thank you for the amazing tool! It's very helpful! Unfortunately, I am struggling with a question and cannot find the answer myself. I have Oxford Nanopore reads that I assemble into contigs with Canu. As a next step, I polish these contigs with Racon for 3 iterations, using the corrected and trimmed reads that Canu produces. Until now, polishing did not reduce the number of contigs significantly (for example, only 17 out of 383 contigs remained unpolished); I considered this "fine" and left those 17 contigs out of further analysis. But with my latest assembly, of another individual of the same organism, I got a significant reduction: 189 out of 428 contigs were filtered out as unpolished (together they account for less than 10% of the genome size). I understand that I can easily get them back with the "-u" option. But should I keep them at all? What is your experience with this kind of problem?

Thank you very much for your help!
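For readers following along, the workflow described above (map reads to the draft, polish, repeat three times) can be sketched roughly as below. This is only an illustration, not the poster's actual commands: the helper names `round_commands` and `polish`, the file naming, the thread count, and the `map-ont` minimap2 preset are all assumptions. The only flag taken from the thread is racon's `-u`, which keeps unpolished contigs in the output.

```python
import subprocess
from pathlib import Path


def round_commands(reads, draft, paf, keep_unpolished=True, threads=4):
    """Build the minimap2 and racon argument lists for one polishing round.

    Hypothetical helper: the preset and thread count are assumptions.
    """
    # minimap2 writes PAF overlaps (reads mapped onto the draft) to stdout.
    map_cmd = ["minimap2", "-x", "map-ont", "-t", str(threads), draft, reads]
    racon_cmd = ["racon", "-t", str(threads)]
    if keep_unpolished:
        racon_cmd.append("-u")  # also output contigs racon could not polish
    # racon positional arguments: <sequences> <overlaps> <target sequences>
    racon_cmd += [reads, paf, draft]
    return map_cmd, racon_cmd


def polish(reads, draft, workdir, rounds=3, keep_unpolished=True, threads=4):
    """Run the given number of rounds; each round polishes the previous output."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    current = str(draft)
    for i in range(1, rounds + 1):
        paf = str(workdir / f"round{i}.paf")
        out = str(workdir / f"round{i}.fasta")
        map_cmd, racon_cmd = round_commands(
            reads, current, paf, keep_unpolished, threads
        )
        with open(paf, "w") as fh:
            subprocess.run(map_cmd, stdout=fh, check=True)
        with open(out, "w") as fh:
            subprocess.run(racon_cmd, stdout=fh, check=True)
        current = out
    return current
```

Calling `polish("reads.fastq", "contigs.fasta", "racon_out")` would need minimap2 and racon on the `PATH`. With `keep_unpolished=False` the contig count can shrink after each round, which is the effect described in the question.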

rvaser commented 5 years ago

Hello Aleksandra, unpolished sequences are those that do not have a region covered by at least 3 other reads. This usually means that they are either of poor quality and the mapper is unable to map anything to them, or they are contained in some other contig and the majority of reads have a better overlap with the bigger contig (only the best mapping per read is taken for consensus). You can check the min, max, and avg/median size of the dropped sequences, and if they are really small you could just drop them. You can also check whether they are contained in any other contig. Or you can try using all read-to-contig mappings and check the number of dropped contigs (option -f in racon). Whichever seems the easiest. I am not sure whether to keep or drop the unpolished sequences.

Best regards, Robert
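The size check suggested above could look roughly like the sketch below. It makes some assumptions: the first whitespace-delimited token of each FASTA header is taken as the contig name (so that any tags appended to polished headers are ignored), racon is assumed to have been run without `-u`, and the helper names `fasta_lengths` and `dropped_summary` are made up for illustration.

```python
import statistics


def fasta_lengths(lines):
    """Map contig name (first token of each header) to sequence length."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def dropped_summary(draft_lengths, polished_lengths):
    """Summarise contigs present in the draft but absent after polishing."""
    sizes = sorted(
        length
        for name, length in draft_lengths.items()
        if name not in polished_lengths
    )
    if not sizes:
        return {"count": 0}
    return {
        "count": len(sizes),
        "min": sizes[0],
        "max": sizes[-1],
        "mean": statistics.mean(sizes),
        "median": statistics.median(sizes),
        # Share of the draft assembly (in bases) made up by dropped contigs.
        "fraction_of_assembly": sum(sizes) / sum(draft_lengths.values()),
    }
```

A possible usage, with placeholder file names:

```python
with open("contigs.fasta") as draft, open("polished.fasta") as polished:
    print(dropped_summary(fasta_lengths(draft), fasta_lengths(polished)))
```

If the dropped contigs turn out to be short and make up a small fraction of the assembly, that would support simply discarding them; checking containment in other contigs needs an aligner and is not covered here.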

pjm43 commented 5 years ago

Hi, Sorry for jumping in on this thread, but similar to Aleksandra, I have an ONT-based canu assembly and I'm wondering if I should be using racon with the original raw ONT reads or if I should actually be using the corrected or trimmed reads that canu produces. I assumed it should be the raw reads as the corrected reads and trimmed reads only represent a subset of the raw reads (I think the default for canu is to select the largest 35-40X coverage set). Any insight would be helpful! Thanks, Jeff

rvaser commented 5 years ago

Hi Jeff, I have not tried using corrected/trimmed reads in polishing; we usually use raw reads when it comes to TGS. Could you try both and compare the outcome?

Best regards, Robert

Rohit-Satyam commented 5 months ago

I have the same question: which should be used, corrected or raw reads? Did you reach any conclusion?