isovic / racon

Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads. http://genome.cshlp.org/content/early/2017/01/18/gr.214270.116 Note: This was the original repository which will no longer be officially maintained. Please use the new official repository here:
https://github.com/lbcb-sci/racon
MIT License

racon for indels and negative gaps on scaffolds #165

Closed dcopetti closed 4 years ago

dcopetti commented 4 years ago

Hi, my assembly consists of large scaffolds (N50 in the tens of Mb) that I built by aligning contigs (ONT reads, assembled with Flye, polished with Flye and Pilon) to an optical map. In some cases two flanking contigs end in the same repeat (they overlap each other but were not joined by Flye), so I have a few negative gaps (13 Ns between contigs) that I would like to smooth out. There are also other tandem duplications (1-20 kb?), SVs, and gaps that I would like to close. I wonder whether Racon is the right tool for this. I see that using -u should help (also considering that the gaps between contigs will have 0 coverage), but I want to make sure that, beyond single-nucleotide corrections, Racon will also fix small tandem duplications and negative gaps. I have up to 45x coverage of ONT reads (N50 18 kb, median QV 9.4), or I can select subsets with a minimum read length of 10 kb (31x, N50 24 kb) or 20 kb (16x, N50 45 kb) to avoid an overly long alignment step.

Do you think racon will work well here? Thanks, Dario

rvaser commented 4 years ago

Hi Dario, I am not quite sure it will work as described here, especially for the longer introduced duplications. Are you certain that the gaps consisting of Ns will have zero coverage? If you have a go at it, I would certainly try with the whole data set.

Best regards, Robert
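
To check whether the N gaps really have zero coverage, one first needs their coordinates. The following is a minimal sketch, not part of racon, that lists runs of Ns in a scaffold FASTA as BED-style intervals. The function names (`parse_fasta`, `find_n_gaps`, `gaps_as_bed`) and the `min_len` threshold are hypothetical and chosen only for illustration.

```python
import re
from typing import Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the sequence name.
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_n_gaps(seq: str, min_len: int = 1) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) intervals of N runs."""
    return [
        m.span()
        for m in re.finditer(r"[Nn]+", seq)
        if m.end() - m.start() >= min_len
    ]


def gaps_as_bed(lines: Iterable[str], min_len: int = 1) -> List[str]:
    """Format every N run of at least min_len bases as a BED line."""
    bed = []
    for name, seq in parse_fasta(lines):
        for start, end in find_n_gaps(seq, min_len):
            bed.append(f"{name}\t{start}\t{end}")
    return bed
```

The resulting intervals could then be handed to a coverage tool (for example `samtools bedcov` on a sorted, indexed alignment of the reads to the scaffolds) to see how many reads actually span each gap.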

dcopetti commented 4 years ago

Sure, I am testing it on two scaffolds. Does it scale well with e.g. 48 CPUs? Or should I split the reference and alignment files and run a few jobs in parallel with fewer CPUs? Would that create heavy I/O?

rvaser commented 4 years ago

You do not have to split anything, just run everything together.
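
As a rough illustration of "running everything together", the sketch below builds a single minimap2 mapping command and a single racon command that use all threads and the -u option mentioned above. The file names (`reads.fastq`, `scaffolds.fasta`, `overlaps.paf`, `polished.fasta`) and the helper functions are assumptions made for the example; only the `minimap2 -x map-ont -t`, `racon -t` and `racon -u` options and the `racon <sequences> <overlaps> <target sequences>` argument order come from the tools themselves.

```python
import shlex
from typing import List


def minimap2_cmd(reads: str, target: str, threads: int) -> List[str]:
    """Map ONT reads to the scaffolds; minimap2 writes PAF to stdout."""
    return ["minimap2", "-x", "map-ont", "-t", str(threads), target, reads]


def racon_cmd(reads: str, overlaps: str, target: str, threads: int,
              include_unpolished: bool = True) -> List[str]:
    """Polish the whole target in one job; racon writes FASTA to stdout."""
    cmd = ["racon", "-t", str(threads)]
    if include_unpolished:
        # -u keeps target sequences that received no polishing in the output.
        cmd.append("-u")
    return cmd + [reads, overlaps, target]


def to_shell(cmd: List[str], stdout_path: str) -> str:
    """Render a command as a shell line with stdout redirected to a file."""
    return f"{shlex.join(cmd)} > {shlex.quote(stdout_path)}"


if __name__ == "__main__":
    # Hypothetical file names; replace with the real read and scaffold files.
    reads, target = "reads.fastq", "scaffolds.fasta"
    print(to_shell(minimap2_cmd(reads, target, 48), "overlaps.paf"))
    print(to_shell(racon_cmd(reads, "overlaps.paf", target, 48), "polished.fasta"))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; the two lines can be pasted into a shell or passed to `subprocess.run` once the paths are real.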

dcopetti commented 4 years ago

Cool. I ran it and saw that it fixed most of the small (+ and -) gaps up to 40-50 kb. I did not check the quality of the called bases, but I assume it is good. I will use racon genome-wide later on. Thanks!