isovic / racon

Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads. http://genome.cshlp.org/content/early/2017/01/18/gr.214270.116 Note: This was the original repository which will no longer be officially maintained. Please use the new official repository here:
https://github.com/lbcb-sci/racon
MIT License

Indels around the limit of the window #240

Closed RolandFaure closed 3 months ago

RolandFaure commented 5 months ago

Hello,

Thanks for developing Racon, I use it all the time :smiley:

I am polishing amplicon sequences (they are short, usually less than 1000bp), and I noticed that racon sometimes introduces small indels at position 500 of the consensus, which I suppose is linked to the fact that 500 is the default window size. Do you know where this issue might come from?

Attached is one very small example: racon_pb.gz

I have v1.4.20 and here are the command lines I used:

minimap2 -ax map-ont consensus_0.fa reads_0.fasta > mapped_0.sam
racon reads_0.fasta mapped_0.sam consensus_0.fa > polished_0.fasta

Around position 500 of polished_0.fasta I obtain the sequence "TGTGCAGATTTTTGACAA", which is in none of the reads and should instead be "TGTGTGCGATTTTTGACAA".

Thanks in advance

isovic commented 3 months ago

Hi, you are correct. This is unfortunately a side effect of windowing: it can happen when a window boundary falls on an indel region.

There are a couple of options you may try:

  1. Run Racon twice, either with the same (default) window size or with a slightly different one. If your input data has a bias towards insertions or deletions, your consensus sequence will change in length, so the window boundaries in the second round should differ from those in the first round and you can simply rerun with the same default window size. If your consensus does not vary much in length, try a slightly larger or smaller window size for the second round.
  2. Since your target sequences are only ~1000bp in length, you can try to bump up the window size to 1000bp or more, and produce a consensus as a single window. That way you will avoid windowing issues altogether.
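
As a rough illustration, the two options above could be scripted along these lines. This is a minimal sketch, not a tested pipeline: the helper names (`max_seq_len`, `polish_single_window`, `polish_second_round`), the 100 bp margin, and the second-round window length of 700 are assumptions made for the example. It relies on racon's `-w` (`--window-length`) option and on the same `minimap2 -ax map-ont` mapping used in the original report.

```shell
#!/bin/sh
# Longest sequence length (bp) in a FASTA file; handles multi-line records.
max_seq_len() {
    awk '{ sub(/\r$/, "") }                       # tolerate CRLF line endings
         /^>/ { if (len > max) max = len; len = 0; next }
         { len += length($0) }
         END { if (len > max) max = len; print max + 0 }' "$1"
}

# Option 2 (hypothetical helper): choose a window longer than the longest
# target so that each target is polished as a single window.
# Usage: polish_single_window reads.fasta draft.fa polished.fasta
polish_single_window() {
    reads=$1; draft=$2; out=$3
    window=$(( $(max_seq_len "$draft") + 100 ))   # 100 bp margin is arbitrary
    minimap2 -ax map-ont "$draft" "$reads" > "$out.sam" &&
    racon -w "$window" "$reads" "$out.sam" "$draft" > "$out"
}

# Option 1 (hypothetical helper): second round on the first-round output,
# remapping the reads and shifting the window boundaries with a different -w.
# Usage: polish_second_round reads.fasta round1.fasta round2.fasta [window]
polish_second_round() {
    reads=$1; round1=$2; out=$3; window=${4:-700} # 700 is an arbitrary choice
    minimap2 -ax map-ont "$round1" "$reads" > "$out.sam" &&
    racon -w "$window" "$reads" "$out.sam" "$round1" > "$out"
}
```

With the files from the report, this would be called as `polish_single_window reads_0.fasta consensus_0.fa polished_0.fasta`, or as `polish_second_round reads_0.fasta polished_0.fasta polished_round2.fasta` after a first default run. Note that the second round maps the reads against the first-round output rather than the original draft.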

Hope this helps, Best regards, Ivan.