isovic / racon

Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads. http://genome.cshlp.org/content/early/2017/01/18/gr.214270.116 Note: This was the original repository which will no longer be officially maintained. Please use the new official repository here:
https://github.com/lbcb-sci/racon
MIT License

BUSCO score reduces after racon polish step #158

Open emmannaemeka opened 4 years ago

emmannaemeka commented 4 years ago

Hello, I noticed something strange. After two rounds of Pilon the BUSCO score was 93.6%, but it dropped to 76.4% when I ran Racon on the output of the second Pilon round.

What's the ideal polishing protocol when Illumina reads are available? Should one polish first with long reads (Racon), then with short reads (Racon), and finally with Pilon?

Thanks

rvaser commented 4 years ago

Hello, can you please paste the Racon command (+bwa/minimap2) you used after two iterations of Pilon?

Best regards, Robert

emmannaemeka commented 4 years ago

minimap2 -ax map-ont -t 28 ~/_pilon_2x.fa ~/long_read.fq > ~/racon1x.sam

/racon -m 8 -x -6 -g -8 -w 500 -t 30 ~/long_read.fq ~/racon1x.sam ~/_pilon_2x.fa > flye_22_09_19.fa

emmannaemeka commented 4 years ago

It's surprising that this happens. Something similar was reported by Miller et al. (2018): "Because both polishing techniques alone failed to achieve BUSCO scores equal to or better than the published reference genomes, we then polished using a combination of both Racon and Pilon. We first attempted to run Pilon and Racon in combination, one after the other (e.g., Racon, Pilon, Racon, Pilon, etc.), but found that while BUSCO scores improved with each iteration of Pilon, they then fell with each iteration of Racon."

Miller, D. E., Staber, C., Zeitlinger, J., & Hawley, R. S. (2018). Highly Contiguous Genome Assemblies of 15 Drosophila Species Generated Using Nanopore Sequencing. G3 (Bethesda, Md.), 8(10), 3131–3141. https://doi.org/10.1534/g3.118.200160

rvaser commented 4 years ago

Well, the reason is that you are using long reads in the Racon rounds and short reads in the Pilon rounds. If you polish with erroneous long reads after accurate short reads, you will get lower accuracy and thus lower BUSCO scores. You need to first use Racon to polish the assembly with long reads, and afterwards you can use any combination of Racon and Pilon with short reads to further increase the accuracy.
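A minimal sketch of the ordering described above, assuming a Nanopore draft (minimap2 preset `map-ont`; PacBio reads would use `map-pb`), paired Illumina reads, and a local `pilon.jar`. The helper names (`long_read_racon`, `short_read_pilon`, `polishing_plan`), the thread count, and all file names are hypothetical. It only builds command strings so the order can be inspected before anything runs. Short-read rounds use Pilon here, although the reply also allows short-read Racon rounds at that stage.

```python
# Hypothetical sketch: builds (does not run) polishing commands in the order
# suggested above: long-read Racon rounds first, short-read Pilon rounds after.
from typing import List, Tuple

THREADS = 8  # assumption: adjust to your machine


def long_read_racon(assembly: str, reads: str, out: str, preset: str = "map-ont") -> List[str]:
    """One Racon round with long reads: minimap2 (PAF output) then racon."""
    paf = out.replace(".fasta", ".paf")
    return [
        f"minimap2 -x {preset} -t {THREADS} {assembly} {reads} > {paf}",
        # racon arguments: <reads> <overlaps> <target sequences>
        f"racon -t {THREADS} {reads} {paf} {assembly} > {out}",
    ]


def short_read_pilon(assembly: str, r1: str, r2: str, prefix: str) -> List[str]:
    """One Pilon round with paired Illumina reads aligned by bwa mem."""
    bam = f"{prefix}.bam"
    return [
        f"bwa index {assembly}",
        f"bwa mem -t {THREADS} {assembly} {r1} {r2} | samtools sort -o {bam} -",
        f"samtools index {bam}",
        # Pilon writes the polished assembly to <prefix>.fasta in the current directory
        f"java -jar pilon.jar --genome {assembly} --frags {bam} --output {prefix}",
    ]


def polishing_plan(draft: str, long_reads: str, r1: str, r2: str,
                   long_rounds: int = 1, short_rounds: int = 2) -> Tuple[List[str], str]:
    """Return all commands plus the name of the final polished assembly."""
    steps: List[str] = []
    current = draft
    # 1) Error-prone long reads first, while the draft is still rough.
    for i in range(1, long_rounds + 1):
        out = f"racon_long_{i}.fasta"
        steps += long_read_racon(current, long_reads, out)
        current = out
    # 2) Accurate short reads afterwards; no long-read round follows them.
    for i in range(1, short_rounds + 1):
        prefix = f"pilon_{i}"
        steps += short_read_pilon(current, r1, r2, prefix)
        current = f"{prefix}.fasta"
    return steps, current


if __name__ == "__main__":
    steps, final = polishing_plan("draft.fa", "long_read.fq", "reads_1.fq", "reads_2.fq")
    print("\n".join(steps))
    print("final assembly:", final)
```

The property that matters is that no long-read `racon` command appears after the first Pilon step, which is exactly the pattern (Racon after Pilon) that lowered BUSCO scores in the reports above.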

dtusso2020 commented 3 years ago

Hello!!!!

I have the same problem, but in my case I am polishing with Racon using only long reads. I assembled with canu and Wtdbg2, and the same thing happens with both.

rvaser commented 3 years ago

Hi @jforero2020, how much does the BUSCO score decrease? Which sequencing technology reads do you have? Which mapper did you use?

Best regards, Robert

dtusso2020 commented 3 years ago

Hi @rvaser

It decreases from 98% to 69.7%. The technology is PacBio RSII, and I am using minimap2 for mapping.

rvaser commented 3 years ago

Were the assemblies polished with anything else in between?

giriarteS commented 2 years ago

I have a similar situation. I tested several assemblers (canu, flye, smartdenovo and necat) with my Nanopore reads, then polished each assembly with 10 rounds of minimap2/racon, both with and without trimming. With trimming I lost the telomeres, but the BUSCO scores improved substantially:

    cp $Assembly current-assembly.fa
    for i in $(seq 1 $Iterations); do
        echo "Iteration - $i"
        minimap2 -x map-ont -t 24 current-assembly.fa $Reads > racon_round_$i.reads_mapped.paf
        racon -t 24 $Reads racon_round_$i.reads_mapped.paf current-assembly.fa > $WorkDir/racon_round_$i.fasta
        cp racon_round_$i.fasta current-assembly.fa
        cp racon_round_$i.fasta $CurDir/$OutDir/"$Prefix"_racon_round_$i.fasta
    done

Without trimming I kept the telomeres, but the BUSCO scores decreased:

    cp $Assembly current-assembly.fa
    for i in $(seq 1 $Iterations); do
        echo "Iteration - $i"
        minimap2 -x map-ont -t 24 -c current-assembly.fa $Reads > racon_round_$i.reads_mapped.paf
        racon -t 24 --no-trimming $Reads racon_round_$i.reads_mapped.paf current-assembly.fa > $WorkDir/racon_round_$i.fasta
        cp racon_round_$i.fasta current-assembly.fa
        cp racon_round_$i.fasta $CurDir/$OutDir/"$Prefix"_racon_round_$i.fasta
    done
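To make the difference between the two loops explicit, here is a hypothetical Python helper (`racon_rounds` is not part of Racon) that emits the same commands. As far as the scripts show, the variants differ only in `-c` on minimap2 and `--no-trimming` on racon. Unlike the original, this sketch assumes everything runs in one working directory: it writes and copies `racon_round_$i.fasta` at the same path (the original writes to `$WorkDir/` but copies from the current directory), and it omits the `echo` and the archival copy to `$CurDir/$OutDir`.

```python
# Hypothetical helper: rebuilds the minimap2/racon loop from the comment above
# as a list of shell commands, switching between its two variants.
from typing import List


def racon_rounds(assembly: str, reads: str, iterations: int, trim: bool = True) -> List[str]:
    """trim=True: first variant (racon's default consensus trimming).
    trim=False: second variant (minimap2 -c and racon --no-trimming)."""
    map_extra = "" if trim else " -c"  # -c: base-level alignment, CIGAR in PAF
    racon_extra = "" if trim else " --no-trimming"
    cmds = [f"cp {assembly} current-assembly.fa"]
    for i in range(1, iterations + 1):
        paf = f"racon_round_{i}.reads_mapped.paf"
        fasta = f"racon_round_{i}.fasta"
        cmds.append(f"minimap2 -x map-ont -t 24{map_extra} current-assembly.fa {reads} > {paf}")
        # Write and copy the same path so each round polishes the previous output.
        cmds.append(f"racon -t 24{racon_extra} {reads} {paf} current-assembly.fa > {fasta}")
        cmds.append(f"cp {fasta} current-assembly.fa")
    return cmds


if __name__ == "__main__":
    for cmd in racon_rounds("assembly.fa", "reads.fq", 2, trim=False):
        print(cmd)
```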

However, after further polishing the Racon-polished assemblies (both with and without trimming) with medaka and Pilon, the BUSCO scores are better than those of the reference genome.