lbcb-sci / racon

Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads
MIT License

Decrease in accuracy from racon-v1.4.5 to racon-v1.4.11 #29

Closed: wshropshire closed this issue 3 years ago

wshropshire commented 4 years ago

Hello,

I am doing long-read + short-read 'hybrid' assemblies: a flye assembly followed by two rounds of racon long-read polishing, medaka, and then two rounds of short-read polishing. This is a somewhat adapted version of the protocol Ryan Wick used in the long-read polishing paper he released last year.

What I have found is that there seems to be a large increase in aberrant insertions going from version 1.4.5 to 1.4.11. We are using snippy to check for SNPs and INDELs, and I have found that v1.4.11 produces more obvious insertion errors, which can be seen when looking at the long- or short-read pileup with Tablet/IGV. I re-built racon from the source code release and got the same results. Additionally, I found that with an older version of racon, v1.3.2, the INDEL/SNP counts decreased and matched my results with v1.4.5. Here is an example of results following two short-read polishes of a Kpn genome:

racon-v1.4.11: [screenshot, 2020-03-10]

racon-v1.4.5: [screenshot, 2020-03-10]

The racon-v1.3.2 results mirrored the v1.4.5 results.

I haven't systematically checked this across multiple isolates, but I noticed it with another genome today, which is what motivated me to compare results across these versions of racon. For the time being we are just going to drop down to v1.4.5 in our pipeline; however, I thought this may be of interest to y'all.

Best,

Will

rvaser commented 4 years ago

Hello Will, the polishing inconsistency is most probably due to the different alignment parameters used in POA. From v1.4.6 onward, the default match, mismatch and gap scores were changed to 3, -5 and -4, respectively. The old values were 5, -4 and -8. You can use the old ones in the newest racon version with the following arguments:

    -m, --match <int>
        default: 3
        score for matching bases
    -x, --mismatch <int>
        default: -5
        score for mismatching bases
    -g, --gap <int>
        default: -4
        gap penalty (must be negative)

Best regards, Robert
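
As a rough illustration of the suggestion above, the sketch below builds a racon command line that passes the older scores (5, -4, -8) through `-m`, `-x` and `-g`. The helper name `racon_command` and the file names are hypothetical, and the positional order (sequences, overlaps, target sequences) is assumed from racon's usage text, so check it against `racon --help` for your build.

```python
import shlex

# Scores quoted above as the defaults before v1.4.6.
LEGACY_SCORES = {"match": 5, "mismatch": -4, "gap": -8}


def racon_command(reads, overlaps, target, scores=LEGACY_SCORES):
    """Build a racon argument list with explicit POA scores.

    Hypothetical helper: it only assembles arguments and does not run racon.
    """
    if scores["gap"] >= 0:
        # The help text above says the gap penalty must be negative.
        raise ValueError("gap penalty must be negative")
    return [
        "racon",
        "-m", str(scores["match"]),
        "-x", str(scores["mismatch"]),
        "-g", str(scores["gap"]),
        reads, overlaps, target,  # assumed positional order
    ]


# Placeholder file names, not taken from the thread.
cmd = racon_command("reads.fastq", "overlaps.paf", "assembly.fasta")
print(shlex.join(cmd))
```

To actually run it, the list could be passed to `subprocess.run` with stdout redirected to a FASTA file, since racon is expected to write the polished sequences to standard output.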

wshropshire commented 4 years ago

Okay, thanks for the clarification