marbl / merqury

k-mer based assembly evaluation

How to understand the error rate #54

Closed zmz1988 closed 3 years ago

zmz1988 commented 3 years ago

Dear Arangrhie,

I used Merqury to assess the QV and error rate of my assemblies, and I have a question about how large the difference is between error rates such as 9.12E-06 and 1.24E-06. Can an error rate of 9.12E-06 be read as 9.12 erroneous bases per million bases? From the equation in the Merqury paper, I suspect it can't be read that way, right? Sorry if this question is dumb.

The reason why I ask this question is that I have several assemblies, and it's difficult for me to choose the best one according to QV and N50. Please see below:

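| Assembly | QV | Error rate | N50 | N75 |
|-----------|-------|------------|--------|--------|
| assembly1 | 54.92 | 3.2E-06 | 16 Mbp | 6 Mbp |
| assembly2 | 59.06 | 1.24E-06 | 10 Mbp | 4 Mbp |
| assembly3 | 50.39 | 9.12E-06 | 18 Mbp | 15 Mbp |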

I would like to know how meaningful the QV differences among these assemblies are, so that I can decide whether to choose the one with the best N50 and N75 values or the one with the highest QV.

Could you please give me some hints?

Thanks a lot!

arangrhie commented 3 years ago

Dear @zmz1988 ,

I'd recommend comparing NG50 instead of N50 to assess continuity, assuming the assemblies have been purged of unwanted haplotigs.
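For readers unfamiliar with the distinction: N50 is computed against the total length of the assembly itself, while NG50 is computed against an estimated genome size, which makes it comparable across assemblies of different total length. A minimal sketch, assuming contig lengths are available as a plain list; the function name `nx50` and the example numbers are hypothetical and not part of Merqury:

```python
from typing import Iterable, Optional


def nx50(contig_lengths: Iterable[int], total_length: Optional[int] = None) -> Optional[int]:
    """Return N50 when total_length is None, or NG50 when total_length
    is an estimated genome size (hypothetical helper, not a Merqury API)."""
    lengths = sorted(contig_lengths, reverse=True)
    # N50 uses the assembly's own size; NG50 uses the supplied genome size.
    reference = sum(lengths) if total_length is None else total_length
    target = reference / 2
    running = 0
    for length in lengths:
        running += length
        if running >= target:
            return length
    # The assembly covers less than half of the reference length.
    return None


# Made-up contig lengths, for illustration only.
contigs = [50, 30, 20, 10, 10]
print("N50 :", nx50(contigs))
print("NG50:", nx50(contigs, total_length=200))
```

Because every assembly is measured against the same genome size, an assembly that is inflated by retained haplotigs or is missing sequence does not get an artificially favorable value.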

The QV seems very good for assembly 2, so if the NG50s are comparable I'd prefer to use that one. Note that QV can be improved with additional polishing, so I wouldn't worry too much if the more continuous assembly turned out to be the one with the lowest QV.

Please refer to https://github.com/marbl/merqury/wiki/2.-Overall-k-mer-evaluation#1-reference-free-qv-estimation for more detailed description.
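As a rough illustration of how the two reported columns relate, QV is a Phred-scaled form of the per-base error rate, QV = -10 * log10(E). The sketch below assumes only that relation; the helper names are hypothetical and it does not reproduce Merqury's k-mer based estimate of E itself:

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled QV to a per-base error rate."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Convert a per-base error rate to a Phred-scaled QV."""
    return -10 * math.log10(error_rate)


# QV values taken from the table in the question above.
for name, qv in [("assembly1", 54.92), ("assembly2", 59.06), ("assembly3", 50.39)]:
    rate = qv_to_error_rate(qv)
    # Multiplying by 1e6 expresses the rate as expected errors per Mbp.
    print(f"{name}: QV {qv} -> {rate:.2e} per base, {rate * 1e6:.2f} per Mbp")
```

Under this relation, every 10 QV units correspond to a tenfold change in error rate, so the gap between QV 50.39 and QV 59.06 is roughly a sevenfold difference in estimated errors per base.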

zmz1988 commented 3 years ago

Thanks a lot, @arangrhie! Then I know what to do.