marbl / Winnowmap

Long read / genome alignment software
Other
250 stars 22 forks source link

paftools showing no accurate mappings after Winnowmap? #2

Closed ekimb closed 4 years ago

ekimb commented 4 years ago

Hello,

I'm running pbsim with the following command to replicate Winnowmap results for D1 in the paper:

pbsim --data-type CLR --model_qc model_qc_clr --length-mean 15000 --accuracy-mean 0.9 chrX.fasta

Then running winnowmap with:

./winnowmap -W bad_Hk19_mers.txt -cx map-pb ~/chrX.fasta ~/simPB.fq > wOutput.paf

Then, running paftools mapeval wOutput.paf returns

Q 60 200926 200926 1.000000000 200926 Q 59 56 56 1.000000000 200982

for all quality thresholds.

Is this expected? Can you provide more comprehensive instructions to replicate the results in the paper?

Thanks!

cjain7 commented 4 years ago

Hi @ekimb

Here is a reference on how to use paftools for benchmarking https://github.com/lh3/minimap2/tree/master/misc

It looks like you are missing the pbsim2fq step. It makes sure that simulated read ids are compatible with paftools.

Also copy pasting the commands I had used

REF=chrX.fasta
pbsim --depth 5 --data-type CLR  --prefix depth5 --accuracy-mean 0.90 --length-min 1000 --length-mean 15000 --model_qc PBSIM-PacBio-Simulator/data/model_qc_clr $REF
cat depth5*.maf > depth5_combined.maf
PAFTOOLSEXE=paftools.js
REF_FAI=chrX.fasta.fai
$PAFTOOLSEXE pbsim2fq $REF_FAI depth5_combined.maf > depth5_combined.annotated.fasta

Winnowmap software version corresponding to our results in the preprint is v1.0 (available in releases). Although that shouldn't make a significant difference.

For the final step where you use mapeval command, you would need to provide -m 2 parameter to make sure you are evaluating all primary alignments and no secondary alignments.

$PAFTOOLSEXE mapeval -m 2 output.paf > report.txt

ekimb commented 4 years ago

Forgot to mention that I also used pbsim2fq here, but the results are the same.

I didn't specify depth when using pbsim though, will check and update. Thank you!

ekimb commented 4 years ago

This worked! Closing this issue.

Thank you!

cjain7 commented 4 years ago

Great! What did mapeval return after it worked?

ekimb commented 4 years ago

The error rates changed to around 0.00 for quality thresholds with an accumulative error rate of 0.0010.

The 0.06% error rate reported in the paper seems to be different here.

cjain7 commented 4 years ago

We made a slight change in the latest version, in the sense that it does a very mild masking of k-mers rather than no masking. This may be causing this drop in accuracy. You could probably reproduce 0.06 or close to that by providing an additional parameter -f 0 that would turn off masking in the latest winnowmap version.

cjain7 commented 4 years ago

Thanks for doing this exercise. I'll re-think the latest change to the code, and might revert it.