Closed ekimb closed 4 years ago
Hi @ekimb
Here is a reference on how to use paftools for benchmarking https://github.com/lh3/minimap2/tree/master/misc
It looks like you are missing the pbsim2fq
step. It makes sure that simulated read ids are compatible with paftools
.
Also copy pasting the commands I had used
REF=chrX.fasta
pbsim --depth 5 --data-type CLR --prefix depth5 --accuracy-mean 0.90 --length-min 1000 --length-mean 15000 --model_qc PBSIM-PacBio-Simulator/data/model_qc_clr $REF
cat depth5*.maf > depth5_combined.maf
PAFTOOLSEXE=paftools.js
REF_FAI=chrX.fasta.fai
$PAFTOOLSEXE pbsim2fq $REF_FAI depth5_combined.maf > depth5_combined.annotated.fasta
Winnowmap software version corresponding to our results in the preprint is v1.0 (available in releases). Although that shouldn't make a significant difference.
For the final step where you use mapeval
command, you would need to provide -m 2
parameter to make sure you are evaluating all primary alignments and no secondary alignments.
$PAFTOOLSEXE mapeval -m 2 output.paf > report.txt
Forgot to mention that I also used pbsim2fq
here, but the results are the same.
I didn't specify depth when using pbsim
though, will check and update. Thank you!
This worked! Closing this issue.
Thank you!
Great! What did mapeval
return after it worked?
The error rates changed to around 0.00 for quality thresholds with an accumulative error rate of 0.0010.
The 0.06% error rate reported in the paper seems to be different here.
We made a slight change in the latest version, in the sense that it does a very mild masking of k-mers rather than no masking. This may be causing this drop in accuracy. You could probably reproduce 0.06
or close to that by providing an additional parameter -f 0
that would turn off masking in the latest winnowmap version.
Thanks for doing this exercise. I'll re-think the latest change to the code, and might revert it.
Hello,
I'm running
pbsim
with the following command to replicate Winnowmap results for D1 in the paper:pbsim --data-type CLR --model_qc model_qc_clr --length-mean 15000 --accuracy-mean 0.9 chrX.fasta
Then running
winnowmap
with:./winnowmap -W bad_Hk19_mers.txt -cx map-pb ~/chrX.fasta ~/simPB.fq > wOutput.paf
Then, running
paftools mapeval wOutput.paf
returnsQ 60 200926 200926 1.000000000 200926
Q 59 56 56 1.000000000 200982
for all quality thresholds.
Is this expected? Can you provide more comprehensive instructions to replicate the results in the paper?
Thanks!