zeeev / wham

Structural variant detection and association testing

HG002 results suspiciously bad #64

Open jdidion opened 1 month ago

jdidion commented 1 month ago

I re-aligned the HG002 30x PCR-free WGS BAM from Baid et al. against hs37d5 using DRAGEN.

I then used the BAM with WHAM to call SVs:

# Call SVs with whamg (-a reference FASTA, -f input BAM, -x 8 threads), then
# sort in chromosomal order, drop adjacent duplicate records, and split
# multiallelic records before writing a bgzipped, tbi-indexed VCF.
whamg -a hs37d5.fa -f HG002.bam -x 8 \
  | vcf-sort -c \
  | uniq \
  | bcftools norm -N -m-any -O z --write-index=tbi -o HG002.wham.vcf.gz

I then benchmarked against the GIAB v0.6 high-confidence callset using Witty.er:

# Compare the WHAM calls (-i) against the GIAB truth set (-t) within the
# Tier 1 high-confidence regions (-b), using SimpleCounting evaluation mode
# and counting only PASS calls.
docker run --rm -v $(pwd):/data -w /data wittyer \
  -i HG002.wham.vcf.gz \
  -t HG002_SVs_Tier1_v0.6.vcf.gz \
  -b HG002_SVs_Tier1_v0.6.bed \
  -o HG002.wham \
  --em SimpleCounting \
  --if PASS

The F1 score at the event level is 0.01 and at the base level is 0.14. I suspect I'm doing something wrong, but I can't figure out what; I've used the same process to benchmark other SV callers and it works fine. Given who authored Witty.er, is it somehow biased against WHAM? :) Is there a different comparison tool and/or callset I should be using for evaluation?
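
For reference, F1 is the harmonic mean of precision and recall, so an event-level F1 of 0.01 means at least one of the two has collapsed. A minimal shell check with made-up counts (not taken from my actual results):

# Hypothetical TP/FP/FN event counts, for illustration only.
awk 'BEGIN {
  tp = 50; fp = 200; fn = 9000                # made-up counts
  p = tp / (tp + fp); r = tp / (tp + fn)      # precision, recall
  printf "precision=%.3f recall=%.3f F1=%.3f\n", p, r, 2 * p * r / (p + r)
}'

With precision at 0.2 but recall near 0.006, F1 lands at about 0.01, so recall alone can sink the score.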

zeeev commented 1 month ago

Hi @jdidion ,

It's been a while since I've looked at the performance. Mind sharing the summary recall/precision stats with me? The tool tends to be overly sensitive, so it's possible precision is the problem. I doubt there's a bias in the benchmarking tool, but if precision is driving the terrible F1 I can suggest some simple filters (a sketch follows below).
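
As an example of the kind of simple filtering I mean (a minimal sketch; the 50 bp size cutoff and reliance on SVLEN/PASS are illustrative assumptions, not a documented WHAM recommendation):

# Keep PASS calls of at least 50 bp; the field name and threshold are
# placeholders, tune them to WHAM's actual output.
bcftools view -f PASS -i 'ABS(INFO/SVLEN) >= 50' \
  -O z -o HG002.wham.filtered.vcf.gz HG002.wham.vcf.gz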

jdidion commented 1 month ago

Attached, thanks! Recall looks to be the major issue. I'm going to try again with wham instead of whamg.

Wittyer.Stats.json
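
In case it helps anyone pulling numbers out of the attached JSON, here is a minimal jq sketch that walks the file and prints any object carrying recall/precision stats; the key names (StatsType, Precision, Recall, Fscore) are assumptions about the schema, so adjust them to whatever your Witty.er version actually emits:

# Recursively collect stats objects from the Witty.er summary; keys that
# an object lacks simply come out as null.
jq '.. | objects | select(has("Recall")) | {StatsType, Precision, Recall, Fscore}' \
  Wittyer.Stats.json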

zeeev commented 1 month ago

I found this somewhat recent benchmarking paper:

https://www.nature.com/articles/s41439-024-00276-x/figures/4

[Screenshot of Figure 4 from the linked paper]

It looks like the recall is around 70% for deletions in non-repeat regions.

jdidion commented 1 month ago

When I run wham I get:

Is this expected?