Open cjain7 opened 1 year ago
When calculating the fraction of dipcall SVs overlapping minigraph SVs, I was only considering SVs in confident regions. When I did the reverse, I didn't apply the confident region. For comparing SVs, usually we don't apply a hard cutoff on SV length. For example, if we want to evaluate SVs >=50bp, we typically include "truth SVs" >=30bp instead because SV representations are often different. You will get higher percentages with these changes.
You may still see lower percentages in comparison to the numbers in the paper for two reasons. 1) The paper considers SVs >=100bp. The consistency is usually lower for >=50bp SVs. 2) With base alignment, minigraph now produces smaller bubbles. The differences in SV representation will be amplified. For example, if we produce one bubble per chromosome, we will get a 100% consistency. In theory, it would be better to take SV lengths into account, but it is tricky in practice.
PS: I have looked at graphs generated with more recent versions. The graphs generally look tighter and more precise – overall an improvement.
Dear Heng,
I and @gsc74 tried to compare minigraph and Dipcall SV calls by using HG002 HiFi diploid assembly. We used a similar method for comparison as how you had used the Syndip benchmark in the minigraph paper. We didn't expect to reproduce the exact same numbers that were presented in the paper because significant improvements to minigraph code have been made since then. Unfortunately we are seeing worse performance with the latest minigraph code when compared to the paper. I am sharing the results and benchmarking methodology here. If you have any advice, please let us know.
Genomes used: Diploid HG002 assembly (HAP1=GCA_018852605.1, HAP2=GCA_018852615.1) REF= GCA_000001405.15_GRCh38_no_alt_analysis_set.fna Minigraph version: 0.20-r559
Total SVs called by minigraph: 24004 Fraction of minigraph SVs that overlap with Dipcall SV calls: 91.2% Total SVs called by Dipcall: 28479 Fraction of Dipcall SVs that overlap with minigraph SV calls: 85.5% But the numbers from paper were 94% and 97.3% respectively for calling SVs of length ≥ 100bp.
Minigraph commands:
Dipcall commands:
We checked how many minigraph SVs are supported by Dipcall SVs:
Similarly, we checked how many Dipcall SVs are supported by minigraph SVs:
Do you think the above results are okay?