twolinin / longphase


Different benchmark datasets showed great differences in LongPhase phasing SNP only. #56

Open Jerry-is-a-mouse opened 3 months ago

Jerry-is-a-mouse commented 3 months ago

@twolinin Hello. When I evaluate different phasing tools as you did in the LongPhase paper, the results differ greatly between the two benchmark VCF datasets: the GIAB file HG002_GRCh38_1_22_v4.2.1_benchmark.vcf and the hifiasm phase-transfer file provided in the supplementary files, HG002_GRCh38_1_22_v4.2.1_benchmark_hifiasm_v11_phasetransfer.vcf. I think the second one may be the more standard benchmark? I have uploaded the results of benchmarking with whatshap compare. The switch error and Hamming distance are too high, and I don't know what causes this. I used minimap2 for alignment and Clair3 to call SNPs. Looking forward to your reply.

longphase.pacbio.benchmark1.result.txt longphase.pacbio.benchmark2.result.txt
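For readers less familiar with the two metrics mentioned above, here is a minimal sketch of how switch errors and Hamming distance can be counted within a single phase block. It is an illustration only, not WhatsHap's implementation: the function name `phase_block_errors` and the string encoding (one character per heterozygous site, giving the allele on the first haplotype) are assumptions made for this example, and WhatsHap additionally distinguishes switch and flip errors, which this sketch does not.

```python
def phase_block_errors(truth: str, pred: str) -> tuple[int, int]:
    """Count (switch_errors, hamming_distance) for one phase block.

    Hypothetical helper: `truth` and `pred` hold the allele ('0' or '1')
    carried by the first haplotype at each shared heterozygous site.
    """
    if len(truth) != len(pred):
        raise ValueError("both phasings must cover the same sites")

    # True where the predicted phase agrees with the truth at that site.
    agree = [t == p for t, p in zip(truth, pred)]

    # A switch error occurs wherever agreement changes between
    # two consecutive sites.
    switches = sum(a != b for a, b in zip(agree, agree[1:]))

    # Haplotype labels are arbitrary, so take the smaller of the
    # mismatch count and its complement.
    mismatches = agree.count(False)
    hamming = min(mismatches, len(agree) - mismatches)
    return switches, hamming


# One switch in the middle of a block flips every later site,
# so a single switch error can produce a large Hamming distance.
print(phase_block_errors("00000000", "00001111"))  # (1, 4)
# An isolated wrong site counts as two switches but only one mismatch.
print(phase_block_errors("00000000", "00010000"))  # (2, 1)
```

This shows why the two numbers can diverge: a truth set with a few long-range phasing mistakes can inflate the Hamming distance far more than the switch error count.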

twolinin commented 3 months ago

Dear @Jerry-is-a-mouse,

I tend to use HG002_GRCh38_1_22_v4.2.1_benchmark_hifiasm_v11_phasetransfer.vcf, whose variants and phase sets are more complete. You may find that the benchmark differences between these two versions are quite significant. Based on early observations, many errors observed in HG002_GRCh38_1_22_v4.2.1_benchmark.vcf are corrected in HG002_GRCh38_1_22_v4.2.1_benchmark_hifiasm_v11_phasetransfer.vcf.

Jerry-is-a-mouse commented 3 months ago

Thank you very much, I think so too.