vibansal / HapCUT2

software tools for haplotype assembly from sequence data
BSD 2-Clause "Simplified" License

Merge VCFs when combining Hi-C + HiFi? #129

Closed by miaoxm2 1 year ago

miaoxm2 commented 1 year ago

Hi,

Thanks for developing the nice tools!

I learned from the manual that, when combining Hi-C and long reads, we should concatenate the fragment files that were generated independently from each dataset. But if I understand correctly, we would also need a different VCF to generate the fragments for each dataset, right? For the last step of HapCUT2, all the pipelines I saw only mention a VCF and a concatenated fragment file as input.

Is the final VCF also merged from the different technologies, that is, Hi-C VCF + HiFi VCF? I am unsure whether there is any incompatibility between them, since they are sometimes produced by different variant callers. I also don't understand how we should choose one of the VCFs as the final benchmark. Please correct me if I am wrong.

Thanks! Best, Xiaomeng

vibansal commented 1 year ago

A single VCF file should be used for all steps of running HapCUT2, both fragment extraction and assembly. The VCF file contains the variants of the genome that was sequenced. If you have more than one call set, use the VCF whose variants have higher accuracy.
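
To make the advice above concrete, here is a minimal sketch of a combined Hi-C + HiFi run in which the same VCF is passed to every step. It only builds and prints the command lines; it does not run them. The file names (`variants.vcf`, `hic.bam`, `hifi.bam`, `ref.fa`) and the helper `build_pipeline` are hypothetical, and the flags (`--hic 1`, `--pacbio 1`, `--ref`, `--bam`, `--VCF`, `--out`, `--fragments`, `--output`) are written as I understand them from the HapCUT2 README, so please check them against the version you have installed.

```python
import shlex


def build_pipeline(vcf, hic_bam, hifi_bam, ref, prefix="sample"):
    """Return the command lines for a combined Hi-C + HiFi run.

    Hypothetical helper for illustration: every step receives the SAME vcf.
    """
    hic_frags = f"{prefix}.hic.frag"
    hifi_frags = f"{prefix}.hifi.frag"
    combined = f"{prefix}.combined.frag"
    # Shell command that concatenates the two fragment files into one.
    cat_cmd = (
        f"cat {shlex.quote(hic_frags)} {shlex.quote(hifi_frags)} "
        f"> {shlex.quote(combined)}"
    )
    return [
        # Step 1: extract Hi-C fragments against the shared VCF.
        ["extractHAIRS", "--hic", "1", "--bam", hic_bam,
         "--VCF", vcf, "--out", hic_frags],
        # Step 2: extract HiFi fragments against the same VCF.
        ["extractHAIRS", "--pacbio", "1", "--ref", ref, "--bam", hifi_bam,
         "--VCF", vcf, "--out", hifi_frags],
        # Step 3: concatenate the two fragment files.
        ["sh", "-c", cat_cmd],
        # Step 4: assemble haplotypes, again with the same VCF.
        ["HAPCUT2", "--hic", "1", "--fragments", combined,
         "--VCF", vcf, "--output", f"{prefix}.hap"],
    ]


commands = build_pipeline("variants.vcf", "hic.bam", "hifi.bam", "ref.fa")
for cmd in commands:
    print(shlex.join(cmd))
```

As far as I understand the fragment file format, each fragment refers to variants by their index in the VCF, so fragment files extracted against two different VCFs would not line up after concatenation. That is one practical reason to pick a single call set before running any step.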