Open xuxingyubio opened 1 month ago
HG002 was held out of the release HPRC graphs. If you want to make your own HG002-only graph, you can do so quite quickly with minigraph-cactus.
You'd feed it a seqfile something like:
HG002_hap1 HG002.hap1.fa.gz
HG002_hap2 HG002.hap2.fa.gz
And run with --reference HG002_hap1 HG002_hap2 --vcf --vcfReference HG002_hap1 HG002_hap2
among the usual options to get a pair of haploid VCFs, each comparing one haplotype with the other.
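As a rough sketch of the steps above, the following writes the two-line seqfile and assembles a cactus-pangenome command line. The jobstore path (./js), the output directory and name (hg002-pg, hg002) and the FASTA paths are placeholders assumed for illustration, and the "usual options" are left out. The command is only printed, not executed, so you can review it before running.

```shell
#!/bin/sh
# Seqfile: one "<name> <path>" line per haplotype assembly (paths are placeholders).
cat > hg002.seqfile <<'EOF'
HG002_hap1 HG002.hap1.fa.gz
HG002_hap2 HG002.hap2.fa.gz
EOF

# Assemble the command. ./js, hg002-pg and hg002 are hypothetical names;
# add whatever other options you normally use.
CMD="cactus-pangenome ./js hg002.seqfile --outDir hg002-pg --outName hg002 --reference HG002_hap1 HG002_hap2 --vcf --vcfReference HG002_hap1 HG002_hap2"

# Print for review instead of running it.
echo "$CMD"
```

Listing both haplotypes under --vcfReference is what requests one VCF per reference haplotype.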
Otherwise, if you already have a graph with HG002 in it, then I think vg deconstruct -P will work. You may need to promote HG002 to a reference path as described here
Thank you for your response. I followed your method and tried it out, but I noticed that the contig lengths in the generated VCF file do not match the original lengths, resulting in positional misalignment. Could this be due to some trimming performed during pangenome construction (minigraph-cactus)? Is it possible to obtain the trimmed FASTA file used in pangenome construction?
id=NA12878hap1|ptg000002l 1895202
contig=
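To see which contigs are affected, the lengths declared in the VCF header can be compared against a FASTA index. This is a minimal sketch, assuming a samtools-style .fai file (name in column 1, length in column 2) and standard ##contig=<ID=...,length=...> header lines. The file names and the toy data are made up so the snippet runs on its own.

```shell
#!/bin/sh
# Toy inputs so the sketch is self-contained; replace with your real files.
printf 'ctgA\t1000\t6\t60\t61\nctgB\t500\t1030\t60\t61\n' > ref.fa.fai
printf '##fileformat=VCFv4.2\n##contig=<ID=ctgA,length=1000>\n##contig=<ID=ctgB,length=420>\n' > calls.vcf

# First file (.fai): remember each contig's FASTA length.
# Second file (VCF): parse ##contig header lines and report lengths that differ.
# Output columns: contig, length in VCF header, length in FASTA index.
MISMATCHES=$(awk '
  NR == FNR { fai[$1] = $2; next }
  /^##contig=.*length=/ {
    id = $0;  sub(/.*ID=/, "", id);      sub(/[,>].*/, "", id)
    len = $0; sub(/.*length=/, "", len); sub(/[,>].*/, "", len)
    if (id in fai && fai[id] != len) print id "\t" len "\t" fai[id]
  }
  /^#CHROM/ { exit }   # stop at the end of the header
' ref.fa.fai calls.vcf)

echo "$MISMATCHES"
```

On the toy data only ctgB is reported, since its header length (420) is shorter than its indexed length (500).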
Yeah, that's a known issue due to path fragmentation. The VCF itself is valid and its coordinates are correct; it's just that the contig lengths in the header can be too short. This only happens when multiple references are given, and only to references after the first (so hap2 in our example). Your options are:
--reference HG002_hap2 --vcf --vcfReference HG002_hap2
to make the second VCF
--vcf full
to make a VCF of the unclipped graph. But note you will get some giant sites for the centromeres that you may want to remove yourself.
I used hg38 as the reference, then switched the reference using vg convert and constructed the VCF file with vg deconstruct. However, I noticed that the reference bases in the VCF file at the corresponding coordinates do not match the original bases in the input FASTA file. Can using an unclipped graph solve this problem?
Hello VG Team, I am currently working with the HPRC pangenome and aiming to construct a VCF file that highlights the differences between the two haplotypes (hap1 and hap2) of the HG002 sample. Specifically, I want to generate a VCF file that represents hap2 relative to hap1 for HG002.
So far, I have downloaded the HPRC pangenome data from the HPRC project, which includes multiple haplotypes for various samples, including HG002. I have attempted to use VG tools, such as vg convert to change the reference, but found that it doesn't seem to support operations targeting individual haplotypes, and vg deconstruct to obtain VCF files; however, it appears that it does not allow for processing single haplotypes separately. It seems that the current VG tools do not support operations on individual haplotypes within a sample.
I am specifically looking to extract the variant differences between hap1 and hap2 of HG002 and represent them in a VCF file. Could you please guide me on how to effectively generate a VCF file that captures the differences between the two haplotypes (hap1 and hap2) of the HG002 sample from the HPRC pangenome? Thank you for your support!