human-pangenomics / hpp_pangenome_resources


Looking for inversions in Minigraph/CACTUS and PGGB decomposed VCFs #14

Open: ratrrs opened this issue 1 year ago

ratrrs commented 1 year ago

Hi,

I am looking for inversions in hprc-v1.0-mc.grch38.vcfbub.a100k.wave.vcf.gz and hprc-v1.0-pggb.grch38.vcfbub.a100k.wave.vcf.gz, but the only TYPE values I find for structural variants are ins, del, and complex. In addition, every INV annotation in those files is INV=0, so the INV field does not tell me where the inversions are. Is there another INFO annotation I should look for to find inversions, or is there a separate inversion list for these samples? Thank you.

ekg commented 1 year ago

Hi,

I believe the vcfwave-decomposed VCFs should contain a field, INV=1, on records where the decomposition detects an inversion. Note that this happens only for a limited size range of inversions: from 1 kbp to 100 kbp, if I remember correctly.
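For anyone who wants to check this directly, here is a minimal sketch that pulls out records carrying INV=1. It assumes the standard VCF layout (INFO in the eighth tab-separated column, semicolon-separated key=value pairs) and that the flag appears literally as `INV=1`; treating a comma-separated INV value as per-allele is also an assumption. The function names (`parse_info`, `iter_inversions`, `inversions_in_file`) are hypothetical helpers, not part of vcflib or any HPRC tooling. It reports REF length rather than filtering on size, since the 1 kbp to 100 kbp range above is from memory.

```python
import gzip
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def parse_info(info: str) -> Dict[str, Optional[str]]:
    """Split a VCF INFO column into a dict; flag-style fields map to None."""
    fields: Dict[str, Optional[str]] = {}
    if info == ".":
        return fields  # "." means the INFO column is empty
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else None
    return fields


def iter_inversions(lines: Iterable[str]) -> Iterator[Tuple[str, int, int]]:
    """Yield (CHROM, POS, REF length) for data lines whose INFO has INV=1."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        cols = line.rstrip("\n").split("\t")
        value = parse_info(cols[7]).get("INV")
        # Assumption: a comma-separated value lists one INV entry per ALT allele.
        if value is not None and "1" in value.split(","):
            yield cols[0], int(cols[1]), len(cols[3])


def inversions_in_file(path: str) -> List[Tuple[str, int, int]]:
    # gzip.open in text mode also reads BGZF-compressed .vcf.gz files,
    # because BGZF is a series of concatenated gzip members.
    with gzip.open(path, "rt") as handle:
        return list(iter_inversions(handle))
```

Usage would be along the lines of `inversions_in_file("hprc-v1.0-mc.grch38.vcfbub.a100k.wave.vcf.gz")`; an empty list would be consistent with the INV=0-only annotations reported in this issue.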

If this doesn't work, we can approach the problem using the input VCFs that were not realigned with WFA.

ratrrs commented 1 year ago

Hi, after parsing hprc-v1.0-mc.grch38.vcfbub.a100k.wave.vcf.gz and hprc-v1.0-pggb.grch38.vcfbub.a100k.wave.vcf.gz, I only find "INV=0". Across the two files, roughly 5 million records have INV in the INFO column, but none of them is annotated "INV=1".
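A tally like the one described here can be reproduced with a short scan that counts every value of a given INFO key, which also works for TYPE (ins, del, complex, and so on). This is a minimal sketch assuming the standard VCF column layout; `tally_info_values` and `tally_file` are hypothetical names, and the `<absent>` and `<flag>` buckets are just labels chosen for this example.

```python
import gzip
from collections import Counter
from typing import Iterable


def tally_info_values(lines: Iterable[str], key: str) -> Counter:
    """Count each value of INFO `key` across VCF data lines.

    Records without the key are counted under "<absent>"; occurrences of
    the key without "=" (flag style) are counted under "<flag>".
    """
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        info = line.rstrip("\n").split("\t")[7]
        value = "<absent>"
        for item in info.split(";"):
            name, sep, val = item.partition("=")
            if name == key:
                value = val if sep else "<flag>"
                break
        counts[value] += 1
    return counts


def tally_file(path: str, key: str) -> Counter:
    # Text-mode gzip.open streams the (BGZF) .vcf.gz without loading it all.
    with gzip.open(path, "rt") as handle:
        return tally_info_values(handle, key)
```

For example, `tally_file("hprc-v1.0-pggb.grch38.vcfbub.a100k.wave.vcf.gz", "INV")` would show whether any value other than "0" occurs, and the same call with "TYPE" lists the variant classes present.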

ekg commented 1 year ago

I think you've found something we need to work on. Inversions should be annotated correctly, but the version used may have had a bug at one of the many points in the process where the presence of an inversion is inferred.

It is also possible that graph construction in both methods has trouble representing inversions in the graph. Consider this a work in progress.