pangenome / pggb

the pangenome graph builder
https://doi.org/10.1101/2023.04.05.535718
MIT License

How to understand the vcf file output by PGGB? #334

Open Lucio-Yang opened 11 months ago

Lucio-Yang commented 11 months ago

Hi, I ran PGGB to build a pan-genome map of wheat and got a VCF file. I want to know what “.” in the VCF represents. If “0” means the sample is consistent with the reference, what does “.” mean? Does it mean the site was not aligned during the multiple sequence alignment?

Thanks!

AndreaGuarracino commented 11 months ago

The "." means "missing genotype": for some reason, vg was not able to genotype the site for that sample. Usually this is due to a missing alignment, but I suppose complex graph structures could cause difficulties for vg as well.
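
To get a feel for how many samples are affected at a site, the missing calls can be counted directly from the VCF text. The following is only a minimal sketch, assuming a standard tab-separated VCF data line with a `GT` entry in the FORMAT column; the function names and the example record are made up for illustration and are not part of PGGB or vg.

```python
def genotype_calls(vcf_line):
    """Return one list of allele strings per sample for a VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")       # column 9 is FORMAT
    gt_index = format_keys.index("GT")       # position of GT within each sample field
    calls = []
    for sample in fields[9:]:                # columns 10+ are the samples
        gt = sample.split(":")[gt_index]
        # Phased (|) and unphased (/) separators are treated the same here.
        calls.append(gt.replace("|", "/").split("/"))
    return calls


def count_missing(vcf_line):
    """Count samples whose genotype is entirely missing ('.' or './.')."""
    return sum(
        1 for alleles in genotype_calls(vcf_line)
        if all(allele == "." for allele in alleles)
    )


# Hypothetical record with four haploid samples: 0 = reference allele,
# 1 = first alternate allele, "." = no genotype could be assigned.
line = "chr1A\t1042\t.\tA\tG\t60\t.\tAC=1\tGT\t0\t1\t.\t0"
print(count_missing(line))
```

A "." call is neither reference nor alternate, so it is usually safer to exclude such samples when computing allele frequencies than to treat them as "0".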

Lucio-Yang commented 11 months ago

Thank you for your quick reply! I still have a question: can translocations or inversions be represented in the graph or VCF produced by PGGB, and how can I count them?

AndreaGuarracino commented 11 months ago

You will see both of them in the graph. In the VCF, you can see the inversions after VCF decomposition with vcfwave. You can trigger the decomposition by specifying the maximum allele length with the -V option (something like -V reference:10000).
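
As for counting, one rough approach is to tally the records in the decomposed VCF that carry an inversion annotation in the INFO column. The sketch below assumes, without having confirmed it for every vcfwave version, that such records have an INFO key named `INV`; check the `##INFO` header lines of your own file and pass the key you actually find. The function name and the example records are hypothetical.

```python
def count_info_key(vcf_lines, key="INV"):
    """Count VCF data records whose INFO column contains the given key.

    The default key "INV" is an assumption about the decomposed VCF;
    verify it against the ##INFO header lines of the real file.
    """
    count = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue                                  # skip header and blank lines
        info = line.rstrip("\n").split("\t")[7]       # column 8 is INFO
        keys = {entry.split("=", 1)[0] for entry in info.split(";")}
        if key in keys:
            count += 1
    return count


# Made-up records for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2",
    "chr1A\t100\t.\tA\tG\t60\t.\tAC=1\tGT\t0\t1",
    "chr1A\t500\t.\tAACCT\tAGGTT\t60\t.\tAC=1;INV=YES\tGT\t0\t1",
]
print(count_info_key(example))
```

With a real file, the same function can be fed an open file handle, for example `count_info_key(open("decomposed.vcf"))`, where the file name is a placeholder.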