odelaneau / shapeit4

Segmented HAPlotype Estimation and Imputation Tool
MIT License

Keep old INFO and FORMAT field #58

Closed: nickhir closed this issue 3 years ago

nickhir commented 3 years ago

Hello,

I am using SHAPEIT v4.2 to phase germline variant calls that I obtained with GATK's HaplotypeCaller (WGS data). As a reference panel, I am using the 1000 Genomes Project.

I think everything works: the program runs without errors and my samples are phased afterwards. The issue is that the resulting phased VCF file loses almost all INFO, FORMAT and QUAL entries; for FORMAT, for example, only GT remains.

So my question is whether it is possible to simply "add" the phased genotypes while keeping all other entries of my VCF, i.e. all of the original VCF fields.
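To illustrate what "adding" the phased genotypes back means, here is a minimal Python sketch. It assumes uncompressed VCF text, identifies sites by (CHROM, POS, REF, ALT), and assumes both files use the same sample names; the function name `transfer_phased_gt` is hypothetical and not part of SHAPEIT or bcftools. It copies each phased GT into the matching original record and leaves INFO, QUAL and the other FORMAT subfields untouched. Sites absent from the phased output keep their original unphased GT.

```python
def transfer_phased_gt(original_lines, phased_lines):
    """Return the original VCF lines with GT replaced by the phased GT.

    Hypothetical helper (sketch only). Assumes uncompressed VCF text and
    sites matched on (CHROM, POS, REF, ALT). Other FORMAT keys, INFO and
    QUAL from the original file are preserved.
    """
    # Collect phased GTs: site key -> {sample name: GT string}.
    phased_gt = {}
    phased_samples = []
    for line in phased_lines:
        if line.startswith("##"):
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            phased_samples = fields[9:]
            continue
        key = (fields[0], fields[1], fields[3], fields[4])
        gt_idx = fields[8].split(":").index("GT")
        phased_gt[key] = {
            name: col.split(":")[gt_idx]
            for name, col in zip(phased_samples, fields[9:])
        }

    # Rewrite the original records, swapping in the phased GT where known.
    out = []
    samples = []
    for line in original_lines:
        if line.startswith("##"):
            out.append(line.rstrip("\n"))
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            out.append("\t".join(fields))
            continue
        key = (fields[0], fields[1], fields[3], fields[4])
        if key in phased_gt:
            gt_idx = fields[8].split(":").index("GT")
            for i, name in enumerate(samples, start=9):
                parts = fields[i].split(":")
                # Fall back to the original GT if the sample is not phased.
                parts[gt_idx] = phased_gt[key].get(name, parts[gt_idx])
                fields[i] = ":".join(parts)
        out.append("\t".join(fields))
    return out
```

In practice a dedicated tool such as bcftools handles compressed input, indexing and header merging far more robustly; the sketch only shows the per-record logic.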

Furthermore, I read that SHAPEIT v4 only retains variants that are shared between the input and the reference. Is there an option to keep all variants and simply ignore the ones missing from the reference during phasing?
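If phasing does drop input variants that are absent from the reference, one way to see what was lost is to compare site keys between the input and the phased output. The minimal Python sketch below assumes uncompressed VCF text and matches sites on (CHROM, POS, REF, ALT); both function names are hypothetical. The returned records could then be added back unphased alongside the phased ones (for example, by concatenating and sorting by position).

```python
def site_key(line):
    """Return (CHROM, POS, REF, ALT) for one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    return (fields[0], fields[1], fields[3], fields[4])


def dropped_sites(original_lines, phased_lines):
    """Return original data lines whose site is missing from the phased
    output, e.g. variants that were not found in the reference panel."""
    phased = {site_key(line) for line in phased_lines if not line.startswith("#")}
    return [
        line.rstrip("\n")
        for line in original_lines
        if not line.startswith("#") and site_key(line) not in phased
    ]
```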

Any help is much appreciated!

odelaneau commented 3 years ago

I understand your problem, but adding these functionalities is actually problematic: it would substantially increase RAM usage and make the code more complex to maintain on my side. All the points you mentioned can be done quite easily with data management programs such as bcftools.

Best,

Olivier.