yfarjoun closed this issue 6 years ago.
This seems pretty similar to the `##META` and `##SAMPLE` lines defined by example in VCFv4.3 §1.4.8 (Sample field format)… maybe what's needed is some more detail in that section!
Ah, right... while it's not in my preferred orientation (I'd prefer long lines to many lines...), it will do.
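The §1.4.8 lines are structured `##KEY=<key=value,...>` headers. As a rough illustration of how a tool might read them, here is a minimal sketch. It assumes comma-separated `key=value` pairs where a value may be double-quoted or a bracketed list, either of which may itself contain commas. The helper name `parse_structured_header` is hypothetical and does not belong to any existing library.

```python
import re

# One key=value pair. The value is a double-quoted string, a [bracketed, list],
# or a run of characters up to the next comma (alternatives tried in that order).
_PAIR = re.compile(r'([A-Za-z0-9_]+)=("(?:[^"\\]|\\.)*"|\[[^\]]*\]|[^,]*)')


def parse_structured_header(line: str) -> tuple[str, dict[str, str]]:
    """Split '##SAMPLE=<ID=S1,Assay=Exome>' into ('SAMPLE', {'ID': 'S1', ...}).

    Hypothetical helper for illustration; assumes a well-formed line.
    """
    line = line.rstrip("\n")
    if not (line.startswith("##") and line.endswith(">") and "=<" in line):
        raise ValueError(f"not a structured header line: {line!r}")
    key, body = line[2:].split("=<", 1)
    body = body[:-1]  # drop the closing '>'
    fields: dict[str, str] = {}
    for m in _PAIR.finditer(body):
        name, value = m.group(1), m.group(2)
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]  # strip surrounding quotes, keep inner text as-is
        fields[name] = value
    return key, fields
```

Bracketed lists such as `Values=[WholeGenome, Exome]` in `##META` lines are returned as the raw bracketed string. Splitting them further is left to the caller.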
Currently, to provide information about the samples in your VCF (phenotype, gender, pedigree, cohort, sequencing protocol, etc.), you need to resort to an external file (e.g. PED or FAM, or a roll-your-own format). This seems like an oversight that could be addressed in the VCF spec.
We could add header lines that include per-sample information, for example:
Putting the sample-level information into the VCF would let tools that change the sample list (merging VCFs or selecting samples) update the sample-level information at the same time. That is safer than doing it in two separate steps: modifying the VCF(s) and then modifying the metadata files accordingly.
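To make the "same time" point concrete, here is a minimal sketch of sample selection. It drops the per-sample header lines together with the genotype columns in a single pass. It assumes §1.4.8-style `##SAMPLE=<ID=...,...>` lines with `ID` as the first key, and an uncompressed VCF supplied as a list of lines without trailing newlines. The function `subset_samples` is hypothetical and does not describe how any existing tool behaves.

```python
import re

# The fixed VCF columns (CHROM ... FORMAT) that precede the per-sample columns.
N_FIXED = 9
# Assumes ID is the first key of a ##SAMPLE line, as in the §1.4.8 examples.
_SAMPLE_ID = re.compile(r"^##SAMPLE=<ID=([^,>]+)")


def subset_samples(lines: list[str], keep: set[str]) -> list[str]:
    """Keep only samples in `keep`, updating ##SAMPLE lines, the #CHROM line
    and the genotype columns together (hypothetical helper)."""
    out: list[str] = []
    cols: list[int] | None = None  # column indices to retain, set at #CHROM
    for line in lines:
        if line.startswith("##"):
            m = _SAMPLE_ID.match(line)
            if m and m.group(1) not in keep:
                continue  # metadata for a dropped sample is dropped with it
            out.append(line)
        elif line.startswith("#CHROM"):
            fields = line.split("\t")
            n_fixed = min(N_FIXED, len(fields))
            cols = list(range(n_fixed)) + [
                i for i in range(n_fixed, len(fields)) if fields[i] in keep
            ]
            out.append("\t".join(fields[i] for i in cols))
        else:
            if cols is None:
                raise ValueError("data line before the #CHROM header line")
            fields = line.split("\t")
            out.append("\t".join(fields[i] for i in cols))
    return out
```

Because the header metadata and the genotype columns are filtered by the same `keep` set in the same pass, they cannot drift apart. With an external PED/FAM file, that consistency has to be maintained by hand.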
I'm not married to the format I proposed above, but I wanted to give a definite proposal to start the discussion...
Any opinions?