Overview
This pull request makes progress on #46: it adds command-line provenance to VCF output headers.
Testing
I added a unit test that checks the output for the header.
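For reviewers, here is a minimal sketch of the kind of check such a test can make. It is not the test added in this PR: the helper name `find_provenance` and the regular expression are hypothetical, and the expected line shape is inferred only from the example output in this description.

```python
import re

# Hypothetical pattern, inferred from the example header line in this PR.
# The fractional seconds are optional because str(datetime) omits them
# when the microsecond field is zero.
PROVENANCE_RE = re.compile(
    r"^##vcztools_viewCommand=(?P<command>.+); "
    r"Date=(?P<date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(\.\d+)?)$"
)


def find_provenance(vcf_text):
    """Return (command, date) from the first provenance header line, or None."""
    for line in vcf_text.splitlines():
        if not line.startswith("##"):
            break  # meta-information lines end at the #CHROM line
        match = PROVENANCE_RE.match(line)
        if match:
            return match["command"], match["date"]
    return None
```

Matching the date by pattern instead of by value keeps the check independent of when the test runs.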
Example
vcztools view vcz_test_cache/sample.vcf.vcz
##fileformat=VCFv4.0
##FILTER=<ID=PASS,Description="All filters passed">
##fileDate=20090805
##source=myImputationProgramV3.1
##reference=1000GenomesPilot-NCBI36
##phasing=partial
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=AC,Number=.,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
##FILTER=<ID=s50,Description="Less than 50% of samples have data">
##FILTER=<ID=q10,Description="Quality below 10">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
##ALT=<ID=DEL:ME:ALU,Description="Deletion of ALU element">
##ALT=<ID=CNV,Description="Copy number variable region">
##contig=<ID=19>
##contig=<ID=20>
##contig=<ID=X>
##vcztools_viewCommand=view vcz_test_cache/sample.vcf.vcz; Date=2024-08-31 16:14:10.986683
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003
19 111 . A C 9.6 . . GT:HQ 0|0:10,15 0|0:10,10 0/1:3,3
19 112 . A G 10 . . GT:HQ 0|0:10,10 0|0:10,10 0/1:3,3
20 14370 rs6054257 G A 29 PASS AF=0.5;DB;DP=14;H2;NS=3 GT:DP:GQ:HQ 0|0:1:48:51,51 1|0:8:48:51,51 1/1:5:43:.,.
20 17330 . T A 3 q10 AF=0.017;DP=11;NS=3 GT:DP:GQ:HQ 0|0:3:49:58,50 0|1:5:3:65,3 0/0:3:41:.,.
20 1110696 rs6040355 A G,T 67 PASS AA=T;AF=0.333,0.667;DB;DP=10;NS=2 GT:DP:GQ:HQ 1|2:6:21:23,27 2|1:0:2:18,2 2/2:4:35:.,.
20 1230237 . T . 47 PASS AA=T;DP=13;NS=3 GT:DP:GQ:HQ 0|0:.:54:56,60 0|0:4:48:51,51 0/0:2:61:.,.
20 1234567 microsat1 G GA,GAC 50 PASS AA=G;AC=3,1;AN=6;DP=9;NS=3 GT:DP:GQ 0/1:4:. 0/2:2:17 ./.:3:40
20 1235237 . T . . . . GT 0/0 0|0 ./.
X 10 rsTest AC A,ATG,C 10 PASS . GT 0 0/1 0|2
Discussion
I wasn't sure whether the vcztools version is defined yet, so I have not added the vcztools version header.
I use the default date format in the vcztools header (for example, 2024-08-31 16:14:10.986683), which differs from the date format in the corresponding header in bcftools' output. Let me know if I should change the format to match bcftools.
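To make the date-format question concrete, here is a small sketch of both options. The function `provenance_line` and its `bcftools_style` flag are hypothetical and are not the code in this PR. The ctime-style stamp is my understanding of what bcftools writes, so treat that as an assumption to verify.

```python
from datetime import datetime


def provenance_line(args, now, bcftools_style=False):
    """Build a '##vcztools_viewCommand=...' header line (hypothetical helper)."""
    if bcftools_style:
        # ctime-style stamp, e.g. 'Sat Aug 31 16:14:10 2024'
        # (assumed to match the bcftools header format).
        date = now.ctime()
    else:
        # Default string form of a datetime, e.g. '2024-08-31 16:14:10.986683'.
        date = str(now)
    return f"##vcztools_viewCommand={' '.join(args)}; Date={date}"


now = datetime(2024, 8, 31, 16, 14, 10, 986683)
print(provenance_line(["view", "vcz_test_cache/sample.vcf.vcz"], now))
print(provenance_line(["view", "vcz_test_cache/sample.vcf.vcz"], now, bcftools_style=True))
```

If we switch to the bcftools style, only the date expression changes and the rest of the header line stays the same.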
References
--no-version option.