jamescasbon / PyVCF

A Variant Call Format reader for Python.
http://pyvcf.readthedocs.org/en/latest/index.html

vcf.Writer outputs mangled vcf header #68

Closed by cooketho 11 years ago

cooketho commented 12 years ago

When I run the following code, the header lines from the original VCF file come out in a different order. The biggest problem is that the ##fileformat=VCFv4.1 line is no longer the first line, so GATK doesn't automatically recognize the file as a VCF. There may be a workaround by passing an option to GATK specifying the format, but really this is a problem with PyVCF, not GATK. Please update the next version so that the header order is kept intact in the output VCF.

Python code:

    import vcf

    reader = vcf.Reader(filename='test.vcf')
    writer = vcf.Writer(open('test.out.vcf', 'w'), reader)
    for record in reader:
        writer.write_record(record)

Input VCF:

    ##fileformat=VCFv4.1
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##ALT=
    ##FORMAT=
    ##FORMAT=
    ##FORMAT=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##source_20120714.1=vcf-subset(r731) -c NA18505,NA18508,NA19648,NA19704, downloads/ALL.chr1.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz
    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA18505 NA18508 NA19648 NA19704
    1 10583 rs58108140 G A 100 PASS AA=.;AC=0;AF=0.14;AFR_AF=0.04;AMR_AF=0.17;AN=8;ASN_AF=0.13;AVGPOST=0.7707;ERATE=0.0161;EUR_AF=0.21;LDAF=0.2327;RSQ=0.4319;SNPSOURCE=LOWCOV;THETA=0.0046;VT=SNP GT:DS:GL 0|0:0.000:-0.07,-0.80,-5.00 0|0:0.000:-0.01,-1.68,-5.00 0|0:0.100:-0.03,-1.15,-5.00 0|0:0.150:-0.19,-0.45,-2.42

Output VCF:

    ##ALT=
    ##fileformat=VCFv4.1
    ##source_20120714.1=vcf-subset(r731) -c NA18505,NA18508,NA19648,NA19704, downloads/ALL.chr1.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##FORMAT=
    ##FORMAT=
    ##FORMAT=
    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA18505 NA18508 NA19648 NA19704
    1 10583 rs58108140 G A 100 . AA=.;AVGPOST=0.7707;AC=0;AF=0.14;ASN_AF=0.13;AFR_AF=0.04;AMR_AF=0.17;ERATE=0.0161;AN=8;LDAF=0.2327;VT=S;SNPSOURCE=LOWCOV;THETA=0.0046;RSQ=0.4319;EUR_AF=0.21 GT:DS:GL 0|0:0.000:-0.07,-0.8,-5.0 0|0:0.000:-0.01,-1.68,-5.0 0|0:0.100:-0.03,-1.15,-5.0 0|0:0.150:-0.19,-0.45,-2.42
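For anyone stuck with output like the above, one possible stopgap is to post-process the written file so that the ##fileformat line comes first again. The sketch below is not part of PyVCF; `fileformat_first` is a hypothetical helper name, the `...` values are placeholders for header content, and it only moves that one line while leaving every other line in its existing order.

```python
def fileformat_first(lines):
    """Return VCF lines with the ##fileformat meta line moved to the top.

    `lines` is an iterable of strings. Only the position of the
    ##fileformat line changes; all other lines keep their relative order.
    (Hypothetical helper, not a PyVCF function.)
    """
    lines = list(lines)
    for i, line in enumerate(lines):
        if not line.startswith("#"):
            break  # reached the data records; stop scanning the header
        if line.startswith("##fileformat="):
            return [line] + lines[:i] + lines[i + 1:]
    return lines  # no ##fileformat line in the header; leave input untouched


# Illustrative header with placeholder values, shaped like the mangled output.
mangled = [
    "##ALT=...",
    "##fileformat=VCFv4.1",
    "##INFO=...",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t10583\trs58108140\tG\tA\t100\t.\tAA=.",
]
fixed = fileformat_first(mangled)
```

This only addresses the GATK recognition problem; it does not restore the original order of the INFO, ALT and FORMAT lines.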

cooketho commented 12 years ago

Never mind. Sorry! I see that this was an issue with version 0.4. I installed version 0.6 and it seems to be working fine. Thanks!
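Since the behaviour depends on the installed version, a quick check before filing a report can save a round trip. This is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+). It assumes the package is installed under the distribution name `PyVCF`; the helper names and the `(0, 6)` threshold (taken from the version reported as working above) are illustrative.

```python
from importlib import metadata


def installed_version(dist_name="PyVCF"):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def at_least(version, minimum=(0, 6)):
    """Compare the leading numeric components of `version` with `minimum`."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric component, e.g. "0rc1"
        parts.append(int(piece))
    return tuple(parts) >= minimum


version = installed_version()
if version is None:
    print("PyVCF does not appear to be installed")
elif not at_least(version):
    print("PyVCF %s is older than 0.6; consider upgrading" % version)
else:
    print("PyVCF %s" % version)
```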