Open parkesK opened 6 years ago
+1 on this request
Your faux vcf is wrong - you need to add, bare minimum, GT
to the FORMAT
column and a fake genotype (0/1
) to the fake sample 20
column. You can't just leave them blank.
Also, you need to add a FORMAT header field to define what GT
is (https://samtools.github.io/hts-specs/VCFv4.2.pdf).
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
Here's a super barebones python script that will add a fake FORMAT
and sample
field to a vcf. You have to give a gzipped vcf as the first argument and a fake sample name (no spaces or special characters!) as the second argument to the script.
python3 this_script.py clinvar.vcf.gz FAKESAMPLENAME | bgzip > new_clinvar.vcf.gz
#!/usr/local/bin/python3
import sys
import gzip
vcf = sys.argv[1]
sample_name = sys.argv[2]
for line in gzip.open(vcf,'r'):
line = line.decode('utf-8')
if line[0:2] == '##':
print(line[:-1])
elif line[0:2] == '#C':
print('##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">')
print('##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">')
print('##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">')
print(line[:-1] + '\tFORMAT\t' + sample_name)
else:
line = line.split()
line = line[:-1]
print('\t'.join(line) + '\t.\tGT:GQ:DP\t0/1:100:100')
Hi there,
I annotated a VCF of Clinvar variants using vcfanno and would like to create a database using vcf2db. However, there are no samples in the VCF. I tried adding a fake sample in the header and then making a ped file to go along with this, but am getting an error with this:
File "cyvcf2/cyvcf2.pyx", line 932, in cyvcf2.cyvcf2.Variant.gt_bases.get (cyvcf2/cyvcf2.c:22059) ValueError: range() step argument must not be zero
I am guessing this is because there are no genotypes? Is there anyway to upload this vcf without samples? I have attached 2 VCFs that are each 1000 lines. The one that has _Copy in the title is the original; the other one has the fake sample added. I also attached my fake ped file.
Thanks, Parkes
clinvar_noncoding_pathogenic_headed_NoAnno_Split_VEP_annotated_HeaderFixed_Copy_1000.vcf.gz
fake.ped.gz
clinvar_noncoding_pathogenic_headed_NoAnno_Split_VEP_annotated_HeaderFixed_1000.vcf.gz