pysam-developers / pysam

Pysam is a Python package for reading, manipulating, and writing genomics data such as SAM/BAM/CRAM and VCF/BCF files. It's a lightweight wrapper of the HTSlib API, the same one that powers samtools, bcftools, and tabix.
https://pysam.readthedocs.io/en/latest/
MIT License
774 stars 274 forks source link

InError: Invalid BCF, the FORMAT tag id=91 at chr20:60137 not present in the header #1222

Closed karinkumar closed 12 months ago

karinkumar commented 1 year ago

Hello I am getting the error: Invalid BCF, the FORMAT tag id=91 at chr20:60137 not present in the header

Which does not make sense as I have no FORMAT items named 91. This is the code I am using and I've printed my header below. Could you please point me to what exactly this error is referring to? I tried to add a format line with id=91 but 91 was considered invalid id option.

vcf_reader = VariantFile(input_vcf, 'r')

vcf_out = VariantFile(output_vcf, 'w', header=vcf_reader.header)

# Define the 'PL' format field
vcf_reader.header.add_meta('FORMAT', items=[('ID',"PL"), ('Number',2), ('Type','String'),
    ('Description','Phred Scaled Likelihood')])

#vcf_reader.header.add_meta('FORMAT', items=[('ID',"91"), ('Number',2), ('Type','String'),
 #   ('Description','Error Fix')])
#print(vcf_reader.header)
for rec in vcf_reader:
    #rec.info['PL'] = []
    dp = rec.info['DP']
    for sample in rec.samples.values():
        #print(sample['PL'])
        sample['PL']  = simulate_DP_PL(sample['GT'], dp, coverage_frac, error)
        #print(sample['PL'])
    # Write the processed variant to the output VCF file
    vcf_out.write(rec)

vcf_reader.close()
vcf_out.close()

fileformat=VCFv4.1

FILTER=

contig=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

FORMAT=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

bcftools_viewVersion=1.17+htslib-1.17

bcftools_viewCommand=view -S ../samples.txt --force-samples -Oz -o ../validation/230813ASWvalidation.vcf.gz /net/1000g/hmkang/data/1KG.3202/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr20.filtered.shapeit2-duohmm-phased.vcf.gz; Date=Sun Aug 13 21:00:11 2023

FORMAT=

jmarshall commented 1 year ago

The root of the problem is in the record at chr20:60137, so you might need to show us the VCF records at that position too.

karinkumar commented 1 year ago

I ended up resolving this issue by writing to a new vcf rather than trying to modify the one I am reading in