Pysam is a Python package for reading, manipulating, and writing genomics data such as SAM/BAM/CRAM and VCF/BCF files. It's a lightweight wrapper of the HTSlib API, the same one that powers samtools, bcftools, and tabix.
Hi, I am using the latest pysam with version 0.21.0, and encountered a bug in the following story:
I got the Vcf output from Vardict(1.8.3), and the contigs information are missed.
I wrote a simple python script to add contig information into the vcf:
# update header
for line in contig_info:
if type(line) == list:
header.contigs.add(line[0], length=line[1],)
else:
header.add_line(line)
# write out new vcf
vcf_out = pysam.VariantFile(out, "w", header=header)
vcf_out.write(record)
for r in vcf:
vcf_out.write(r)
vcf_out.close()
Bctools reported errors such as b"Reference allele mismatch at 1:25469502 .. REF_SEQ:'G' vs VCF:'C'\n"
Thus I checked the vcf and found that some contig names were wrongly assigned by pysam.VariantFile, such as "2 -> 1" or "10 ->12".
Howerver, I never met this problem using the same code days ago.
I realized that this is a special case, that is my vcf is from a small panel which only involves a few contigs.
Potential Bug here: My vcf did not involve all contigs (such as contig "6" is not contained) and when I added all the human contigs into the vcf, pysam may wrongly update the contig names of records and output the wrong result.
Hi, I am using the latest pysam with version 0.21.0, and encountered a bug in the following story:
Best regards!