pysam-developers / pysam

Pysam is a Python package for reading, manipulating, and writing genomics data such as SAM/BAM/CRAM and VCF/BCF files. It's a lightweight wrapper of the HTSlib API, the same one that powers samtools, bcftools, and tabix.
https://pysam.readthedocs.io/en/latest/
MIT License

tabix_index always makes CSI indexes for VCFs #641

Open multimeric opened 6 years ago

multimeric commented 6 years ago

Basically, pysam.tabix_index always tries to make a CSI index for VCF files, no matter which options you pass (even csi=False). This differs from the behaviour of the tabix command-line tool, and is probably a bug.

$ pip freeze | grep pysam
pysam==0.14

$ python -c 'from pysam import tabix_index; tabix_index("/app/out/TCRBOA3.combined.vcf.gz", preset="vcf")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "pysam/libctabix.pyx", line 1011, in pysam.libctabix.tabix_index
IOError: filename '/app/out/TCRBOA3.combined.vcf.gz.csi' already exists, use *force* to overwrite

$ python -c 'from pysam import tabix_index; tabix_index("/app/out/TCRBOA3.combined.vcf.gz", preset="vcf", csi=False)'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "pysam/libctabix.pyx", line 1011, in pysam.libctabix.tabix_index
IOError: filename '/app/out/TCRBOA3.combined.vcf.gz.csi' already exists, use *force* to overwrite

AndreasHeger commented 6 years ago

Hmm, strange. I can't reproduce this; see the session below:

(pysam-devel) andreas@cgath2[0]: /ifs/devel/andreas/pysam/tests/tt > ls
test.vcf.gz
(pysam-devel) andreas@cgath2[0]: /ifs/devel/andreas/pysam/tests/tt > python -c 'from pysam import tabix_index; tabix_index("test.vcf.gz", preset="vcf")'
(pysam-devel) andreas@cgath2[0]: /ifs/devel/andreas/pysam/tests/tt > ls
test.vcf.gz  test.vcf.gz.tbi
(pysam-devel) andreas@cgath2[0]: /ifs/devel/andreas/pysam/tests/tt > python -c 'from pysam import tabix_index; tabix_index("test.vcf.gz", preset="vcf", csi=True)'
(pysam-devel) andreas@cgath2[0]: /ifs/devel/andreas/pysam/tests/tt > ls
test.vcf.gz  test.vcf.gz.csi  test.vcf.gz.tbi
(pysam-devel) andreas@cgath2[0]: /ifs/devel/andreas/pysam/tests/tt >

Could you double-check?
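When debugging this kind of report, it can help to list which index files already sit next to the data file, much like the `ls` calls in the session above. The helper below is a hypothetical sketch, not a pysam function; it only checks for the `.tbi` and `.csi` suffixes that appear in this thread. A leftover `.csi` file is what produces the `already exists, use *force* to overwrite` error in the original report.

```python
import os
import tempfile

# Index suffixes seen in this thread: tabix (.tbi) and CSI (.csi).
INDEX_SUFFIXES = (".tbi", ".csi")


def existing_indexes(path):
    """Return the index files already present next to *path* (hypothetical helper)."""
    return [path + suffix for suffix in INDEX_SUFFIXES if os.path.exists(path + suffix)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = os.path.join(tmp, "test.vcf.gz")
        open(data, "wb").close()
        # Simulate a stale CSI index left behind by an earlier run.
        open(data + ".csi", "wb").close()
        print([os.path.basename(p) for p in existing_indexes(data)])
```

Deleting the stale file, or passing `force=True` to `tabix_index` as the error message suggests, lets the index be rewritten.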

multimeric commented 6 years ago

I think I've realised the issue: the file was actually a BCF file, just with a filename ending in .vcf. BCF files can only be given a CSI index, not a .tbi one, so I guess pysam actually behaved correctly.