samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here: http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

`bcftools concat --write-index` does not produce an index #2099

Closed blaiseli closed 5 months ago

blaiseli commented 5 months ago

As commented in https://github.com/samtools/bcftools/issues/1952, there does not seem to be an index generated for the .vcf.gz file I produce using bcftools concat:

$ bcftools --version
bcftools 1.18
Using htslib 1.18
Copyright (C) 2023 Genome Research Ltd.
License Expat: The MIT/Expat license
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
$ bcftools concat --naive -o tmp.vcf.gz -Oz --write-index $(for i in $(seq 1 22); do echo FB_SKV38_Basile_${i}.vcf.gz; done | tr "\n" " ")
Checking the headers of 22 files.
Done, the headers are compatible.
Concatenating FB_SKV38_Basile_1.vcf.gz  0.001039 seconds
Concatenating FB_SKV38_Basile_2.vcf.gz  0.000526 seconds
Concatenating FB_SKV38_Basile_3.vcf.gz  0.000871 seconds
Concatenating FB_SKV38_Basile_4.vcf.gz  0.000486 seconds
Concatenating FB_SKV38_Basile_5.vcf.gz  0.001307 seconds
Concatenating FB_SKV38_Basile_6.vcf.gz  0.000865 seconds
Concatenating FB_SKV38_Basile_7.vcf.gz  0.001875 seconds
Concatenating FB_SKV38_Basile_8.vcf.gz  0.000548 seconds
Concatenating FB_SKV38_Basile_9.vcf.gz  0.000789 seconds
Concatenating FB_SKV38_Basile_10.vcf.gz 0.000553 seconds
Concatenating FB_SKV38_Basile_11.vcf.gz 0.000746 seconds
Concatenating FB_SKV38_Basile_12.vcf.gz 0.000839 seconds
Concatenating FB_SKV38_Basile_13.vcf.gz 0.001176 seconds
Concatenating FB_SKV38_Basile_14.vcf.gz 0.000911 seconds
Concatenating FB_SKV38_Basile_15.vcf.gz 0.000828 seconds
Concatenating FB_SKV38_Basile_16.vcf.gz 0.000697 seconds
Concatenating FB_SKV38_Basile_17.vcf.gz 0.000931 seconds
Concatenating FB_SKV38_Basile_18.vcf.gz 0.000540 seconds
Concatenating FB_SKV38_Basile_19.vcf.gz 0.001025 seconds
Concatenating FB_SKV38_Basile_20.vcf.gz 0.000760 seconds
Concatenating FB_SKV38_Basile_21.vcf.gz 0.000734 seconds
Concatenating FB_SKV38_Basile_22.vcf.gz 0.000571 seconds
$ ls tmp*
tmp.vcf.gz

Not even a hidden file:

$ ls -a
.                          FB_SKV38_Basile_15.vcf.gz  FB_SKV38_Basile_21.vcf.gz  FB_SKV38_Basile_7.vcf.gz
..                         FB_SKV38_Basile_16.vcf.gz  FB_SKV38_Basile_22.vcf.gz  FB_SKV38_Basile_8.vcf.gz
FB_SKV38_Basile_10.vcf.gz  FB_SKV38_Basile_17.vcf.gz  FB_SKV38_Basile_2.vcf.gz   FB_SKV38_Basile_9.vcf.gz
FB_SKV38_Basile_11.vcf.gz  FB_SKV38_Basile_18.vcf.gz  FB_SKV38_Basile_3.vcf.gz   tmp.vcf.gz
FB_SKV38_Basile_12.vcf.gz  FB_SKV38_Basile_19.vcf.gz  FB_SKV38_Basile_4.vcf.gz
FB_SKV38_Basile_13.vcf.gz  FB_SKV38_Basile_1.vcf.gz   FB_SKV38_Basile_5.vcf.gz
FB_SKV38_Basile_14.vcf.gz  FB_SKV38_Basile_20.vcf.gz  FB_SKV38_Basile_6.vcf.gz

pd3 commented 5 months ago

This is because naive concatenation cannot produce an index: it just streams the compressed data without decompressing it. In the latest GitHub version (f33fd1d14176c009d50c640a36b16a13632e9e91), the program will throw an error when both the --naive and --write-index options are given.

Thank you for reporting the issue.
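
Based on the explanation above, there seem to be two ways to end up with an indexed file. This is a minimal sketch, not an official recipe. It assumes the input names from the report, inputs that are already in chromosome order, and file names without spaces. The function names and the `inputs` variable are made up for illustration.

```shell
# Build the per-chromosome input list in order 1..22
# (names taken from the report; they contain no spaces, so a plain string is enough).
inputs=$(for i in $(seq 1 22); do printf 'FB_SKV38_Basile_%s.vcf.gz ' "$i"; done)

# Option 1: keep --naive (fast, no recompression) and build the index in a second step.
# Hypothetical helper; $1 is the output file name.
concat_naive_then_index() {
    # $inputs is intentionally unquoted so that it splits into one argument per file
    bcftools concat --naive -Oz -o "$1" $inputs &&
    bcftools index "$1"    # writes "$1.csi" by default; use -t for a .tbi index
}

# Option 2: drop --naive so the records are parsed and the index
# can be written alongside the output.
# Hypothetical helper; $1 is the output file name.
concat_with_index() {
    bcftools concat -Oz --write-index -o "$1" $inputs
}

# Example call, commented out so that sourcing this file has no side effects:
# concat_naive_then_index tmp.vcf.gz
```

Option 1 keeps the speed of naive concatenation at the cost of a separate indexing pass over the output. Option 2 is slower because the data is decompressed and recompressed, but it needs only one command.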