samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here: http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

Invalid index is produced by --write-index and --threads #1985

Closed lacek closed 1 year ago

lacek commented 1 year ago

Versions:

Steps to reproduce:

wget -N ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/archive_2.0/2023/clinvar_20230819.vcf.gz
echo $(seq 1 22) X Y | awk -v RS=' ' '{print $1"\tchr"$1} END {print "MT\tchrM"}' > hg38_rename.txt
bcftools annotate --rename-chrs hg38_rename.txt --write-index --threads 1 -Oz -o clinvar_20230819.hg38.vcf.gz clinvar_20230819.vcf.gz
bcftools view -H clinvar_20230819.hg38.vcf.gz chrY | head

The following error is shown at this point:

[E::get_intv] Failed to parse TBX_VCF, was wrong -p [type] used?
The offending line was: "1627532;CLNDISDB=MedGen:CN517202;CLNDN=not_provided;CLNHGVS=NC_000001.11:g.931107C>T;CLNREVSTAT=criteria_provided,_single_submitter;CLNSIG=Likely_benign;CLNVC=single_nucleotide_variant;CLNVCSO=SO:0001483;GENEINFO=SAMD11:148398;MC=SO:0001627|intron_variant;ORIGIN=1"
Error: BCF read error

Recreating the output without --threads 1 gives no error:

bcftools annotate --rename-chrs hg38_rename.txt --write-index -Oz -o clinvar_20230819.hg38.nothreads.vcf.gz clinvar_20230819.vcf.gz
bcftools view -H clinvar_20230819.hg38.nothreads.vcf.gz  chrY | head

If the index file is recreated with bcftools index, there is no error either:

bcftools index -f clinvar_20230819.hg38.vcf.gz
bcftools view -H clinvar_20230819.hg38.vcf.gz chrY | head

So something appears to be wrong with the index file when it is produced by --write-index together with --threads (>0).
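
Until this is fixed, a possible workaround follows from the observation above that re-running bcftools index yields a usable index: drop --write-index and index the output in a separate step. The snippet below is only a sketch of that idea, not a confirmed fix. It reuses the file names from the steps above, rebuilds the rename map with a plain loop instead of the awk one-liner, and skips the bcftools steps if the tool or the input file is missing (that guard is my addition).

```shell
#!/bin/sh
# Workaround sketch: write the compressed VCF first, then index it separately.
in=clinvar_20230819.vcf.gz            # input downloaded in the steps above
out=clinvar_20230819.hg38.vcf.gz      # renamed output

# Build the rename map (same content as the awk one-liner): 1..22, X, Y, then MT.
{
  for c in $(seq 1 22) X Y; do
    printf '%s\tchr%s\n' "$c" "$c"
  done
  printf 'MT\tchrM\n'
} > hg38_rename.txt

# Only run the bcftools steps when the tool and the input are available.
if command -v bcftools >/dev/null 2>&1 && [ -f "$in" ]; then
  # No --write-index here; --threads is kept so compression is still threaded.
  bcftools annotate --rename-chrs hg38_rename.txt --threads 1 -Oz -o "$out" "$in" &&
    bcftools index -f "$out" &&
    bcftools view -H "$out" chrY | head
else
  echo "bcftools or $in not found; skipping annotate/index" >&2
fi
```

This costs one extra pass over the output file, but it avoids the combination of --write-index and --threads that triggers the error.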