Closed SZ-qing closed 5 months ago
Uh, that's very odd indeed. Can you share a test case to reproduce the problem, please?
The VCF file is from the dbSNP database (build b155):
wget https://ftp.ncbi.nih.gov/snp/archive/b155/VCF/GCF_000001405.39.gz
The chromosome IDs in this file are in RefSeq format, and I need to convert them to the plain format (1, 2, 3, etc.), so I downloaded the corresponding conversion data from NCBI and used bcftools annotate to rename them. At this point, all the POS information is normal:
bcftools annotate --rename-chrs processed_id_data.txt GCF_000001405.39.gz -o GCF_000001405.39.renamed.vcf
processed_id_data.txt:
GCF_000001405.39.renamed.vcf:
The next step is to sort using bcftools sort:
bcftools sort --temp-dir ./tmp/ GCF_000001405.39.renamed.vcf -o GCF_000001405.39.renamed.sorted.vcf
GCF_000001405.39.renamed.sorted.vcf:
At this point the POS column appears to have been renumbered starting from 1.
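One way to tell whether sorting changed any records, or only their order, is to compare the (CHROM, POS, ID) triples of both files after putting them in the same canonical order. The sketch below is only an illustration: the two small files are made-up stand-ins for the real VCFs, and the `triples` helper name is hypothetical.

```shell
# Print CHROM, POS and ID of every record, in a canonical order.
triples() { grep -v '^#' "$1" | cut -f1-3 | LC_ALL=C sort; }

before=$(mktemp)
after=$(mktemp)
# Made-up records: the same three sites, unsorted and sorted.
printf '2\t300\trs3\n1\t200\trs2\n1\t100\trs1\n' > "$before"
printf '1\t100\trs1\n1\t200\trs2\n2\t300\trs3\n' > "$after"

if [ "$(triples "$before")" = "$(triples "$after")" ]; then
    echo "same records, different order"
else
    echo "records differ"
fi
```

For a file as large as the full dbSNP VCF, it would be more practical to write each `triples` output to a file and compare the two with `cmp` instead of holding them in shell variables.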
I'm very sorry, I found that in these dbSNP data the same RS ID can correspond to multiple POS values, so bcftools is fine. Thanks!
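For anyone who hits the same confusion, a check along these lines can list the IDs that occur at more than one site. It is a minimal sketch using plain awk on the tab-separated VCF body; the `dup_ids` function name and the tiny demo records are made up for illustration.

```shell
# List every ID that appears at more than one distinct CHROM:POS site.
dup_ids() {
    awk -F'\t' '!/^#/ && $3 != "." {
        key = $1 ":" $2
        # Count each (ID, site) combination only once.
        if (!seen[$3, key]++) { n[$3]++; sites[$3] = sites[$3] " " key }
    }
    END { for (id in n) if (n[id] > 1) print id ":" sites[id] }' "$1"
}

demo=$(mktemp)
# Made-up records: rs1 occurs on two chromosomes, rs2 only once.
printf '1\t100\trs1\tA\tG\t.\t.\t.\n1\t250\trs2\tC\tT\t.\t.\t.\n2\t100\trs1\tA\tG\t.\t.\t.\n' > "$demo"

dup_ids "$demo"   # prints: rs1: 1:100 2:100
```

On an uncompressed VCF such as the renamed file above, the call would be `dup_ids GCF_000001405.39.renamed.vcf`; with many duplicated IDs the output order is not guaranteed, since awk does not define the iteration order of `for (id in n)`.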
I have a VCF file where the chromosomes are not in order, so I want to sort it by chromosome in order to build an index using tabix. However, after using the sort function the POS column starts counting from 1, which leads to a disorder in the RSPOS information. Command line:
bcftools sort --temp-dir ./tmp/ GCF_000001405.39.renamed.vcf -o GCF_000001405.39.renamed.sorted.vcf
Before sort data:
After sort data:
![image](https://github.com/samtools/bcftools/assets/85827219/5c26ac51-ac54-4293-b3ed-fb83c7ff8505)
My bcftools version is: 1.18-15-g21755519 (using htslib 1.18)