samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

Error while running bcftools view #615

Open elowy01 opened 7 years ago

elowy01 commented 7 years ago

Hi,

I'm trying to run the following:

bcftools view -r HLA-C*06:46N input.vcf.gz

It did not work, and I got the following error:

[synced_bcf_reader.c:949 _regions_init_string] Could not parse the region(s): HLA-C*06:46N
Failed to read the regions: HLA-C*06:46N

Apparently the problem is in the colon that this contig has in its name. Is there a way of fixing this error?

Thanks,

ernesto

dmckean commented 6 years ago

So the VCF has a CHROM named HLA-C*06:46N? If so, bcftools won't be able to parse this. CHROM names cannot have a colon or whitespace, see the VCF 4.1 spec (or any version), section 1.4: Data lines:

The colon symbol (:) must be absent from all chromosome names to avoid parsing errors when dealing with breakends. (String, no white-space permitted, Required).

If altering the VCF is an option, removing or swapping out the colon for another character would let bcftools view the VCF. Here's changing colons to underscores:

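bgzip -dc input.vcf.gz | awk 'BEGIN{FS=OFS="\t"} !/^#/ {gsub(":","_",$1)} {print}' | bgzip > fixed.vcf.gz

(Setting FS and OFS to a tab matters here: with awk's default separators, modifying $1 rebuilds the line with single spaces in place of the tabs, which breaks the VCF.)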

I would suggest fixing any contig header lines like ##contig=<ID=HLA-C*06:46N...> as well.
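
As a rough sketch of doing both steps in one pass, the function below rewrites the ID value of each ##contig header line and the CHROM column of each data line. The name fix_vcf_colons is made up for illustration and is not part of bcftools. It assumes uncompressed, tab-delimited VCF on stdin, and it does not touch contig names that appear elsewhere in a record (for example inside breakend ALT alleles).

```shell
# Hypothetical helper: replace ':' with '_' in contig names.
# Reads uncompressed VCF on stdin, writes the fixed VCF to stdout.
fix_vcf_colons() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^##contig=<ID=/ {
         # Rewrite only the ID value: the text from "ID=" up to the next "," or ">"
         match($0, /ID=[^,>]*/)
         id = substr($0, RSTART, RLENGTH)
         gsub(":", "_", id)
         print substr($0, 1, RSTART - 1) id substr($0, RSTART + RLENGTH)
         next
       }
       # Data lines: rewrite the CHROM column only
       !/^#/ { gsub(":", "_", $1) }
       { print }'
}

# Possible usage (file names as in the example above):
#   bgzip -dc input.vcf.gz | fix_vcf_colons | bgzip > fixed.vcf.gz
#   bcftools index fixed.vcf.gz
#   bcftools view -r 'HLA-C*06_46N' fixed.vcf.gz
```

The region is quoted in the last command so the shell does not try to expand the asterisk, and the fixed file is re-indexed because -r needs an index that matches the renamed contigs.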