Open elowy01 opened 7 years ago
So the VCF has a CHROM named HLA-C*06:46N?
If so, bcftools won't be able to parse this. CHROM names cannot have a colon or whitespace, see the VCF 4.1 spec (or any version), section 1.4, "Data lines":
The colon symbol (:) must be absent from all chromosome names to avoid parsing errors when dealing with breakends. (String, no white-space permitted, Required).
If altering the VCF is an option, removing or swapping out the colon for another character would let bcftools view the VCF. Here's changing colons to underscores:
cat input.vcf.gz | bgzip -d | awk 'BEGIN{FS=OFS="\t"} {if($1 !~ /^#/){gsub(":","_",$1)}; print $0}' | bgzip > fixed.vcf.gz
I would suggest fixing any contig header lines like ##contig=<ID=HLA-C*06:46N...> as well.
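To cover both the data lines and the ##contig header lines in one pass, something along these lines might work. This is only a sketch: `fix_contig_colons` is a made-up helper name, it assumes a plain-text, tab-delimited VCF on stdin, and it assumes the contig name sits in the `ID=` value of each ##contig line.

```shell
# Hypothetical helper (not part of bcftools): replace colons with underscores
# in contig names, both in ##contig header lines and in the CHROM column.
# Reads an uncompressed VCF on stdin and writes the fixed VCF to stdout.
fix_contig_colons() {
  awk 'BEGIN { FS = OFS = "\t" }   # keep tabs when a record is rebuilt
       /^##contig=/ {
         # Rewrite only the ID=... value; other keys on the line are untouched.
         if (match($0, /ID=[^,>]*/)) {
           head = substr($0, 1, RSTART - 1)
           id   = substr($0, RSTART, RLENGTH)
           rest = substr($0, RSTART + RLENGTH)
           gsub(":", "_", id)
           $0 = head id rest
         }
         print; next
       }
       /^#/ { print; next }          # all other header lines pass through
       { gsub(":", "_", $1); print } # data lines: fix CHROM (column 1) only
  '
}

# Possible usage, assuming bgzip and bcftools are installed:
#   bgzip -dc input.vcf.gz | fix_contig_colons | bgzip > fixed.vcf.gz
#   bcftools index fixed.vcf.gz
#   bcftools view -r 'HLA-C*06_46N' fixed.vcf.gz
```

Quoting the region in the last command keeps the shell from treating the asterisk as a glob. Colons in other columns (for example the FORMAT and sample fields) are left alone.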
Hi,
I am trying to run the following: bcftools view -r HLA-C*06:46N input.vcf.gz
It did not work, and I got the following: [synced_bcf_reader.c:949 _regions_init_string] Could not parse the region(s): HLA-C*06:46N Failed to read the regions: HLA-C*06:46N
Apparently the problem is in the colon that this contig has in its name. Is there a way of fixing this error?
Thanks,
ernesto