liguowang / CrossMap

CrossMap is a python program to lift over genome coordinates from one genome version to another.
https://crossmap.readthedocs.io/en/latest/

CrossMap VCF header entries do not work with Bcftools #40

Closed: gungorbudak closed this issue 2 years ago

gungorbudak commented 2 years ago

The < and > characters in the values of the header entries added by CrossMap cause bcftools to report errors and ignore those header lines. Deleting these characters resolves the issue.

$ CrossMap.py -v
CrossMap 0.6.1
$ bcftools -v
bcftools 1.14
Using htslib 1.14
Copyright (C) 2021 Genome Research Ltd.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Errors while outputting the header (see E::bcf_hdr_parse_line):

$ bcftools view -h decode_variants_autosomes.EVA.fixed.B38.crossmap.vcf | tail      
[E::bcf_hdr_parse_line] Could not parse the header line: "##liftOverProgram=<CrossMap,version=0.6.1,website=https://sourceforge.net/projects/crossmap>"
[E::bcf_hdr_parse_line] Could not parse the header line: "##liftOverChainFile=</Users/gungor/References/hg18ToHg38.over.chain.gz>"
[E::bcf_hdr_parse_line] Could not parse the header line: "##originalFile=<decode_variants_autosomes.EVA.fixed.vcf.gz>"
[E::bcf_hdr_parse_line] Could not parse the header line: "##targetRefGenome=</Users/gungor/References/B38.p13.fa>"
[E::bcf_hdr_parse_line] Could not parse the header line: "##liftOverDate=<December08,2021>"
##contig=<ID=chrU_KI270755.1,length=36723,assembly=B38.p13.fa>
##contig=<ID=chrU_KI270756.1,length=79590,assembly=B38.p13.fa>
##contig=<ID=chrU_KI270757.1,length=71251,assembly=B38.p13.fa>
##contig=<ID=chrX,length=156040895,assembly=B38.p13.fa>
##contig=<ID=chrY,length=57227415,assembly=B38.p13.fa>
##contig=<ID=chrY_KI270740.1_random,length=37240,assembly=B38.p13.fa>
##contig=<ID=chrhs38d1,length=6030922,assembly=B38.p13.fa>
##bcftools_viewVersion=1.14+htslib-1.14
##bcftools_viewCommand=view -h decode_variants_autosomes.EVA.fixed.B38.crossmap.vcf; Date=Wed Dec  8 16:44:49 2021
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO

After replacing the header with the entries without those characters:

header.txt

$ tail header.txt                                                                                  
##contig=<ID=chrX,length=156040895,assembly=B38.p13.fa>
##contig=<ID=chrY,length=57227415,assembly=B38.p13.fa>
##contig=<ID=chrY_KI270740.1_random,length=37240,assembly=B38.p13.fa>
##contig=<ID=chrhs38d1,length=6030922,assembly=B38.p13.fa>
##liftOverProgram=CrossMap,version=0.6.1,website=https://sourceforge.net/projects/crossmap
##liftOverChainFile=/Users/gungor/References/hg18ToHg38.over.chain.gz
##originalFile=decode_variants_autosomes.EVA.fixed.vcf.gz
##targetRefGenome=/Users/gungor/References/B38.p13.fa
##liftOverDate=December08,2021
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
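
The header rewrite above was done by hand; a minimal Python sketch of the same edit follows. It is not part of CrossMap or bcftools. The helper names (strip_angle_brackets, fixed_header) are hypothetical, and the list of keys is an assumption taken from the five error messages in this report. Only those entries are touched, so structured lines such as ##contig=<ID=...> keep their brackets.

```python
import re

# Keys that CrossMap 0.6.1 wrote with angle-bracketed values, as listed in the
# bcftools error messages in this report (assumed to be the complete set).
CROSSMAP_KEYS = (
    "liftOverProgram",
    "liftOverChainFile",
    "originalFile",
    "targetRefGenome",
    "liftOverDate",
)

# Matches e.g. "##liftOverDate=<December08,2021>" and captures key and value.
_PATTERN = re.compile(r"^##(%s)=<(.*)>$" % "|".join(CROSSMAP_KEYS))


def strip_angle_brackets(line):
    """Drop the surrounding < > from the value of a CrossMap header entry.

    Any other line (including ##contig=<...>) is returned unchanged.
    """
    match = _PATTERN.match(line)
    if match is None:
        return line
    return "##%s=%s" % (match.group(1), match.group(2))


def fixed_header(vcf_lines):
    """Collect the header lines of an uncompressed VCF, fixing CrossMap entries.

    Reading stops at the first line that does not start with '#'.
    """
    header = []
    for line in vcf_lines:
        if not line.startswith("#"):
            break
        header.append(strip_angle_brackets(line.rstrip("\r\n")))
    return header
```

Writing the returned lines to header.txt and passing that file to bcftools reheader -h, as in this report, would apply the change. The lines should be read from the VCF file itself rather than from bcftools view -h, since bcftools drops the entries it cannot parse.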

bcftools now outputs the header without any E::bcf_hdr_parse_line errors:

$ bcftools reheader -h header.txt decode_variants_autosomes.EVA.fixed.B38.crossmap.vcf | bcftools view -h | tail
##contig=<ID=chrY_KI270740.1_random,length=37240,assembly=B38.p13.fa>
##contig=<ID=chrhs38d1,length=6030922,assembly=B38.p13.fa>
##liftOverProgram=CrossMap,version=0.6.1,website=https://sourceforge.net/projects/crossmap
##liftOverChainFile=/Users/gungor/References/hg18ToHg38.over.chain.gz
##originalFile=decode_variants_autosomes.EVA.fixed.vcf.gz
##targetRefGenome=/Users/gungor/References/B38.p13.fa
##liftOverDate=December08,2021
##bcftools_viewVersion=1.14+htslib-1.14
##bcftools_viewCommand=view -h; Date=Wed Dec  8 16:45:30 2021
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO

liguowang commented 2 years ago

This is resolved in v0.6.2.