bartcharbon closed this issue 8 months ago.
Can you show the VEP header line and the data line please?
Header line:
##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|ALLELE_NUM|DISTANCE|STRAND|FLAGS|PICK|SYMBOL_SOURCE|HGNC_ID|REFSEQ_MATCH|REFSEQ_OFFSET|SOURCE|SIFT|PolyPhen|HGVS_OFFSET|CLIN_SIG|SOMATIC|PHENO|PUBMED|CHECK_REF|MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE|TRANSCRIPTION_FACTORS|Grantham|SpliceAI_pred_DP_AG|SpliceAI_pred_DP_AL|SpliceAI_pred_DP_DG|SpliceAI_pred_DP_DL|SpliceAI_pred_DS_AG|SpliceAI_pred_DS_AL|SpliceAI_pred_DS_DG|SpliceAI_pred_DS_DL|SpliceAI_pred_SYMBOL|CAPICE_CL|CAPICE_SC|existing_InFrame_oORFs|existing_OutOfFrame_oORFs|existing_uORFs|five_prime_UTR_variant_annotation|five_prime_UTR_variant_consequence|IncompletePenetrance|InheritanceModesGene|VKGL|VKGL_CL|gnomAD_AF|gnomAD_COV|gnomAD_FAF95|gnomAD_FAF99|gnomAD_HN|gnomAD_QC|gnomAD_SRC|clinVar_CLNID|clinVar_CLNREVSTAT|clinVar_CLNSIG|clinVar_CLNSIGINCL|ASV_ACMG_class|ASV_AnnotSV_ranking_criteria|ASV_AnnotSV_ranking_score|ALPHSCORE|ncER|phyloP">
Data line:
chr3 48565192 . GGTACCCGCTCTGCAGGTAGGGCAGGGTGTGCTGGGAGCAGTGGCTGCTGGCCCCGGGGCAAGGTGGGCAGCACTGATTTCCACTGTGTGCACACAGTGCCCATGCGTGTGCCCTGCATGCAGACCCTACGTGCTTGGCGTGTGCCCTGCATTCATGGACACCCATGTGCGTGTCTCGGCCCCACCCATAGCTGCCCCACGGGTTCAGCTGTCCTCACCTTCC G . PASS CSQ=-|splice_acceptor_variant&splice_donor_variant&frameshift_variant&stop_lost&splice_donor_5th_base_variant&intron_variant|HIGH|COL7A1|1294|Transcript|NM_000094.4|protein_coding|116-117/119|116/118|NM_000094.4:c.8523_8536del|NP_000085.1:p.Glu2841AspfsTer3|8586-8599/9231|8523-8536/8835|2841-2846/2944|EEGEDS*TRGAAMGGAETRTWVSMNAGHTPST*GLHAGHTHGHCVHTVEISAAHLAPGPAATAPSTPCPTCRAGTX/DX|gaGGAAGGTGAGGACAGCTGAACCCGTGGGGCAGCTATGGGTGGGGCCGAGACACGCACATGGGTGTCCATGAATGCAGGGCACACGCCAAGCACGTAGGGTCTGCATGCAGGGCACACGCATGGGCACTGTGTGCACACAGTGGAAATCAGTGCTGCCCACCTTGCCCCGGGGCCAGCAGCCACTGCTCCCAGCACACCCTGCCCTACCTGCAGAGCGGGTACcc/gacc||1||-1||1|EntrezGene||||||||||||||||||||||||||||VUS|0.5681088|||||||AD&AR||||||||||||||4|1A_(cf_Gene_count%2C_RE_gene%2C_+0.00)%3B2E-1_(COL7A1%2C_+0.90)%3B3A_(1_gene%2C_+0.00)%3B5F_(+0.00)|0.9||99.7739|,-|upstream_gene_variant|MODIFIER|PFKFB4|5210|Transcript|NM_001317136.2|protein_coding|||||||||||1|4064|-1|||EntrezGene||||||||||||||||||||||||||||VUS|0.5681742|||||||||||||||||||||||||99.7739|,-|upstream_gene_variant|MODIFIER|UCN2|90226|Transcript|NM_033199.4|protein_coding|||||||||||1|1412|-1|||EntrezGene||||||||||||||||||||||||||||VUS|0.5681742|||||||||||||||||||||||||99.7739| GT 1/1
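To illustrate how the header's "Format:" list maps onto the CSQ value in the data line, here is a minimal Python sketch. It uses a trimmed excerpt of the header and of the first CSQ record above (only the first 15 subfields are kept), and the helpers csq_fields and split_csq are hypothetical, not part of bcftools. Every subfield is kept as a string, which is why range values such as 8586-8599/9231 survive intact.

```python
import re

# Trimmed excerpt of the header above: only the first 15 CSQ subfields are kept.
HEADER = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
          'from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|'
          'Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|'
          'Protein_position">')

# Trimmed excerpt of the first (COL7A1) CSQ record from the data line, same 15 subfields.
CSQ = ("-|splice_acceptor_variant&splice_donor_variant&frameshift_variant&stop_lost&"
       "splice_donor_5th_base_variant&intron_variant|HIGH|COL7A1|1294|Transcript|"
       "NM_000094.4|protein_coding|116-117/119|116/118|NM_000094.4:c.8523_8536del|"
       "NP_000085.1:p.Glu2841AspfsTer3|8586-8599/9231|8523-8536/8835|2841-2846/2944")


def csq_fields(header_line):
    """Return the ordered subfield names from the 'Format: a|b|c' part of the header."""
    match = re.search(r'Format: ([^"]+)"', header_line)
    if match is None:
        raise ValueError("no 'Format:' list in CSQ header")
    return match.group(1).split("|")


def split_csq(value, names):
    """Split a CSQ INFO value into one dict per record.

    Records are comma-separated; subfields within a record are pipe-separated.
    No type conversion is done: every subfield stays a string.
    """
    records = []
    for record in value.split(","):
        parts = record.split("|")
        if len(parts) != len(names):
            raise ValueError(f"expected {len(names)} subfields, got {len(parts)}")
        records.append(dict(zip(names, parts)))
    return records


names = csq_fields(HEADER)
for rec in split_csq(CSQ, names):
    print(rec["SYMBOL"], rec["cDNA_position"], rec["CDS_position"], rec["Protein_position"])
```

Run on the excerpt, this prints the COL7A1 positions unchanged, matching the 1.17 behaviour described in the report.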
This is tied to the automatic type parsing introduced in https://github.com/samtools/bcftools/commit/2191405e8afd9b123d18dc7084459d409afc4ea4, where fields such as cDNA_position are assumed to be integers. Your example shows that this assumption is incorrect, so we will change the automatic type to String.
In both versions one can enforce the desired type with -c cDNA_position:int or -c cDNA_position:string.
In addition, a warning is now printed when a value cannot be fully parsed as the requested numeric type.
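A minimal Python illustration of why an integer type turns 8586-8599/9231 into 8586, and of the kind of check that would trigger such a warning. It assumes the conversion reads only the leading integer and stops at the first non-digit character (as C's strtol does); it is an analogy, not bcftools' actual code, and to_int_with_warning is a hypothetical name.

```python
import re
import warnings

# Optional sign followed by digits, anchored at the start of the string.
LEADING_INT = re.compile(r"[+-]?\d+")


def to_int_with_warning(text, field):
    """Parse the leading integer of `text`, warning if anything is left unparsed.

    Returns None when `text` does not start with an integer (e.g. an empty subfield).
    """
    m = LEADING_INT.match(text)
    if m is None:
        return None
    if m.end() != len(text):
        # Only a prefix was consumed: this is where a silent truncation would occur.
        warnings.warn(f"{field}: could not fully parse {text!r} as an integer; "
                      f"kept only {m.group(0)!r}")
    return int(m.group(0))


for field, value in [("cDNA_position", "8586-8599/9231"),
                     ("CDS_position", "8523-8536/8835"),
                     ("Protein_position", "2841-2846/2944")]:
    print(field, to_int_with_warning(value, field), "vs string:", value)
```

The integer results (8586, 8523, 2841) correspond to the truncated 1.19 output in the report, while the string column shows what a String type keeps.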
Thank you for the bug report.
I have an annotated VCF that I split using bcftools split-vep.
The cDNA, CDS and protein positions in the CSQ field are:
|8586-8599/9231|8523-8536/8835|2841-2846/2944 |
The split-vep output for those fields in bcftools 1.19 is:
8586 8523 2841
In bcftools 1.17 the output was correct:
8586-8599/9231 8523-8536/8835 2841-2846/2944
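Keeping these fields as strings is the safe default; if numeric values are needed downstream, the full value can be parsed after extraction instead of being truncated. Below is a minimal Python sketch, assuming the start-end/length layout seen in the values above (a single position without "-end" is treated as start == end). The Position type and parse_position helper are hypothetical; anything that does not match the pattern returns None rather than a guessed number.

```python
import re
from typing import NamedTuple, Optional


class Position(NamedTuple):
    start: int
    end: int
    length: Optional[int]  # total length after '/', if present


# start, optional '-end', optional '/length'; the whole string must match.
POS_RE = re.compile(r"(\d+)(?:-(\d+))?(?:/(\d+))?")


def parse_position(text):
    """Parse 'start[-end][/length]'; return None if the text does not match."""
    m = POS_RE.fullmatch(text)
    if m is None:
        return None
    start = int(m.group(1))
    end = int(m.group(2)) if m.group(2) else start
    length = int(m.group(3)) if m.group(3) else None
    return Position(start, end, length)


for raw in ["8586-8599/9231", "8523-8536/8835", "2841-2846/2944", "8586"]:
    print(raw, "->", parse_position(raw))
```

For example, "8586-8599/9231" becomes Position(start=8586, end=8599, length=9231), so no part of the original value is lost.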