openvar / variantValidator

Public repository for VariantValidator project
GNU Affero General Public License v3.0

Numbering the Ter position in a Protein #547

Open Peter-J-Freeman opened 9 months ago

Peter-J-Freeman commented 9 months ago

Describe the bug: The following script caused a crash.

import json
import VariantValidator
vval = VariantValidator.Validator()
variant = "chr7-117307162-G-GG"
genome_build = 'GRCh37'
select_transcripts = 'all'
transcript_set = 'refseq'
validate = vval.validate(variant, genome_build, select_transcripts, transcript_set)
validation = validate.format_as_dict(with_meta=True)
print(json.dumps(validation, sort_keys=True, indent=4, separators=(',', ': ')))

The issue was that the variant changes the c. sequence, but the predicted protein consequence is Ter= (the termination codon is unchanged).

Expected behavior: Unclear. We do not number Ter because it is not part of the reference sequence.

Peter-J-Freeman commented 9 months ago

Currently I have set the output to:

        "hgvs_predicted_protein_consequence": {
            "lrg_slr": "LRG_663p1:p.(*=)",
            "lrg_tlr": "LRG_663p1:p.(Ter=)",
            "slr": "NP_000483.3:p.(*=)",
            "tlr": "NP_000483.3:p.(Ter=)"
        },

Any comments, @ifokkema @John-F-Wagstaff @leicray?

leicray commented 9 months ago

The HGVS variant nomenclature for proteins does provide guidance regarding how to number the termination codon. Such numbering is necessary to define a C-terminal extension: https://varnomen.hgvs.org/recommendations/protein/variant/extension/

For RefSeq protein reference sequence NP_000483.3, the termination position would be 1481, and no change to the terminator would be recorded as NP_000483.3:p.(Ter1481=).
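
As a rough illustration of that numbering rule, here is a minimal sketch that places Ter one position past the last amino acid of the reference protein. The helper name `unchanged_ter` is hypothetical and is not part of the VariantValidator API. The protein length of 1480 is an assumption inferred from the Ter position of 1481 quoted above, and the single-letter form simply swaps `Ter` for `*`.

```python
def unchanged_ter(accession: str, protein_length: int) -> dict:
    """Build 'no change to Ter' descriptions with a numbered Ter position.

    Hypothetical helper: Ter is numbered as protein_length + 1, i.e. the
    position immediately after the last amino acid of the reference protein.
    """
    if protein_length < 1:
        raise ValueError("protein_length must be a positive integer")
    ter_position = protein_length + 1
    return {
        # Three-letter amino acid code
        "tlr": f"{accession}:p.(Ter{ter_position}=)",
        # Single-letter amino acid code, where '*' stands for Ter
        "slr": f"{accession}:p.(*{ter_position}=)",
    }


# Assumed length of 1480 residues, inferred from Ter1481 in the comment above
result = unchanged_ter("NP_000483.3", 1480)
print(result["tlr"])
print(result["slr"])
```

With the assumed length, the `tlr` value matches the NP_000483.3:p.(Ter1481=) description given above.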