openvar / variantValidator

Public repository for VariantValidator project
GNU Affero General Public License v3.0

NC_000007.14:g.149779575_149779577delinsT no longer mapping to NM_ #513

Closed: Peter-J-Freeman closed this issue 1 year ago

Peter-J-Freeman commented 1 year ago

Describe the bug

Previously, we could map the following genomic descriptions to NM_ transcripts:

| Note | Genomic description | Transcript description |
| -- | -- | -- |
| 2 bp gap in tx for GRCh37 and 3 bp for GRCh38 | NC_000007.14:g.149779575_149779577delinsT | NM_198455.2:c.1115_1116insT |
| YUK - NC_000007.14:g.149779570_149779583= | NC_000007.14:g.149779575_149779577= | NM_198455.2:c.1116_1117insAGC |
| | NC_000007.14:g.149779576_149779578del | NM_198455.2:c.1115_1116= |
| | NC_000007.14:g.149779577del | NM_198455.2:c.1115_1116dup |
| | NC_000007.14:g.149779573_149779579del | NM_198455.2:c.1114_1117del |
| | NC_000007.14:g.149779573_149779579delinsCA | NM_198455.2:c.1114_1117delinsCA |

To Reproduce

Submit the genomic description. Currently, only an NR_ transcript is returned.

Peter-J-Freeman commented 1 year ago

It looks like we can go from c. to g., so I'm not sure why we can't go from g. to c.

Peter-J-Freeman commented 1 year ago

Same with:

| Genomic description | Gene | Transcript title | Transcript description |
| -- | -- | -- | -- |
| NC_000004.11:g.140811111_140811122del | MAML3 | Homo sapiens mastermind like transcriptional coactivator 3 (MAML3), mRNA | NM_018717.4:c.1465_1469CAACA= |
| NC_000004.11:g.140811111_140811122CTGCTGCTGCTG= | MAML3 | Homo sapiens mastermind like transcriptional coactivator 3 (MAML3), mRNA | NM_018717.4:c.1503_1514dup |
| NC_000004.11:g.140811117_140811122del | MAML3 | Homo sapiens mastermind like transcriptional coactivator 3 (MAML3), mRNA | NM_018717.4:c.1509_1514dup |
| NC_000004.11:g.140811111_140811117del | MAML3 | Homo sapiens mastermind like transcriptional coactivator 3 (MAML3), mRNA | NM_018717.4:c.1468_1472dup |
| NC_000004.11:g.140811117C>A | MAML3 | Homo sapiens mastermind like transcriptional coactivator 3 (MAML3), mRNA | NM_018717.4:c.1472_1473insTCAGCAGCAGCA |

Peter-J-Freeman commented 1 year ago

@John-F-Wagstaff, are these transcripts hidden by default, or should we be finding them from the genome-to-transcript (g > t) mapping?

John-F-Wagstaff commented 1 year ago

NM_198455.2 is "permanently suppressed".

NM_018717.4 has not been suppressed, but there is an updated version, NM_018717.5, which we just added for the 2023_05 release, so this will be the result, e.g.:

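```python
import json
import VariantValidator

vval = VariantValidator.Validator()
# Search all RefSeq transcripts on GRCh37; only the latest version of each
# transcript is reported unless a specific version is requested.
transcript_set, select_transcripts, genome_build = 'refseq', 'all', 'GRCh37'
variant = 'NC_000004.11:g.140811111_140811122del'
validate = vval.validate(variant, genome_build, select_transcripts, transcript_set)
# with_meta=True adds the "metadata" block (software and database versions)
validation = validate.format_as_dict(with_meta=True)
print(json.dumps(validation, sort_keys=True, indent=4, separators=(',', ': ')))
```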

produces

{
    "NM_018717.5:c.1515_1526del": {
        "alt_genomic_loci": [],
        "annotations": {
            "chromosome": "4",
            "db_xref": {
                "CCDS": "CCDS54805.1",
                "ensemblgene": null,
                "hgnc": "HGNC:16272",
                "ncbigene": "55534",
                "select": "MANE"
            },
            "ensembl_select": false,
            "mane_plus_clinical": false,
            "mane_select": true,
            "map": "4q31.1",
            "note": "mastermind like transcriptional coactivator 3",
            "refseq_select": true,
            "variant": "0"
        },
        "gene_ids": {
            "ccds_ids": [
                "CCDS54805"
            ],
            "ensembl_gene_id": "ENSG00000196782",
            "entrez_gene_id": "55534",
            "hgnc_id": "HGNC:16272",
            "omim_id": [
                "608991"
            ],
            "ucsc_id": "uc062zte.1"
        },
        "gene_symbol": "MAML3",
        "genome_context_intronic_sequence": "",
        "hgvs_lrg_transcript_variant": "",
        "hgvs_lrg_variant": "",
        "hgvs_predicted_protein_consequence": {
            "lrg_slr": "",
            "lrg_tlr": "",
            "slr": "NP_061187.3:p.(Q507_Q510del)",
            "tlr": "NP_061187.3:p.(Gln507_Gln510del)"
        },
        "hgvs_refseqgene_variant": "",
        "hgvs_transcript_variant": "NM_018717.5:c.1515_1526del",
        "primary_assembly_loci": {
            "grch37": {
                "hgvs_genomic_description": "NC_000004.11:g.140811111_140811122del",
                "vcf": {
                    "alt": "T",
                    "chr": "4",
                    "pos": "140811063",
                    "ref": "TTGCTGCTGCTGC"
                }
            },
            "grch38": {
                "hgvs_genomic_description": "NC_000004.12:g.139889957_139889968del",
                "vcf": {
                    "alt": "T",
                    "chr": "4",
                    "pos": "139889909",
                    "ref": "TTGCTGCTGCTGC"
                }
            },
            "hg19": {
                "hgvs_genomic_description": "NC_000004.11:g.140811111_140811122del",
                "vcf": {
                    "alt": "T",
                    "chr": "chr4",
                    "pos": "140811063",
                    "ref": "TTGCTGCTGCTGC"
                }
            },
            "hg38": {
                "hgvs_genomic_description": "NC_000004.12:g.139889957_139889968del",
                "vcf": {
                    "alt": "T",
                    "chr": "chr4",
                    "pos": "139889909",
                    "ref": "TTGCTGCTGCTGC"
                }
            }
        },
        "reference_sequence_records": {
            "protein": "https://www.ncbi.nlm.nih.gov/nuccore/NP_061187.3",
            "transcript": "https://www.ncbi.nlm.nih.gov/nuccore/NM_018717.5"
        },
        "refseqgene_context_intronic_sequence": "",
        "rna_variant_descriptions": null,
        "selected_assembly": "GRCh37",
        "submitted_variant": "NC_000004.11:g.140811111_140811122del",
        "transcript_description": "Homo sapiens mastermind like transcriptional coactivator 3 (MAML3), mRNA",
        "validation_warnings": [],
        "variant_exonic_positions": {
            "NC_000004.11": {
                "end_exon": "2",
                "start_exon": "2"
            },
            "NC_000004.12": {
                "end_exon": "2",
                "start_exon": "2"
            }
        }
    },
    "flag": "gene_variant",
    "metadata": {
        "variantvalidator_hgvs_version": "2.2.1.dev0+g69b1a7c.d20230629",
        "variantvalidator_version": "2.1.1.dev91+ga9a690c.d20230630",
        "vvdb_version": "vvdb_2022_11",
        "vvseqrepo_db": "VV_SR_2023_05_v2/master",
        "vvta_version": "vvta"
    }
}
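In the output above, each transcript-level result is a top-level key, alongside `flag` and `metadata`. A minimal sketch of listing the transcript and genomic descriptions from such a dictionary, assuming the structure shown above; `summarise_results` is a hypothetical helper, and `result` is a trimmed copy of that output rather than a live VariantValidator call:

```python
# Trimmed copy of the output shown above (not a live VariantValidator call)
result = {
    "NM_018717.5:c.1515_1526del": {
        "gene_symbol": "MAML3",
        "hgvs_transcript_variant": "NM_018717.5:c.1515_1526del",
        "primary_assembly_loci": {
            "grch37": {
                "hgvs_genomic_description": "NC_000004.11:g.140811111_140811122del"
            },
            "grch38": {
                "hgvs_genomic_description": "NC_000004.12:g.139889957_139889968del"
            },
        },
    },
    "flag": "gene_variant",
    "metadata": {"vvseqrepo_db": "VV_SR_2023_05_v2/master"},
}

# Top-level keys that describe the whole response, not a single transcript result
RESPONSE_KEYS = {"flag", "metadata"}


def summarise_results(validation: dict) -> list:
    """Return (transcript variant, GRCh37 g., GRCh38 g.) tuples (hypothetical helper)."""
    rows = []
    for key, record in validation.items():
        if key in RESPONSE_KEYS:
            continue
        loci = record.get("primary_assembly_loci", {})
        rows.append(
            (
                record.get("hgvs_transcript_variant", ""),
                loci.get("grch37", {}).get("hgvs_genomic_description", ""),
                loci.get("grch38", {}).get("hgvs_genomic_description", ""),
            )
        )
    return rows


for transcript, grch37, grch38 in summarise_results(result):
    print(transcript, grch37, grch38)
```

Only NM_018717.5 appears as a key here, so a script that looks for NM_018717.4 by name would find nothing in this response.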
Peter-J-Freeman commented 1 year ago

Good point. I had forgotten that we now filter for the latest transcript versions unless a version is stated.
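The "latest version unless stated" rule, together with the suppressed-record explanation for NM_198455.2 earlier in the thread, can be illustrated with a small filter over versioned accessions. This is a hypothetical sketch, not VariantValidator's implementation: `filter_transcripts`, `split_accession`, and the `pinned`/`suppressed` parameters are made-up names, and treating an explicitly requested version as replacing the latest one is an assumption.

```python
import re

# A versioned RefSeq accession, e.g. "NM_018717.5" -> base "NM_018717", version 5
ACCESSION_RE = re.compile(r"^(?P<base>[A-Z]{2}_\d+)\.(?P<version>\d+)$")


def split_accession(accession: str) -> tuple:
    """Split a versioned accession into (base, integer version)."""
    match = ACCESSION_RE.match(accession)
    if match is None:
        raise ValueError(f"not a versioned accession: {accession}")
    return match["base"], int(match["version"])


def filter_transcripts(available, pinned=(), suppressed=()):
    """Hypothetical filter: keep the latest version of each transcript.

    available: versioned accessions that overlap the variant
    pinned: versions requested explicitly; they replace the latest version
    suppressed: accessions to drop entirely (e.g. permanently suppressed records)
    """
    suppressed = set(suppressed)
    pinned = set(pinned)
    candidates = [ac for ac in available if ac not in suppressed]

    # Track the highest version seen for each base accession
    latest = {}
    for ac in candidates:
        base, version = split_accession(ac)
        if base not in latest or version > latest[base][1]:
            latest[base] = (ac, version)

    # An explicitly requested (and available) version overrides the latest one
    chosen_pins = [ac for ac in candidates if ac in pinned]
    pinned_bases = {split_accession(ac)[0] for ac in chosen_pins}
    kept = {ac for base, (ac, _) in latest.items() if base not in pinned_bases}
    kept.update(chosen_pins)
    return sorted(kept)


print(filter_transcripts(["NM_018717.4", "NM_018717.5"]))  # ['NM_018717.5']
print(filter_transcripts(["NM_018717.4", "NM_018717.5"], pinned=["NM_018717.4"]))  # ['NM_018717.4']
print(filter_transcripts(["NM_198455.2"], suppressed=["NM_198455.2"]))  # []
```

Versions are compared as integers so that, for example, `.10` ranks above `.9`; a plain string comparison would get that wrong.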

Peter-J-Freeman commented 1 year ago

Checked with the expanded workflow and this works. Not a bug.