openvar / variantValidator

Public repository for VariantValidator project
GNU Affero General Public License v3.0

TranscriptMapper error - No transcript exons #386

Closed: sbenny1230 closed this issue 2 years ago

sbenny1230 commented 2 years ago

Describe the bug: TranscriptMapper returns an error when an Ensembl transcript is selected.

To Reproduce

import json
import VariantValidator
vval = VariantValidator.Validator()
variant = 'ENST00000225964.10:c.589-1GG>G' # variant 1
genome_build = 'GRCh37'
select_transcripts = 'all'
transcript_set = 'ensembl'
validate = vval.validate(variant, genome_build, select_transcripts, transcript_set)
validation = validate.format_as_dict(with_meta=True)
print(json.dumps(validation, sort_keys=True, indent=4, separators=(',', ': ')))

Output returned

{
    "flag": "warning",
    "metadata": {
        "variantvalidator_hgvs_version": "2.0.2.dev1+g6ecbf8e",
        "variantvalidator_version": "1.0.5.dev273+g7d58e7e.d20220617",
        "vvdb_version": "vvdb_2022_04",
        "vvseqrepo_db": "VV_SR_2022_02/master",
        "vvta_version": "vvta_2022_02"
    },
    "validation_warning_1": {
        "alt_genomic_loci": [],
        "annotations": {},
        "gene_ids": {},
        "gene_symbol": "",
        "genome_context_intronic_sequence": "",
        "hgvs_lrg_transcript_variant": "",
        "hgvs_lrg_variant": "",
        "hgvs_predicted_protein_consequence": {
            "lrg_slr": "",
            "lrg_tlr": "",
            "slr": "",
            "tlr": ""
        },
        "hgvs_refseqgene_variant": "",
        "hgvs_transcript_variant": "",
        "primary_assembly_loci": {},
        "reference_sequence_records": "",
        "refseqgene_context_intronic_sequence": "",
        "selected_assembly": "GRCh37",
        "submitted_variant": "ENST00000225964.10:c.589-1GG>G",
        "transcript_description": "",
        "validation_warnings": [
            "ENST00000225964.10:c.589-1GG>G automapped to ENST00000225964.10:c.589-1_589delGGinsG",
            "Removing redundant reference bases from variant description",
            "ENST00000225964.10:c.589-1_589delGGinsG automapped to equivalent RefSeq transcript NM_000088.4:c.589-1_589delGGinsG",
            "TranscriptMapper(tx_ac=NM_000088.4, alt_ac=NC_000017.11, alt_aln_method=genebuild): No transcript exons"
        ],
        "variant_exonic_positions": null
    }
}
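To pull the warnings out of output like the above programmatically, a small helper can walk the dictionary returned by `format_as_dict(with_meta=True)`. This is a minimal sketch that assumes only the key layout visible in the output above: a top-level `flag` string, a `metadata` block, and one or more `validation_warning_N` entries, each holding a `validation_warnings` list. The helper name `collect_warnings` is hypothetical and not part of VariantValidator.

```python
from typing import Any, Dict, List


def collect_warnings(validation: Dict[str, Any]) -> List[str]:
    """Return every validation warning string from a format_as_dict() result.

    Hypothetical helper: assumes the layout shown in the output above, where
    per-variant entries are dicts carrying a 'validation_warnings' list.
    """
    warnings: List[str] = []
    for key, entry in validation.items():
        # Skip the top-level 'flag' string and the 'metadata' block
        if key in ("flag", "metadata") or not isinstance(entry, dict):
            continue
        warnings.extend(entry.get("validation_warnings", []))
    return warnings


# Trimmed copy of the output above
example = {
    "flag": "warning",
    "metadata": {"vvta_version": "vvta_2022_02"},
    "validation_warning_1": {
        "submitted_variant": "ENST00000225964.10:c.589-1GG>G",
        "validation_warnings": [
            "Removing redundant reference bases from variant description",
            "TranscriptMapper(tx_ac=NM_000088.4, alt_ac=NC_000017.11, "
            "alt_aln_method=genebuild): No transcript exons",
        ],
    },
}

for warning in collect_warnings(example):
    print(warning)
```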
sbenny1230 commented 2 years ago

Issue #204 mentions that this error is caused by the splign alignment. This suggests that the genebuild mapper I added is probably not correct.
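The warning pairs a RefSeq accession (`NM_000088.4`) with `alt_aln_method=genebuild`. In UTA-style data, RefSeq transcript alignments are normally stored under `splign` and Ensembl alignments under `genebuild`, so a mismatched pair can find no exons. The sketch below shows one way to choose the method from the accession prefix; the function name `aln_method_for` and the prefix list are assumptions for illustration, not VariantValidator's own logic.

```python
def aln_method_for(tx_ac: str) -> str:
    """Pick an alignment method name for a transcript accession.

    Assumption: Ensembl transcripts (ENST...) use 'genebuild' and RefSeq
    transcripts (NM_, NR_, XM_, XR_) use 'splign', as in UTA-style data.
    """
    if tx_ac.startswith("ENST"):
        return "genebuild"
    if tx_ac.startswith(("NM_", "NR_", "XM_", "XR_")):
        return "splign"
    raise ValueError(f"Unrecognised transcript accession: {tx_ac}")


print(aln_method_for("ENST00000225964.10"))  # genebuild
print(aln_method_for("NM_000088.4"))         # splign
```

Raising on unknown prefixes, rather than defaulting to one method, keeps an unexpected accession from silently producing another "No transcript exons" result.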

Peter-J-Freeman commented 2 years ago

Hi @sbenny1230. Again, no worries; let me take a look. You are also on a learning curve re: the software, and this is all work that counts towards your MSc. I want to encourage more open coding, so we will discuss how to write it up. If you check issue https://github.com/openvar/variantValidator/issues/387 and make sure I have the correct branch, I'll see if I can get this debugged today and get you ready for the next tasks.

sbenny1230 commented 2 years ago

So I was looking at the error again and realised it's passing in tx_ac=NM_000088.4 when it should be tx_ac=ENST00000225964.10.
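A quick way to spot this kind of mismatch is to parse the TranscriptMapper warning and compare its `tx_ac` with the accession in the submitted variant. This is a minimal sketch assuming the warning text keeps exactly the format shown in the output above; the regex and the helper names `parse_mapper_warning` and `tx_ac_mismatch` are hypothetical.

```python
import re
from typing import Dict, Optional

# Matches the TranscriptMapper warning format seen in the output above
MAPPER_RE = re.compile(
    r"TranscriptMapper\(tx_ac=(?P<tx_ac>[^,]+), alt_ac=(?P<alt_ac>[^,]+), "
    r"alt_aln_method=(?P<method>[^)]+)\)"
)


def parse_mapper_warning(warning: str) -> Optional[Dict[str, str]]:
    """Extract tx_ac, alt_ac and the alignment method, or None if no match."""
    match = MAPPER_RE.search(warning)
    return match.groupdict() if match else None


def tx_ac_mismatch(submitted_variant: str, warning: str) -> bool:
    """True when the mapper was given a different transcript than submitted."""
    parsed = parse_mapper_warning(warning)
    if parsed is None:
        return False
    submitted_ac = submitted_variant.split(":", 1)[0]
    return parsed["tx_ac"] != submitted_ac


warning = ("TranscriptMapper(tx_ac=NM_000088.4, alt_ac=NC_000017.11, "
           "alt_aln_method=genebuild): No transcript exons")
print(parse_mapper_warning(warning))
print(tx_ac_mismatch("ENST00000225964.10:c.589-1GG>G", warning))  # True
```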

There's a block of code in vvMixinCore.py that automaps the Ensembl transcript to an equivalent RefSeq transcript. I've commented this block out (shown below), and now I get a different error.

# ENST support needs to be re-evaluated, but is very low priority
# ENST not supported by ACMG and is under review by HGVS
if my_variant.refsource == 'ENS':
    trap_ens_in = str(my_variant.hgvs_formatted)
    sim_tx = self.hdp.get_similar_transcripts(my_variant.hgvs_formatted.ac)
    for line in sim_tx:
        if line[2] and line[3] and line[4] and line[5] and line[6]:
            my_variant.hgvs_formatted.ac = line[1]
            my_variant.set_quibble(str(my_variant.hgvs_formatted))
            formatted_variant = my_variant.quibble
            break
    if my_variant.refsource == 'ENS':
        error = 'Unable to map ' + my_variant.hgvs_formatted.ac + \
                ' to an equivalent RefSeq transcript'
        my_variant.warnings.append(error)
        logger.warning(error)
        continue
    else:
        my_variant.warnings.append(str(trap_ens_in) + ' automapped to equivalent RefSeq transcript '
                                   + my_variant.quibble)
        logger.info(str(trap_ens_in) + ' automapped to equivalent RefSeq '
                    'transcript ' + my_variant.quibble)
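Stripped of the `my_variant` plumbing, the automapping step above keeps the first row returned by `get_similar_transcripts` whose fields 2 to 6 are all truthy, and takes that row's field 1 as the replacement accession. The sketch below isolates that rule so it can be tested on its own, and adds a hypothetical `automap` switch in place of commenting the block out. The row layout is inferred only from the indexing in the snippet (what each field means is not stated in this thread), and the sample rows use made-up placeholder accessions.

```python
from typing import Optional, Sequence


def pick_equivalent_transcript(
    sim_tx: Sequence[Sequence[object]], automap: bool = True
) -> Optional[str]:
    """Return the accession the snippet above would switch to, or None.

    Mirrors the loop in vvMixinCore.py: the first row whose fields 2-6 are
    all truthy wins, and field 1 is used as the new accession. `automap` is
    a hypothetical switch standing in for commenting the block out.
    """
    if not automap:
        return None
    for line in sim_tx:
        if all(line[i] for i in range(2, 7)):
            return str(line[1])
    return None


# Made-up rows; only the positions matter here
sim_tx = [
    ("ENST_EXAMPLE.1", "TX_A.1", True, False, True, True, True),  # skipped
    ("ENST_EXAMPLE.1", "TX_B.1", True, True, True, True, True),   # first full match
]
print(pick_equivalent_transcript(sim_tx))                 # TX_B.1
print(pick_equivalent_transcript(sim_tx, automap=False))  # None
```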
sbenny1230 commented 2 years ago

Error for the same variant as before (ENST00000225964.10:c.589-1GG>G):

{
    "flag": "warning",
    "metadata": {
        "variantvalidator_hgvs_version": "2.0.2.dev1+g6ecbf8e",
        "variantvalidator_version": "1.0.5.dev273+g7d58e7e.d20220617",
        "vvdb_version": "vvdb_2022_04",
        "vvseqrepo_db": "VV_SR_2022_02/master",
        "vvta_version": "vvta_2022_02"
    },
    "validation_warning_1": {
        "alt_genomic_loci": [],
        "annotations": {},
        "gene_ids": {},
        "gene_symbol": "",
        "genome_context_intronic_sequence": "",
        "hgvs_lrg_transcript_variant": "",
        "hgvs_lrg_variant": "",
        "hgvs_predicted_protein_consequence": {
            "lrg_slr": "",
            "lrg_tlr": "",
            "slr": "",
            "tlr": ""
        },
        "hgvs_refseqgene_variant": "",
        "hgvs_transcript_variant": "",
        "primary_assembly_loci": {},
        "reference_sequence_records": "",
        "refseqgene_context_intronic_sequence": "",
        "selected_assembly": "GRCh37",
        "submitted_variant": "ENST00000225964.10:c.589-1GG>G",
        "transcript_description": "",
        "validation_warnings": [
            "ENST00000225964.10:c.589-1GG>G automapped to ENST00000225964.10:c.589-1_589delGGinsG",
            "Removing redundant reference bases from variant description",
            "Required information for ENST00000225964.10 is missing from the Universal Transcript Archive",
            "Query gene2transcripts with search term ENST00000225964 for available transcripts"
        ],
        "variant_exonic_positions": null
    }
}
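The last two warnings suggest querying gene2transcripts with the unversioned accession. A small helper can detect the "missing from the Universal Transcript Archive" warning and derive that search term by dropping the version suffix. This is a sketch assuming the warning wording shown above stays stable; the helper name `gene2transcripts_term` is hypothetical, and it only builds the search string rather than calling gene2transcripts itself.

```python
from typing import List, Optional

UTA_MISSING = "is missing from the Universal Transcript Archive"


def gene2transcripts_term(warnings: List[str]) -> Optional[str]:
    """Return an unversioned accession to search with, or None.

    Hypothetical helper: finds the 'missing from the Universal Transcript
    Archive' warning and strips the version suffix from the accession it
    names, matching the search term suggested in the output above.
    """
    for warning in warnings:
        if UTA_MISSING in warning:
            # "Required information for ENST00000225964.10 is missing ..."
            accession = warning.split(" for ", 1)[1].split(" ", 1)[0]
            return accession.split(".", 1)[0]
    return None


warnings_out = [
    "Removing redundant reference bases from variant description",
    "Required information for ENST00000225964.10 is missing from the "
    "Universal Transcript Archive",
]
print(gene2transcripts_term(warnings_out))  # ENST00000225964
```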
Peter-J-Freeman commented 2 years ago

Nice work. I can look at this today.

Give me a few hours.

For reference: I will clone https://github.com/openvar/variantValidator/tree/vv_ensembl_develop and create a new branch from it.

Peter-J-Freeman commented 2 years ago

OK, after commit https://github.com/openvar/variantValidator/commit/38a1932dad1c865ccc22c68afbe0b1a99780ff03 I'm seeing the following output:

{
    "flag": "warning",
    "metadata": {
        "variantvalidator_hgvs_version": "2.0.2.dev1+g6ecbf8e",
        "variantvalidator_version": "2.1.1.dev2+g294fd63",
        "vvdb_version": "vvdb_2022_04",
        "vvseqrepo_db": "VV_SR_2022_02/master",
        "vvta_version": "vvta_2022_02"
    },
    "validation_warning_1": {
        "alt_genomic_loci": [],
        "annotations": {},
        "gene_ids": {},
        "gene_symbol": "",
        "genome_context_intronic_sequence": "",
        "hgvs_lrg_transcript_variant": "",
        "hgvs_lrg_variant": "",
        "hgvs_predicted_protein_consequence": {
            "lrg_slr": "",
            "lrg_tlr": "",
            "slr": "",
            "tlr": ""
        },
        "hgvs_refseqgene_variant": "",
        "hgvs_transcript_variant": "",
        "primary_assembly_loci": {},
        "reference_sequence_records": "",
        "refseqgene_context_intronic_sequence": "",
        "selected_assembly": "GRCh37",
        "submitted_variant": "ENST00000225964.10:c.589-1GG>G",
        "transcript_description": "",
        "validation_warnings": [
            "ENST00000225964.10:c.589-1GG>G automapped to ENST00000225964.10:c.589-1_589delGGinsG",
            "Removing redundant reference bases from variant description",
            "Required information for ENST00000225964.10 is missing from the Universal Transcript Archive",
            "Query gene2transcripts with search term ENST00000225964 for available transcripts"
        ],
        "variant_exonic_positions": null
    }
}

Do you get this too? I did notice that the Ensembl REST API may be down, so outputs might change. SIGH!

If you get this too, the transcript seems to be missing from VVTA. I will check now.
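Because the comment above suspects the Ensembl REST service may be down, it can help to check availability before rerunning. Ensembl REST exposes an `info/ping` endpoint that answers `{"ping": 1}` when the service is up; the GRCh37 mirror is at grch37.rest.ensembl.org. This is a minimal sketch using only the standard library; the helper names are hypothetical, and the `fetch` parameter exists only so the check can be exercised without a network connection.

```python
import json
import urllib.error
import urllib.request
from typing import Callable

ENSEMBL_REST = "https://rest.ensembl.org"  # GRCh37: https://grch37.rest.ensembl.org


def _fetch(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL and return the body as text (real network call)."""
    request = urllib.request.Request(url, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def ensembl_rest_is_up(server: str = ENSEMBL_REST,
                       fetch: Callable[[str], str] = _fetch) -> bool:
    """Return True if the server's info/ping endpoint answers {"ping": 1}."""
    try:
        body = fetch(server.rstrip("/") + "/info/ping")
        return json.loads(body).get("ping") == 1
    except (urllib.error.URLError, OSError, ValueError, AttributeError):
        # Network failure, timeout, bad JSON or unexpected payload: treat as down
        return False
```

For example, `ensembl_rest_is_up("https://grch37.rest.ensembl.org")` performs a live check against the GRCh37 server. Returning `False` on any failure, rather than raising, keeps the check usable as a simple guard before a validation run.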

sbenny1230 commented 2 years ago

Fixed on branch vv_ensembl_develop