openvar / variantValidator

Public repository for VariantValidator project
GNU Affero General Public License v3.0

Invalid insertion variant description not trapped #480

Closed leicray closed 1 year ago

leicray commented 1 year ago

Describe the bug: An anonymous user has submitted the variant description NM_207122.2:c.1173_1174+1insAT, resulting in an ERROR message to the Admins. NM_207122.2 is the MANE Select transcript for the EXT2 gene.

When that variant is submitted to the Validator tool, the error message is "Unable to validate the submitted variant NM_207122.2:c.1173_1174+1insAT against the GRCh38 assembly". Selecting genome build GRCh37 generates a corresponding error message.

Position 1173 is the last nucleotide of exon 7 and, of course, position 1174 is the first nucleotide of exon 8. Hence, position 1174+1 does not exist: a "+" offset can only be counted from the last nucleotide of an exon. The description is invalid on two counts: the insertion point is not between two adjacent nucleotides, and the second position is itself invalid.

Description errors of this type ought to be handled more elegantly.
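
To make the two problems concrete, here is a minimal sketch of the kind of check that could trap them. It is not VariantValidator's code: every function name here is hypothetical, the only exon boundary used is the one quoted above (1173 ends exon 7, 1174 starts exon 8), and the adjacency warning text is invented for illustration. The ExonBoundaryError wording simply mirrors the style of message one would hope to see.

```python
import re
from typing import List, NamedTuple, Set


class CdsPosition(NamedTuple):
    base: int    # nearest exonic c. coordinate
    offset: int  # intronic offset; 0 for an exonic position


# Handles plain coding positions such as "1173" or "1174+1" only.
# 5' UTR ("-12") and 3' UTR ("*5") positions are out of scope for this sketch.
_POSITION = re.compile(r"^(\d+)([+-]\d+)?$")


def parse_position(text: str) -> CdsPosition:
    match = _POSITION.match(text)
    if match is None:
        raise ValueError(f"unsupported position: {text!r}")
    return CdsPosition(int(match.group(1)), int(match.group(2) or 0))


def anchored_on_boundary(pos: CdsPosition, exon_ends: Set[int],
                         exon_starts: Set[int]) -> bool:
    """A '+' offset must hang off an exon's last base, a '-' offset off its first."""
    if pos.offset > 0:
        return pos.base in exon_ends
    if pos.offset < 0:
        return pos.base in exon_starts
    return True


def are_adjacent(start: CdsPosition, end: CdsPosition) -> bool:
    """True if an insertion could sit between the two positions.

    Simplification: the point mid-intron where numbering switches from
    '+' to '-' offsets is not handled and is reported as non-adjacent.
    """
    if start.offset == 0 and end.offset == 0:
        return end.base - start.base == 1
    return start.base == end.base and end.offset - start.offset == 1


def check_insertion(transcript: str, start_text: str, end_text: str,
                    exon_ends: Set[int], exon_starts: Set[int]) -> List[str]:
    """Collect warnings for an insertion between start_text and end_text."""
    warnings = []
    start, end = parse_position(start_text), parse_position(end_text)
    for text, pos in ((start_text, start), (end_text, end)):
        if not anchored_on_boundary(pos, exon_ends, exon_starts):
            warnings.append(
                f"ExonBoundaryError: Position c.{text} does not correspond "
                f"with an exon boundary for transcript {transcript}"
            )
    if not are_adjacent(start, end):
        # Hypothetical wording, not a real VariantValidator message.
        warnings.append(
            f"Insertion positions c.{start_text}_{end_text} are not adjacent"
        )
    return warnings


# Boundary taken from the description above; other exons are omitted.
EXON_ENDS = {1173}
EXON_STARTS = {1174}

for end_text in ("1174+1", "1173+1"):
    print(end_text, check_insertion("NM_207122.2", "1173", end_text,
                                    EXON_ENDS, EXON_STARTS))
```

Under these assumptions, c.1173_1174+1 collects both warnings, whereas c.1173_1173+1 (an insertion at the start of the intron) collects none.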

Peter-J-Freeman commented 1 year ago

@leicray

OK, there is more than one issue with the description. I have added a quick bug fix, and the first warning we will generate is:

GRCh38 mapping

    "validation_warnings": [
        "insertion length must be 1"
    ],

For NM_207122.2:c.1173_1174+1insAT, the insertion length is 2 bases.

I also tested NM_207122.2:c.1174_1174+1insAT

This validates, so the tool is not spotting that 1174+1 is not a valid position. This needs to be corrected first.

Peter-J-Freeman commented 1 year ago

@leicray

OK, I tweaked the code. We can now generate:

    "validation_warnings": [
        "ExonBoundaryError: Position c.1174_1174+1 does not correspond with an exon boundary for transcript NM_207122.2"
    ],

Peter-J-Freeman commented 1 year ago

Updated to:

    "validation_warnings": [
        "ExonBoundaryError: Position c.1174+1 does not correspond with an exon boundary for transcript NM_207122.2"
    ],