openvar / variantValidator

Public repository for VariantValidator project
GNU Affero General Public License v3.0
67 stars 21 forks source link

Variant causing server error #518

Open Peter-J-Freeman opened 1 year ago

Peter-J-Freeman commented 1 year ago

Describe the bug image

Peter-J-Freeman commented 1 year ago

@leicray, we need a test suite for these. I think that if the boundary is correct, but just badly numbered by the user in terms of the direction they went into the intron, it is valid to correct

Peter-J-Freeman commented 1 year ago

Hello,

I think these are very good fixes. ""ExonBoundaryError: Position c.791-802 has been updated to position to 790+532 ensuring correct HGVS numbering for transcript NM_000086.2" is now an informative message for what has been updated and why. It is also wonderful that VariantValidator offers this update automatically, since calculating and comparing these distances manually would be really cumbersome.

Thank you also for the explanation for the UCSC Genome Browser link. It is now clear why those chr16:28,480,676-28,483,512 coordinates are shown. This is also now clear in the browser since you can see where the actual variant is. Without the VariantValidator track I just expected that the whole view is the variant, which would have been incorrect.

Thank you!

leicray commented 1 year ago

Could you elaborate on what is needed regarding a "test suite"?

Peter-J-Freeman commented 1 year ago

We need a few examples where the user has specified a base in an intron but has counted in the wrong direction, like the example above. I will then add tests to make sure VV handles these correctly.

The example above shows the used defining the incorrect start position but a correct exon boundary (791) and counting backwards into the intron (791-X) when they should have counted forwards into the intron (790+X).

So the reciprocal would be a variant in the back half of an intron but the user has counted forwards from the (790+X) when they should have counted backwards (791-X). I think we only need one example. I can then use these 2 examples and embed them into the variant end position to create 4 separate tests to take care of all eventualities.

Actually, do we need tests where both the start and end of the variant are ill defined do you think @leicray ?

leicray commented 1 year ago

Here are some test descriptions:

Valid variants near splice sites:

NM_000088.4:c.472-1_472del NM_000088.4:c.543_543+1del NM_000088.4:c.2560-3_2561dup NM_000088.4:c.2608_2613+5 del NM_000088.4:c.2559+1_2559+44del

These each validate correctly.

Invalid variants near splice sites:

NM_000088.4:c.472-1_472+1del - ExonBoundaryError: Position c.472+1 does not correspond with an exon boundary for transcript NM_000088.4 NM_000088.4:c.543-1_543+1del - ExonBoundaryError: Position c.543-1 does not correspond with an exon boundary for transcript NM_000088.4 NM_000088.4:c.543_453+1del - Interval end position 453 < interval start position 543 NM_000088.4:c.2560-3_2560+3dup - ExonBoundaryError: Position c.2560+3 does not correspond with an exon boundary for transcript NM_000088.4 NM_000088.4:c.2608_2608+5del - ExonBoundaryError: Position c.2608+5 does not correspond with an exon boundary for transcript NM_000088.4

An invalid variant with an unclear error message:

NM_000088.4:c.2559+1_2599+54del - ExonBoundaryError: Position c.2559+54 does not correspond with an exon boundary for transcript NM_000088.4

The issue here is that the intron length downstream of c,2559 is 88 nucleotides. The +54 position does not formally exist, but there perhaps ought to be automatic correction of the variant description to NM_000088.4:c.2559+2_2560-34del which is the correct description.

Peter-J-Freeman commented 1 year ago

Thanks @leicray. To clarify, you are happy with the warnings here

Invalid variants near splice sites:

NM_000088.4:c.543-1_543+1del - ExonBoundaryError: Position c.543-1 does not correspond with an exon boundary for transcript NM_000088.4
NM_000088.4:c.543_453+1del - Interval end position 453 < interval start position 543
NM_000088.4:c.2560-3_2560+3dup - ExonBoundaryError: Position c.2560+3 does not correspond with an exon boundary for transcript NM_000088.4
NM_000088.4:c.2608_2608+5del - ExonBoundaryError: Position c.2608+5 does not correspond with an exon boundary for transcript NM_000088.4

but 2599+54 should be 2560-34del

That's ideal because we are already handling NM_000086.2(CLN3):c.791-802_1056+1445del and warning

ExonBoundaryError: Position c.791-802 has been updated to position to 790+532 ensuring correct HGVS numbering for transcript NM_000086.2

So, we should also be catching

ExonBoundaryError: Position c.2599+54 has been updated to position to 2560-34 ensuring correct HGVS numbering for transcript NM_000088.4

I can make appropriate tests from these. Thanks

leicray commented 1 year ago

I am happy with the error messages that you listed as a group.

The example that you give for locations either side of an intron mid point suggests that the correction works in one direction, at present, but not the other.

Peter-J-Freeman commented 1 year ago

Yes, think that is currently the case. I have set the code up, but the flow seems to be broken. Fixing now. Thanks for the examples. Can make it happen now

Peter-J-Freeman commented 1 year ago

NM_000086.2:c.791-802_1056+1445del

[
            "ExonBoundaryError: Position c.791-802 has been updated to position to 790+532 ensuring correct HGVS numbering for transcript NM_000086.2"
        ],

NM_000088.4:c.2559_2559+54del

[
            "ExonBoundaryError: Position c.2559+54 has been updated to position to 2560-35 ensuring correct HGVS numbering for transcript NM_000088.4"
        ]

NM_000086.2:c.790_791-802del

[
            "ExonBoundaryError: Position c.791-802 has been updated to position to 790+532 ensuring correct HGVS numbering for transcript NM_000086.2",
            "NM_000086.2:c.790_790+532del normalized to NM_000086.2:c.790+1_790+533del"
        ]

NM_000088.4:c.2559+54_2560del

[
            "ExonBoundaryError: Position c.2559+54 has been updated to position to 2560-35 ensuring correct HGVS numbering for transcript NM_000088.4"
        ],

NM_000088.4:c.2559+54_2559+55del

[
            "ExonBoundaryError: Position c.2559+54 has been updated to position to 2560-35 ensuring correct HGVS numbering for transcript NM_000088.4",
            "ExonBoundaryError: Position c.2559+55 has been updated to position to 2560-34 ensuring correct HGVS numbering for transcript NM_000088.4"
        ],

NM_000086.2:c.791-803_791-802del

[
            "ExonBoundaryError: Position c.791-803 has been updated to position to 790+531 ensuring correct HGVS numbering for transcript NM_000086.2",
            "ExonBoundaryError: Position c.791-802 has been updated to position to 790+532 ensuring correct HGVS numbering for transcript NM_000086.2"
        ]

@leicray Does this look comprehensive?
leicray commented 1 year ago

All looks good to me.

Peter-J-Freeman commented 1 year ago

Ace. Writing tests then will link in the push and close