openvar / variantValidator

Public repository for VariantValidator project
GNU Affero General Public License v3.0

Variants based on an LRG fail to validate #501

Closed: leicray closed this issue 1 year ago

leicray commented 1 year ago

Describe the bug

A user tried to validate the variant descriptions NG_008218.2:g.18053A>G and NG_008218.2:g.16370G>C, and both failed to validate. In each case, an ERROR message was generated for the Admins. The error message shown to the user on screen is:

Unable to validate the submitted variant NG_008218.2:g.18053A>G against the GRCh38 assembly.

Please check your submission and re-submit.

Submitting the same variants to the batch validator results in a failed job, but no ERROR message is generated.

Details of the LRG can be found at:

http://ftp.ebi.ac.uk/pub/databases/lrgex/LRG_765.xml

leicray commented 1 year ago

The two variants described in the context of LRG_765 can be mapped to the t1 transcript NM_002693.2 to give the variant descriptions NM_002693.2:c.2243G>C and NM_002693.2:c.2591A>G.

Both of these transcript-based descriptions validate correctly and project back to the original LRG-based descriptions (attached reports: VariantValidator_report_NM_002693.2 c.2591A G.pdf, VariantValidator_report_NM_002693.2 c.2243G C.pdf).

This suggests that the problem is not caused by the databases being incomplete.

leicray commented 1 year ago

This bug does not affect all variant descriptions based on an LRG reference sequence. The variant description NG_007400.1:g.8638G>T validates correctly.

leicray commented 1 year ago

Another(?) user has tried to validate NG_033796.2:g.125820G>C and this too has triggered an ERROR message.

leicray commented 1 year ago

Perhaps related, another user-submitted variant description NR_033955.2:r.164c>a has also triggered an ERROR message. As with the LRG-based variant descriptions, this fails when submitted to the batch tool, but does not trigger an ERROR message. The job just fails.

Peter-J-Freeman commented 1 year ago

This was caused by the RefSeqGene records being old: the transcripts they pull back from the database sometimes align only to GRCh37 and not to GRCh38. The fix simply loops these out (skips them), since the request is for GRCh38.
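For readers following along, the kind of filtering described above can be sketched roughly as follows. This is only an illustration of the idea, not the actual VariantValidator code: the `TranscriptHit` class, the `transcripts_for_build` function and the accession strings are all hypothetical names made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptHit:
    """Hypothetical record: a transcript and the builds it aligns to."""
    accession: str
    aligned_builds: frozenset


def transcripts_for_build(hits, requested_build):
    """Keep only transcripts that have an alignment to the requested build.

    Transcripts that align only to another build (e.g. GRCh37 when
    GRCh38 was requested) are skipped rather than raising an error.
    """
    return [hit for hit in hits if requested_build in hit.aligned_builds]


# Made-up data: one transcript aligns only to GRCh37, the other to both builds.
hits = [
    TranscriptHit("NM_HYPOTHETICAL_A.1", frozenset({"GRCh37"})),
    TranscriptHit("NM_HYPOTHETICAL_B.2", frozenset({"GRCh37", "GRCh38"})),
]

usable = transcripts_for_build(hits, "GRCh38")
print([hit.accession for hit in usable])
```

With a filter of this shape, a GRCh37-only transcript never reaches the GRCh38 mapping step, which is presumably where the validation was failing.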