Open leicray opened 1 year ago
Unable to validate the submitted variant NP_000079.2:p.(Gly197Cys) against the GRCh37 assembly
I think these warnings need to be suppressed
Protein level variant descriptions are not fully supported due to redundancy in the genetic code
NP_000079.2:p.(Gly197Cys) is HGVS compliant and contains a valid reference amino acid description
These warnings should always be displayed though
Is this sufficient? If so, I will fix it for the next release.
Suppressing the warning Unable to validate the submitted variant NP_000079.2:p.(Gly197Cys) against the GRCh37 assembly
is certainly appropriate.
However, I suspect that more might need to be done for the handling of non-substitution protein-level variants. At present, the variants NP_000079.2:p.(Gly197Cys)
and NP_061966.1:p.(Gln289_Gln332del)
both trigger error messages, both on-screen and to the admins. We need to ensure that variant descriptions of these types are also being checked with respect to the amino acids and that their locations are valid. It currently looks as though variant descriptions of these types are not being checked in the same fashion as simple-substitution descriptions.
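To make the check I have in mind concrete, here is a minimal sketch of it in plain Python. It is not the VariantValidator implementation: the regular expression, the function name `check_reference_residues` and the toy sequence are all hypothetical, and it only covers descriptions of the form p.(Xaa123...) or p.(Xaa123_Yaa456...) that use the 20 standard three-letter codes.

```python
import re

# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Hypothetical pattern, not the VariantValidator parser.
# Matches e.g. p.(Gly197Cys) or p.(Gln289_Gln332del).
PROTEIN_POSEDIT = re.compile(
    r"^p\.\(?(?P<aa1>[A-Z][a-z]{2})(?P<pos1>\d+)"
    r"(?:_(?P<aa2>[A-Z][a-z]{2})(?P<pos2>\d+))?"
    r"(?P<edit>[A-Za-z]+)\)?$"
)


def check_reference_residues(description, protein_sequence):
    """Return a list of problems; an empty list means every stated residue matches."""
    posedit = description.split(":", 1)[-1]
    match = PROTEIN_POSEDIT.match(posedit)
    if match is None:
        return [f"could not parse {description}"]
    problems = []
    for aa_key, pos_key in (("aa1", "pos1"), ("aa2", "pos2")):
        residue = match.group(aa_key)
        if residue is None:  # single-position description, no range end
            continue
        position = int(match.group(pos_key))
        if residue not in THREE_TO_ONE:
            problems.append(f"{residue} is not a recognised amino acid")
        elif not 1 <= position <= len(protein_sequence):
            problems.append(f"position {position} is outside the protein")
        elif protein_sequence[position - 1] != THREE_TO_ONE[residue]:
            problems.append(f"{residue}{position} does not match the reference sequence")
    if match.group("pos2") and int(match.group("pos2")) < int(match.group("pos1")):
        problems.append("range end precedes range start")
    return problems


# Toy sequence for illustration only: M1 K2 G3 A4 Q5 L6 L7 Q8.
toy_sequence = "MKGAQLLQ"
print(check_reference_residues("NP_TOY.1:p.(Gly3Cys)", toy_sequence))
print(check_reference_residues("NP_TOY.1:p.(Gln5_Gln8del)", toy_sequence))
```

The point is that a range description carries two reference residues, so both ends need the same residue and position check that a simple substitution gets.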
Ah, I wonder if they are not validating then. I will look into it
import json
import VariantValidator
vval = VariantValidator.Validator()
variant = 'NP_000079.2:p.(Gly197Cys)' # variant 1
genome_build = 'GRCh37'
select_transcripts = 'all'
transcript_set = 'refseq'
validate = vval.validate(variant, genome_build, select_transcripts, transcript_set)
validation = validate.format_as_dict(with_meta=True)
print(json.dumps(validation, sort_keys=True, indent=4, separators=(',', ': ')))
This is the current output
{
"flag": "warning",
"metadata": {
"variantvalidator_hgvs_version": "2.0.2.dev5+g69b1a7c",
"variantvalidator_version": "2.1.1.dev69+g4e3c76e",
"vvdb_version": "vvdb_2022_11",
"vvseqrepo_db": "VV_SR_2022_11/master",
"vvta_version": "vvta_2022_11"
},
"validation_warning_1": {
"alt_genomic_loci": [],
"annotations": {},
"gene_ids": {},
"gene_symbol": "",
"genome_context_intronic_sequence": "",
"hgvs_lrg_transcript_variant": "",
"hgvs_lrg_variant": "",
"hgvs_predicted_protein_consequence": {
"lrg_slr": "LRG_1p1:p.(G197C)",
"lrg_tlr": "LRG_1p1:p.(Gly197Cys)",
"slr": "NP_000079.2:p.(G197C)",
"tlr": "NP_000079.2:p.(Gly197Cys)"
},
"hgvs_refseqgene_variant": "",
"hgvs_transcript_variant": "",
"primary_assembly_loci": {},
"reference_sequence_records": {
"protein": "https://www.ncbi.nlm.nih.gov/nuccore/NP_000079.2"
},
"refseqgene_context_intronic_sequence": "",
"rna_variant_descriptions": null,
"selected_assembly": "GRCh37",
"submitted_variant": "NP_000079.2:p.(Gly197Cys)",
"transcript_description": "",
"validation_warnings": [
"Protein level variant descriptions are not fully supported due to redundancy in the genetic code",
"NP_000079.2:p.(Gly197Cys) is HGVS compliant and contains a valid reference amino acid description"
],
"variant_exonic_positions": null
}
}
The warnings are here and seem relatively appropriate?
"validation_warnings": [
"Protein level variant descriptions are not fully supported due to redundancy in the genetic code",
"NP_000079.2:p.(Gly197Cys) is HGVS compliant and contains a valid reference amino acid description"
],
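For anyone consuming this output programmatically, a small sketch of pulling the warnings out of the dictionary may help. It assumes only the structure shown in the output above (top-level `flag` and `metadata` entries plus one record per variant); the helper name `collect_validation_warnings` is hypothetical and not part of VariantValidator.

```python
def collect_validation_warnings(validation):
    """Map each submitted variant to its list of validation warnings."""
    collected = {}
    for key, record in validation.items():
        # "flag" and "metadata" are bookkeeping entries, not variant records.
        if not isinstance(record, dict) or "validation_warnings" not in record:
            continue
        variant = record.get("submitted_variant", key)
        collected[variant] = list(record["validation_warnings"])
    return collected


# Trimmed copy of the output shown above.
example = {
    "flag": "warning",
    "metadata": {"vvdb_version": "vvdb_2022_11"},
    "validation_warning_1": {
        "submitted_variant": "NP_000079.2:p.(Gly197Cys)",
        "validation_warnings": [
            "Protein level variant descriptions are not fully supported due to redundancy in the genetic code",
            "NP_000079.2:p.(Gly197Cys) is HGVS compliant and contains a valid reference amino acid description",
        ],
    },
}

for variant, warnings in collect_validation_warnings(example).items():
    print(variant)
    for warning in warnings:
        print("  -", warning)
```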
import json
import VariantValidator
vval = VariantValidator.Validator()
variant = 'NP_061966.1:p.(Gln289_Gln332del)' # variant 2
genome_build = 'GRCh37'
select_transcripts = 'all'
transcript_set = 'refseq'
validate = vval.validate(variant, genome_build, select_transcripts, transcript_set)
validation = validate.format_as_dict(with_meta=True)
print(json.dumps(validation, sort_keys=True, indent=4, separators=(',', ': ')))
This throws an error, which I will try to resolve.
Yes, the first example looks as expected.
The second example will now return
{
"flag": "warning",
"metadata": {
"variantvalidator_hgvs_version": "2.0.2.dev5+g69b1a7c",
"variantvalidator_version": "2.1.1.dev69+g4e3c76e",
"vvdb_version": "vvdb_2022_11",
"vvseqrepo_db": "VV_SR_2022_11/master",
"vvta_version": "vvta_2022_11"
},
"validation_warning_1": {
"alt_genomic_loci": [],
"annotations": {},
"gene_ids": {},
"gene_symbol": "",
"genome_context_intronic_sequence": "",
"hgvs_lrg_transcript_variant": "",
"hgvs_lrg_variant": "",
"hgvs_predicted_protein_consequence": {
"lrg_slr": "",
"lrg_tlr": "",
"slr": "NP_061966.1:p.(Q289_Q332del)",
"tlr": "NP_061966.1:p.(Gln289_Gln332del)"
},
"hgvs_refseqgene_variant": "",
"hgvs_transcript_variant": "",
"primary_assembly_loci": {},
"reference_sequence_records": {
"protein": "https://www.ncbi.nlm.nih.gov/nuccore/NP_061966.1"
},
"refseqgene_context_intronic_sequence": "",
"rna_variant_descriptions": null,
"selected_assembly": "GRCh37",
"submitted_variant": "NP_061966.1:p.(Gln289_Gln332del)",
"transcript_description": "",
"validation_warnings": [
"Protein level variant descriptions are not fully supported due to redundancy in the genetic code",
"NP_061966.1:p.(Gln289_Gln332del) is HGVS compliant and contains a valid reference amino acid description"
],
"variant_exonic_positions": null
}
}
That looks better.
Excellent. I think the error message Unable to validate the submitted variant NP_061966.1:p.(Gln289_Gln332del) against the GRCh38 assembly
is a VVweb message. I will see if I can spin up my laptop dev version. Sometimes it will, sometimes it won't!
OK, the VV engine has been updated to fix this issue. It is the VVweb interface that needs to be updated. I'm having difficulty spinning up a dev version. @John-F-Wagstaff needs to get on with more important jobs. Leave it with me @leicray. I'll try to get my system running
OK, managed to spin it up @leicray
Currently on dev it looks like this
Comments please
The second warning looks fine for now. When time allows, it could be modified to accommodate the fact that two amino acids are mentioned in the variant description. It looks like it was written for the situation where the variant description relates to a single amino acid.
A simple change to the warning might be:
NP_061966.1:p.(Gln289_Gln332del) is HGVS compliant and the reference amino acid(s) in the description is/are valid
The more complicated we make it, the more likely it is to go wrong. However, this seems to be a simple change to implement as it is a simple text replacement. Will get it done ASAP
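As a rough illustration of how small the change could be, here is a sketch that keeps the existing wording for single-residue descriptions and only switches to a plural form for ranges. The function name `compliance_warning` and the plural wording are assumptions for illustration, not what the VV engine does or what was proposed above.

```python
def compliance_warning(description):
    """Build the HGVS-compliance warning, pluralised for range descriptions."""
    # Split off the accession first, because accessions such as NP_000079.2
    # contain an underscore themselves.
    posedit = description.split(":", 1)[-1]
    if "_" in posedit:
        # Range such as p.(Gln289_Gln332del): two reference residues are stated.
        detail = "contains valid reference amino acid descriptions"
    else:
        # Single position such as p.(Gly197Cys): existing wording left unchanged.
        detail = "contains a valid reference amino acid description"
    return f"{description} is HGVS compliant and {detail}"


print(compliance_warning("NP_000079.2:p.(Gly197Cys)"))
print(compliance_warning("NP_061966.1:p.(Gln289_Gln332del)"))
```

Leaving the single-residue text untouched means anything that matches on the current warning for substitutions would keep working.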
Hold on @leicray . Just realised that some folks may be using this warning as a search term. @ifokkema for example. Let's check before we change it
Do not bother with changing the message if there might be knock-on effects.
Thanks for thinking about me, but we don't look for warnings used for protein variants. Those should get caught and handled before LOVD sends them to VV.
Thanks, @ifokkema. I have a feeling that we may use it though, so for now I will leave it and revisit if needed.
Describe the bug
Some users try to validate protein level variants such as
NP_061966.1:p.(Gln289_Gln332del)
but the error message is not sufficiently informative. Submission of this variant produces the on-screen error message:
Unable to validate the submitted variant NP_061966.1:p.(Gln289_Gln332del) against the GRCh38 assembly.
Please check your submission and re-submit.
The submission process also creates an ERROR message that is sent to the admins.
However, amino acid substitution variants such as
NP_000079.2:p.(Gly197Cys)
are correctly handled and produce the useful on-screen message:
Unable to validate the submitted variant NP_000079.2:p.(Gly197Cys) against the GRCh37 assembly. The following warnings were returned:
Protein level variant descriptions are not fully supported due to redundancy in the genetic code
NP_000079.2:p.(Gly197Cys) is HGVS compliant and contains a valid reference amino acid description
Please check your submission and re-submit.
Expected behaviour
VariantValidator needs to better handle amino acid variant descriptions that are not just simple substitutions and provide appropriate error messages.