openvar / variantValidator

Public repository for VariantValidator project
GNU Affero General Public License v3.0
67 stars 21 forks source link

Deletions in .r generate warning for auto-mapping to long form of .c (eg .r:6_8del -> .c:6_8delTTG) when the original is valid #584

Open John-F-Wagstaff opened 5 months ago

John-F-Wagstaff commented 5 months ago

Describe the bug we are getting warnings out like "NM_004006.2:r.6_8del automapped to NM_004006.2:c.6_8delTTG" for .r type variants when NM_004006.2:r.6_8del is valid and more concise. The more concise form is preferred so we do not want to include the extended form in warnings (unless it is somehow needed to describe the error). We possibly don't want to warn on auto-mapping as it is not an error. We already filter auto-mapping warnings out in some of our endpoints, since any warning that is not a warning may confuse the users. If we do want to return a message we may want to tweak the warning message to flag that it is informative rather than cautionary.

To Reproduce input a deletion like NM_004006.2:r.6_8del into vv

import json
import VariantValidator
var = vval.validate(json.dumps(['NM_004006.2:r.6_8del']),'GRCh38',select_transcripts='all',liftover_level=None)
print(var.format_as_dict(with_meta=True))

{'flag': 'gene_variant', 'NM_004006.2:c.6_8del': {'selected_assembly': 'GRCh38', 'submitted_variant': 'NM_004006.2:r.6_8del', 'gene_symbol': 'DMD', 'gene_ids': {'hgnc_id': 'HGNC:2928', 'entrez_gene_id': '1756', 'ensembl_gene_id': 'ENSG00000198947', 'ucsc_id': 'uc004dda.2', 'omim_id': ['300377'], 'ccds_ids': ['CCDS14233', 'CCDS55395', 'CCDS14232', 'CCDS48091', 'CCDS94585', 'CCDS14231', 'CCDS94586', 'CCDS14234', 'CCDS14230', 'CCDS14229', 'CCDS55394']}, 'annotations': {'db_xref': {'CCDS': 'CCDS14233.1', 'select': 'RefSeq', 'ncbigene': '1756', 'ensemblgene': None, 'hgnc': 'HGNC:2928'}, 'chromosome': 'X', 'map': 'Xp21.2-p21.1', 'note': 'dystrophin', 'variant': 'DP427M', 'refseq_select': True, 'mane_select': False, 'ensembl_select': False, 'mane_plus_clinical': False}, 'transcript_description': 'Homo sapiens dystrophin (DMD), transcript variant Dp427m, mRNA', 'hgvs_transcript_variant': 'NM_004006.2:c.6_8del', 'rna_variant_descriptions': {'usage_warnings': ['RNA (r.) descriptions are independent of cDNA descriptions (c.)', 'RNA descriptions must only be used if the RNA has been sequenced and must not be inferred from a cDNA description', 'c. and g. descriptions provided by VariantValidator must only be used if the DNA sequence has been confirmed'], 'rna_variant': 'NM_004006.2:r.6_8del', 'translation': 'NP_003997.1:p.Trp4del', 'translation_slr': 'NP_003997.1:p.W4del'}, 'genome_context_intronic_sequence': '', 'refseqgene_context_intronic_sequence': '', 'hgvs_refseqgene_variant': 'NG_012232.1:g.133303_133305del', 'hgvs_predicted_protein_consequence': {'tlr': 'NP_003997.1:p.(Trp4del)', 'slr': 'NP_003997.1:p.(W4del)', 'lrg_tlr': 'LRG_199p1:p.(Trp4del)', 'lrg_slr': 'LRG_199p1:p.(W4del)'}, 'validation_warnings': ['NM_004006.2:r.6_8del automapped to NM_004006.2:c.6_8delTTG', 'A more recent version of the selected reference sequence NM_004006.2 is available (NM_004006.3): NM_004006.3:c.6_8del MUST be fully validated prior to use in reports: select_variants=NM_004006.3:c.6_8del'], 'hgvs_lrg_transcript_variant': 'LRG_199t1:c.6_8del', 'hgvs_lrg_variant': 'LRG_199:g.133303_133305del', 'alt_genomic_loci': [], 'primary_assembly_loci': {'grch38': {'hgvs_genomic_description': 'NC_000023.11:g.33211305_33211307del', 'vcf': {'chr': 'X', 'pos': '33211304', 'ref': 'CCAA', 'alt': 'C'}}, 'hg38': {'hgvs_genomic_description': 'NC_000023.11:g.33211305_33211307del', 'vcf': {'chr': 'chrX', 'pos': '33211304', 'ref': 'CCAA', 'alt': 'C'}}}, 'variant_exonic_positions': {'NC_000023.11': {'start_exon': '1', 'end_exon': '1'}, 'NG_012232.1': {'start_exon': '1', 'end_exon': '1'}}, 'reference_sequence_records': {'transcript': 'https://www.ncbi.nlm.nih.gov/nuccore/NM_004006.2', 'protein': 'https://www.ncbi.nlm.nih.gov/nuccore/NP_003997.1', 'refseqgene': 'https://www.ncbi.nlm.nih.gov/nuccore/NG_012232.1', 'lrg': 'http://ftp.ebi.ac.uk/pub/databases/lrgex/LRG_199.xml'}}, 'metadata': {'variantvalidator_version': '2.2.1.dev521+g1d4df7a', 'variantvalidator_hgvs_version': '2.2.1.dev0+g69b1a7c.d20240131', 'vvta_version': 'vvta', 'vvseqrepo_db': 'VV_SR_2024_01/master', 'vvdb_version': 'vvdb_2022_11'}}

Expected behavior NM_004006.2:r.6_8del is valid and should be returned, without warning or with an eddited warning

Peter-J-Freeman commented 5 months ago

Not quite correct. All r. variants are mapped onto c. for processing because hgvs.py does not correctly handle r. variants.

There are 2 issues. The warnings should never contain the excess bases e.g. delTTAT shoud be del Do we keep warning about r. to c. as it may be confusing to users

John-F-Wagstaff commented 5 months ago

OK, I have edited a bit to make it more clear, also add the implication that we might want to keep the warning message in some form, I had thought you where more emphatic about wanting rid of it.

Currently we don't have any feature to handle user warnings that would be flagged with INFO in our logs, or not printed depending on log level. It is possible that we might want to add some kind of handling for non critical informational warnings, at least in the long term, if we want to keep these type of warning in, rather than filtering all of the output. edit: do you want me to submit this as a low priority feature request, separate to this bug?

ifokkema commented 5 months ago

FYI: The LOVD3 VV library tosses all "automapped to..." messages because it assumes they are just internal messages about automapping. So, for us, they have no value.

Peter-J-Freeman commented 5 months ago

I'm wondering if they have any real value either. @John-F-Wagstaff we could also just toss them

John-F-Wagstaff commented 5 months ago

We could, I am happy either way you decide on this.

A lot of them have the flavour of "if we get an error later than this will explain why/how we got to this point" but I am not sure how user-helpful they are in action. If we don't want to add a 'info' type flag for VV, to trigger optional extra warnings, then I can see that some (most/all?) of them might be better off just removed from the user visible output.

If we decide that they are not useful to the users, then they probably want to go in as a logger.info() call instead, for our internal use, if that does not cause issues itself.

Peter-J-Freeman commented 5 months ago

I think we will just filter out when I do an error message clean. We can the reinstate if needed. Don;t really use them for bug fixing either so limited internal use as well.