openvar / rest_variantValidator

REST Interface for VariantValidator. Includes docker container
GNU Affero General Public License v3.0

Sometimes rest VV returns results unrelated to query #74

Open vidboda opened 2 years ago

vidboda commented 2 years ago

Sometimes, and in fact quite often, VV returns results that are totally unrelated to the initial query.

For example, today I submitted this query using the swagger UI at https://rest.variantvalidator.org/:

https://rest.variantvalidator.org/VariantValidator/variantvalidator/**hg19**/**15**-43921089-C-T/all?content-type=application%2Fjson

and got as results a variant located on chr12:

"selected_assembly": "GRCh38", "submitted_variant": "NM_015409.5:c.4421C>T", "transcript_description": "Homo sapiens E1A binding protein p400 (EP400), mRNA", "validation_warnings": [ "RefSeqGene record not available" ], "variant_exonic_positions": { "NC_000012.11": { "end_exon": "22", "start_exon": "22" }, "NC_000012.12": { "end_exon": "22", "start_exon": "22" } }

Other examples seen today returned by MobiDetails (still using the REST API):

The error seems to occur whichever assembly is selected (hg19, hg38, GRCh37, GRCh38).

Peter-J-Freeman commented 2 years ago

I have had a think about how we might handle this programmatically.

@beboche, did you by any chance keep the full JSON for one of these errors? If so, please post it below, along with the API call.

vidboda commented 2 years ago

I have two of these from yesterday:

https://rest.variantvalidator.org/VariantValidator/variantvalidator/GRCh38/NM_002735.3:c.253A>C/all?content-type=application/json

{'NM_001330609.1:c.203C>T': {'alt_genomic_loci': [], 'annotations': {'chromosome': '8', 'db_xref': {'CCDS': 'CCDS83325.1', 'ensemblgene': None, 'hgnc': 'HGNC:28984', 'ncbigene': '9897', 'select': False}, 'ensembl_select': False, 'mane_plus_clinical': False, 'mane_select': False, 'map': '8q24.13', 'note': 'WASH complex subunit 5', 'refseq_select': False, 'variant': '2'}, 'gene_ids': {'ccds_ids': ['CCDS83325', 'CCDS6355'], 'ensembl_gene_id': 'ENSG00000164961', 'entrez_gene_id': '9897', 'hgnc_id': 'HGNC:28984', 'omim_id': ['610657'], 'ucsc_id': 'uc003yrt.4'}, 'gene_symbol': 'WASHC5', 'genome_context_intronic_sequence': '', 'hgvs_lrg_transcript_variant': '', 'hgvs_lrg_variant': '', 'hgvs_predicted_protein_consequence': {'lrg_slr': '', 'lrg_tlr': '', 'slr': 'NP_001317538.1:p.(P68L)', 'tlr': 'NP_001317538.1:p.(Pro68Leu)'}, 'hgvs_refseqgene_variant': '', 'hgvs_transcript_variant': 'NM_001330609.1:c.203C>T', 'primary_assembly_loci': {'grch37': {'hgvs_genomic_description': 'NC_000008.10:g.126091044G>A', 'vcf': {'alt': 'A', 'chr': '8', 'pos': '126091044', 'ref': 'G'}}, 'grch38': {'hgvs_genomic_description': 'NC_000008.11:g.125078802G>A', 'vcf': {'alt': 'A', 'chr': '8', 'pos': '125078802', 'ref': 'G'}}, 'hg19': {'hgvs_genomic_description': 'NC_000008.10:g.126091044G>A', 'vcf': {'alt': 'A', 'chr': 'chr8', 'pos': '126091044', 'ref': 'G'}}, 'hg38': {'hgvs_genomic_description': 'NC_000008.11:g.125078802G>A', 'vcf': {'alt': 'A', 'chr': 'chr8', 'pos': '125078802', 'ref': 'G'}}}, 'reference_sequence_records': {'protein': '[https://www.ncbi.nlm.nih.gov/nuccore/NP_001317538.1](https://urldefense.com/v3/__https://www.ncbi.nlm.nih.gov/nuccore/NP_001317538.1__;!!DV4KuIgKKrh48VMFxQ!Vdw0Sv5QSeylWP_y_n1o3XZV7VGFPYF099b_v6kVbkM0VR8rKUod9TDSkbHhBHZV_wvuF-vR$)', 'transcript': 
'[https://www.ncbi.nlm.nih.gov/nuccore/NM_001330609.1](https://urldefense.com/v3/__https://www.ncbi.nlm.nih.gov/nuccore/NM_001330609.1__;!!DV4KuIgKKrh48VMFxQ!Vdw0Sv5QSeylWP_y_n1o3XZV7VGFPYF099b_v6kVbkM0VR8rKUod9TDSkbHhBHZV_8npKy6i$)'}, 'refseqgene_context_intronic_sequence': '', 'selected_assembly': 'GRCh38', 'submitted_variant': 'NC_000008.11:g.125078802G>A', 'transcript_description': 'Homo sapiens WASH complex subunit 5 (WASHC5), transcript variant 2, mRNA', 'validation_warnings': ['A more recent version of the selected reference sequence NM_001330609.1 is available (NM_001330609.2): NM_001330609.2:c.203C>T MUST be fully validated prior to use in reports: select_variants=NM_001330609.2:c.203C>T', 'RefSeqGene record not available'], 'variant_exonic_positions': {'NC_000008.10': {'end_exon': '5', 'start_exon': '5'}, 'NC_000008.11': {'end_exon': '5', 'start_exon': '5'}}}, 'flag': 'gene_variant', 'metadata': {'variantvalidator_hgvs_version': '2.0.1', 'variantvalidator_version': '2.0.1.dev31+g31426d1', 'vvdb_version': 'vvdb_2021_4', 'vvseqrepo_db': 'VV_SR_2021_2/master', 'vvta_version': 'vvta_2021_2'}, 'validation_warning_1': {'alt_genomic_loci': [], 'annotations': {}, 'gene_ids': {}, 'gene_symbol': '', 'genome_context_intronic_sequence': '', 'hgvs_lrg_transcript_variant': '', 'hgvs_lrg_variant': '', 'hgvs_predicted_protein_consequence': {'lrg_slr': '', 'lrg_tlr': '', 'slr': '', 'tlr': ''}, 'hgvs_refseqgene_variant': '', 'hgvs_transcript_variant': '', 'primary_assembly_loci': {}, 'reference_sequence_records': '', 'refseqgene_context_intronic_sequence': '', 'selected_assembly': 'GRCh38', 'submitted_variant': 'NC_000008.11:g.125078802G>A', 'transcript_description': '', 'validation_warnings': ['A more recent version of the selected reference sequence NM_001330609.1 is available (NM_001330609.2): NM_001330609.2:c.203C>T MUST be fully validated prior to use in reports: select_variants=NM_001330609.2:c.203C>T', 'RefSeqGene record not available'], 'variant_exonic_positions': 
None}, 'validation_warning_2': {'alt_genomic_loci': [], 'annotations': {}, 'gene_ids': {}, 'gene_symbol': '', 'genome_context_intronic_sequence': '', 'hgvs_lrg_transcript_variant': '', 'hgvs_lrg_variant': '', 'hgvs_predicted_protein_consequence': {'lrg_slr': '', 'lrg_tlr': '', 'slr': '', 'tlr': ''}, 'hgvs_refseqgene_variant': '', 'hgvs_transcript_variant': '', 'primary_assembly_loci': {}, 'reference_sequence_records': '', 'refseqgene_context_intronic_sequence': '', 'selected_assembly': 'GRCh38', 'submitted_variant': 'NC_000008.11:g.125078802G>A', 'transcript_description': '', 'validation_warnings': ['A more recent version of the selected reference sequence NM_001330609.1 is available (NM_001330609.2): NM_001330609.2:c.203C>T MUST be fully validated prior to use in reports: select_variants=NM_001330609.2:c.203C>T', 'RefSeqGene record not available'], 'variant_exonic_positions': None}, 'validation_warning_3': {'alt_genomic_loci': [], 'annotations': {}, 'gene_ids': {}, 'gene_symbol': '', 'genome_context_intronic_sequence': '', 'hgvs_lrg_transcript_variant': '', 'hgvs_lrg_variant': '', 'hgvs_predicted_protein_consequence': {'lrg_slr': '', 'lrg_tlr': '', 'slr': '', 'tlr': ''}, 'hgvs_refseqgene_variant': '', 'hgvs_transcript_variant': '', 'primary_assembly_loci': {}, 'reference_sequence_records': '', 'refseqgene_context_intronic_sequence': '', 'selected_assembly': 'GRCh38', 'submitted_variant': 'NC_000008.11:g.125078802G>A', 'transcript_description': '', 'validation_warnings': ['A more recent version of the selected reference sequence NM_001330609.1 is available (NM_001330609.2): NM_001330609.2:c.203C>T MUST be fully validated prior to use in reports: select_variants=NM_001330609.2:c.203C>T', 'RefSeqGene record not available'], 'variant_exonic_positions': None}}

and

https://rest.variantvalidator.org/VariantValidator/variantvalidator/GRCh38/NM_015454.3:c.-2-243G>A/all?content-type=application/json

{'NM_003104.5:c.927G>A': {'alt_genomic_loci': [], 'annotations': {'chromosome': '15', 'db_xref': {'CCDS': 'CCDS10116.1', 'ensemblgene': None, 'hgnc': 'HGNC:11184', 'ncbigene': '6652', 'select': False}, 'ensembl_select': False, 'mane_plus_clinical': False, 'mane_select': False, 'map': '15q21.1', 'note': 'sorbitol dehydrogenase', 'refseq_select': False, 'variant': '1'}, 'gene_ids': {'ccds_ids': ['CCDS10116'], 'ensembl_gene_id': 'ENSG00000140263', 'entrez_gene_id': '6652', 'hgnc_id': 'HGNC:11184', 'omim_id': ['182500'], 'ucsc_id': 'uc001zul.5'}, 'gene_symbol': 'SORD', 'genome_context_intronic_sequence': '', 'hgvs_lrg_transcript_variant': '', 'hgvs_lrg_variant': '', 'hgvs_predicted_protein_consequence': {'lrg_slr': '', 'lrg_tlr': '', 'slr': 'NP_003095.2:p.(S309=)', 'tlr': 'NP_003095.2:p.(Ser309=)'}, 'hgvs_refseqgene_variant': '', 'hgvs_transcript_variant': 'NM_003104.5:c.927G>A', 'primary_assembly_loci': {'grch37': {'hgvs_genomic_description': 'NC_000015.9:g.45365581G>A', 'vcf': {'alt': 'A', 'chr': '15', 'pos': '45365581', 'ref': 'G'}}, 'grch38': {'hgvs_genomic_description': 'NC_000015.10:g.45073383G>A', 'vcf': {'alt': 'A', 'chr': '15', 'pos': '45073383', 'ref': 'G'}}, 'hg19': {'hgvs_genomic_description': 'NC_000015.9:g.45365581G>A', 'vcf': {'alt': 'A', 'chr': 'chr15', 'pos': '45365581', 'ref': 'G'}}, 'hg38': {'hgvs_genomic_description': 'NC_000015.10:g.45073383G>A', 'vcf': {'alt': 'A', 'chr': 'chr15', 'pos': '45073383', 'ref': 'G'}}}, 'reference_sequence_records': {'protein': '[https://www.ncbi.nlm.nih.gov/nuccore/NP_003095.2](https://urldefense.com/v3/__https://www.ncbi.nlm.nih.gov/nuccore/NP_003095.2__;!!DV4KuIgKKrh48VMFxQ!T6l70sxNuHgtWpxc6xsEnkXC3JVE73qa3QSQeyaKufpvFJOs6kde5KPJaKjYiMrZOKC6pH_D$)', 'transcript': '[https://www.ncbi.nlm.nih.gov/nuccore/NM_003104.5](https://urldefense.com/v3/__https://www.ncbi.nlm.nih.gov/nuccore/NM_003104.5__;!!DV4KuIgKKrh48VMFxQ!T6l70sxNuHgtWpxc6xsEnkXC3JVE73qa3QSQeyaKufpvFJOs6kde5KPJaKjYiMrZOOY6J1Sb$)'}, 
'refseqgene_context_intronic_sequence': '', 'selected_assembly': 'GRCh38', 'submitted_variant': 'NC_000015.10:g.45073383G>A', 'transcript_description': 'Homo sapiens sorbitol dehydrogenase (SORD), transcript variant 1, mRNA', 'validation_warnings': ['A more recent version of the selected reference sequence NM_003104.5 is available (NM_003104.6): NM_003104.6:c.927G>A MUST be fully validated prior to use in reports: select_variants=NM_003104.6:c.927G>A', 'RefSeqGene record not available'], 'variant_exonic_positions': {'NC_000015.10': {'end_exon': '9', 'start_exon': '9'}, 'NC_000015.9': {'end_exon': '9', 'start_exon': '9'}}}, 'NM_003104.6:c.927G>A': {'alt_genomic_loci': [], 'annotations': {'chromosome': '15', 'db_xref': {'CCDS': 'CCDS10116.1', 'ensemblgene': None, 'hgnc': 'HGNC:11184', 'ncbigene': '6652', 'select': 'MANE'}, 'ensembl_select': False, 'mane_plus_clinical': False, 'mane_select': True, 'map': '15q21.1', 'note': 'sorbitol dehydrogenase', 'refseq_select': True, 'variant': '1'}, 'gene_ids': {'ccds_ids': ['CCDS10116'], 'ensembl_gene_id': 'ENSG00000140263', 'entrez_gene_id': '6652', 'hgnc_id': 'HGNC:11184', 'omim_id': ['182500'], 'ucsc_id': 'uc001zul.5'}, 'gene_symbol': 'SORD', 'genome_context_intronic_sequence': '', 'hgvs_lrg_transcript_variant': '', 'hgvs_lrg_variant': '', 'hgvs_predicted_protein_consequence': {'lrg_slr': '', 'lrg_tlr': '', 'slr': 'NP_003095.2:p.(S309=)', 'tlr': 'NP_003095.2:p.(Ser309=)'}, 'hgvs_refseqgene_variant': '', 'hgvs_transcript_variant': 'NM_003104.6:c.927G>A', 'primary_assembly_loci': {'grch37': {'hgvs_genomic_description': 'NC_000015.9:g.45365581G>A', 'vcf': {'alt': 'A', 'chr': '15', 'pos': '45365581', 'ref': 'G'}}, 'grch38': {'hgvs_genomic_description': 'NC_000015.10:g.45073383G>A', 'vcf': {'alt': 'A', 'chr': '15', 'pos': '45073383', 'ref': 'G'}}, 'hg19': {'hgvs_genomic_description': 'NC_000015.9:g.45365581G>A', 'vcf': {'alt': 'A', 'chr': 'chr15', 'pos': '45365581', 'ref': 'G'}}, 'hg38': {'hgvs_genomic_description': 
'NC_000015.10:g.45073383G>A', 'vcf': {'alt': 'A', 'chr': 'chr15', 'pos': '45073383', 'ref': 'G'}}}, 'reference_sequence_records': {'protein': '[https://www.ncbi.nlm.nih.gov/nuccore/NP_003095.2](https://urldefense.com/v3/__https://www.ncbi.nlm.nih.gov/nuccore/NP_003095.2__;!!DV4KuIgKKrh48VMFxQ!T6l70sxNuHgtWpxc6xsEnkXC3JVE73qa3QSQeyaKufpvFJOs6kde5KPJaKjYiMrZOKC6pH_D$)', 'transcript': '[https://www.ncbi.nlm.nih.gov/nuccore/NM_003104.6](https://urldefense.com/v3/__https://www.ncbi.nlm.nih.gov/nuccore/NM_003104.6__;!!DV4KuIgKKrh48VMFxQ!T6l70sxNuHgtWpxc6xsEnkXC3JVE73qa3QSQeyaKufpvFJOs6kde5KPJaKjYiMrZOCaCpoHT$)'}, 'refseqgene_context_intronic_sequence': '', 'selected_assembly': 'GRCh38', 'submitted_variant': 'NC_000015.10:g.45073383G>A', 'transcript_description': 'Homo sapiens sorbitol dehydrogenase (SORD), transcript variant 1, mRNA', 'validation_warnings': ['RefSeqGene record not available'], 'variant_exonic_positions': {'NC_000015.10': {'end_exon': '9', 'start_exon': '9'}, 'NC_000015.9': {'end_exon': '9', 'start_exon': '9'}}}, 'flag': 'gene_variant', 'metadata': {'variantvalidator_hgvs_version': '2.0.1', 'variantvalidator_version': '2.0.1.dev31+g31426d1', 'vvdb_version': 'vvdb_2021_4', 'vvseqrepo_db': 'VV_SR_2021_2/master', 'vvta_version': 'vvta_2021_2'}, 'validation_warning_1': {'alt_genomic_loci': [], 'annotations': {}, 'gene_ids': {}, 'gene_symbol': '', 'genome_context_intronic_sequence': '', 'hgvs_lrg_transcript_variant': '', 'hgvs_lrg_variant': '', 'hgvs_predicted_protein_consequence': {'lrg_slr': '', 'lrg_tlr': '', 'slr': '', 'tlr': ''}, 'hgvs_refseqgene_variant': '', 'hgvs_transcript_variant': '', 'primary_assembly_loci': {}, 'reference_sequence_records': '', 'refseqgene_context_intronic_sequence': '', 'selected_assembly': 'GRCh38', 'submitted_variant': 'NC_000015.10:g.45073383G>A', 'transcript_description': '', 'validation_warnings': ['A more recent version of the selected reference sequence NM_003104.5 is available (NM_003104.6): NM_003104.6:c.927G>A MUST be 
fully validated prior to use in reports: select_variants=NM_003104.6:c.927G>A', 'RefSeqGene record not available'], 'variant_exonic_positions': None}, 'validation_warning_2': {'alt_genomic_loci': [], 'annotations': {}, 'gene_ids': {}, 'gene_symbol': '', 'genome_context_intronic_sequence': '', 'hgvs_lrg_transcript_variant': '', 'hgvs_lrg_variant': '', 'hgvs_predicted_protein_consequence': {'lrg_slr': '', 'lrg_tlr': '', 'slr': '', 'tlr': ''}, 'hgvs_refseqgene_variant': '', 'hgvs_transcript_variant': '', 'primary_assembly_loci': {}, 'reference_sequence_records': '', 'refseqgene_context_intronic_sequence': '', 'selected_assembly': 'GRCh38', 'submitted_variant': 'NC_000015.10:g.45073383G>A', 'transcript_description': '', 'validation_warnings': ['A more recent version of the selected reference sequence NM_003104.5 is available (NM_003104.6): NM_003104.6:c.927G>A MUST be fully validated prior to use in reports: select_variants=NM_003104.6:c.927G>A', 'RefSeqGene record not available'], 'variant_exonic_positions': None}}
Peter-J-Freeman commented 2 years ago

Thanks. I will see if I can think of a workaround.

So, in case 1, the input is NM_002735.3:c.253A>C, but the output comes in the context of NM_001330609.1:c.203C>T and reports the submitted variant as 'submitted_variant': 'NC_000008.11:g.125078802G>A'.

So, I could capture the input variant description and, if it does not match the displayed submitted variant, re-validate before returning. Sounds like it's worth a try.
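The check described above could be sketched roughly as follows. This is only an illustration, not the actual VariantValidator code: `fetch` is a hypothetical callable standing in for whatever performs the validation, and the sketch assumes that a correct response echoes the query string in every `submitted_variant` field, which may not hold for every input format.

```python
from typing import Any, Callable, Dict

# Top-level keys that hold bookkeeping rather than variant records.
NON_RECORD_KEYS = {"flag", "metadata"}


def response_matches_query(query: str, response: Dict[str, Any]) -> bool:
    """Return True if every variant record in the response echoes the query."""
    for key, record in response.items():
        if key in NON_RECORD_KEYS or not isinstance(record, dict):
            continue
        if record.get("submitted_variant") != query:
            return False
    return True


def validate_with_recheck(
    query: str,
    fetch: Callable[[str], Dict[str, Any]],  # hypothetical validation call
    max_attempts: int = 3,
) -> Dict[str, Any]:
    """Call fetch(query) again until the response matches, up to max_attempts."""
    for _ in range(max_attempts):
        response = fetch(query)
        if response_matches_query(query, response):
            return response
    raise RuntimeError(
        f"No matching response for {query!r} after {max_attempts} attempts"
    )
```

Re-validating hides the symptom rather than removing the cause, so counting how often the recheck fires would still be useful for diagnosis.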

ifokkema commented 2 years ago

Pete, is this a race condition somewhere? Are you communicating over one port with multiple processes or so?

Peter-J-Freeman commented 2 years ago

No idea @ifokkema . I have to say that the server stuff is not something I've had much chance to get to know. Thanks for the hint, this is possible. I do have multi processes and multi threading running. Think there might be 1 port. Can I send you the configs??

vidboda commented 2 years ago

Hi,

a new one just occurred:

https://rest.variantvalidator.org/VariantValidator/variantvalidator/GRCh38/NM_001304808.3:c.2066C>T/all?content-type=application/json

{'NM_000548.5:c.4422_4423del': {'alt_genomic_loci': [], 'annotations': {'chromosome': '16', 'db_xref': {'CCDS': 'CCDS10458.1', 'ensemblgene': None, 'hgnc': 'HGNC:12363', 'ncbigene': '7249', 'select': 'MANE'}, 'ensembl_select': False, 'mane_plus_clinical': False, 'mane_select': True, 'map': '16p13.3', 'note': 'TSC complex subunit 2', 'refseq_select': True, 'variant': '1'}, 'gene_ids': {'ccds_ids': ['CCDS10458', 'CCDS45384', 'CCDS81934', 'CCDS81933', 'CCDS81932', 'CCDS58408', 'CCDS10459'], 'ensembl_gene_id': 'ENSG00000103197', 'entrez_gene_id': '7249', 'hgnc_id': 'HGNC:12363', 'omim_id': ['191092'], 'ucsc_id': 'uc002con.4'}, 'gene_symbol': 'TSC2', 'genome_context_intronic_sequence': '', 'hgvs_lrg_transcript_variant': '', 'hgvs_lrg_variant': '', 'hgvs_predicted_protein_consequence': {'lrg_slr': 'LRG_487p1:p.(R1474Sfs*49)', 'lrg_tlr': 'LRG_487p1:p.(Arg1474SerfsTer49)', 'slr': 'NP_000539.2:p.(R1474Sfs*49)', 'tlr': 'NP_000539.2:p.(Arg1474SerfsTer49)'}, 'hgvs_refseqgene_variant': '', 'hgvs_transcript_variant': 'NM_000548.5:c.4422_4423del', 'primary_assembly_loci': {'grch37': {'hgvs_genomic_description': 'NC_000016.9:g.2134645_2134646del', 'vcf': {'alt': 'A', 'chr': '16', 'pos': '2134640', 'ref': 'AAG'}}, 'grch38': {'hgvs_genomic_description': 'NC_000016.10:g.2084644_2084645del', 'vcf': {'alt': 'A', 'chr': '16', 'pos': '2084639', 'ref': 'AAG'}}, 'hg19': {'hgvs_genomic_description': 'NC_000016.9:g.2134645_2134646del', 'vcf': {'alt': 'A', 'chr': 'chr16', 'pos': '2134640', 'ref': 'AAG'}}, 'hg38': {'hgvs_genomic_description': 'NC_000016.10:g.2084644_2084645del', 'vcf': {'alt': 'A', 'chr': 'chr16', 'pos': '2084639', 'ref': 'AAG'}}}, 'reference_sequence_records': {'protein': '[https://www.ncbi.nlm.nih.gov/nuccore/NP_000539.2](https://urldefense.com/v3/__https://www.ncbi.nlm.nih.gov/nuccore/NP_000539.2__;!!DV4KuIgKKrh48VMFxQ!STwBu38DR6QqG28dkkRR37qPe48SBChqrTHCdK_M5I8iISAfqmP1zMAN-tY434XJf4wIctM_$)', 'transcript': 
'[https://www.ncbi.nlm.nih.gov/nuccore/NM_000548.5](https://urldefense.com/v3/__https://www.ncbi.nlm.nih.gov/nuccore/NM_000548.5__;!!DV4KuIgKKrh48VMFxQ!STwBu38DR6QqG28dkkRR37qPe48SBChqrTHCdK_M5I8iISAfqmP1zMAN-tY434XJf5oD8cYk$)'}, 'refseqgene_context_intronic_sequence': '', 'selected_assembly': 'GRCh38', 'submitted_variant': 'NM_000548.5:c.4422_4423del', 'transcript_description': 'Homo sapiens TSC complex subunit 2 (TSC2), transcript variant 1, mRNA', 'validation_warnings': ['RefSeqGene record not available'], 'variant_exonic_positions': {'NC_000016.10': {'end_exon': '34', 'start_exon': '34'}, 'NC_000016.9': {'end_exon': '34', 'start_exon': '34'}}}, 'flag': 'gene_variant', 'metadata': {'variantvalidator_hgvs_version': '2.0.1', 'variantvalidator_version': '2.0.1.dev41+g9de1da3', 'vvdb_version': 'vvdb_2021_4', 'vvseqrepo_db': 'VV_SR_2021_2/master', 'vvta_version': 'vvta_2021_2'}}
ifokkema commented 2 years ago

> Thanks for the hint, this is possible. I do have multi processes and multi threading running. Think there might be 1 port. Can I send you the configs??

I think I'd need more than just the configs to understand the whole structure of the system. I know next to nothing about the inner workings of VV, and I can't say if this is "server stuff" or "VV stuff". The thing to look for is shared connections or ports; e.g., how the API connects to the main VV process that does the analysis. If files are created and then read, that's always a weak spot. The same goes for multiple processes talking to one program that has no way of knowing which process it's replying to. Since even the submitted_variant key is wrong, whatever part of the code keeps this variable in memory while other stuff is calculated is already behind some connection that gets mixed up. So the problem is between the high-level API request handling and the part where submitted_variant gets stored before everything else gets handled.

I would start by investigating your server logs. If the submitted_variant from David's output and David's actual submitted variant are found at the same time in the server logs, you know it's a race condition. If the server really isn't that busy and there's time in between those two requests, it might instead be some kind of caching. But then I don't know whether you have implemented any caching at all.
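The log comparison suggested above might look something like this. The real log format is not shown in this thread, so the sketch assumes the logs have already been parsed into hypothetical `(timestamp, variant)` pairs; the 5-second window is likewise an arbitrary assumption.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

# Hypothetical parsed log entry: (time the request was logged, variant string).
Request = Tuple[datetime, str]


def overlapping_requests(
    log: Iterable[Request],
    queried: str,
    returned: str,
    window: timedelta = timedelta(seconds=5),
) -> List[Tuple[datetime, datetime]]:
    """Pair each request for `queried` with requests for `returned` logged within `window`."""
    entries = list(log)
    queried_times = [t for t, v in entries if v == queried]
    returned_times = [t for t, v in entries if v == returned]
    return [
        (q, r)
        for q in queried_times
        for r in returned_times
        if abs(q - r) <= window
    ]
```

A non-empty result would point towards a race condition between concurrent requests. An empty result, with the two requests far apart in time, would be more consistent with some form of caching.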

vidboda commented 2 years ago

Hi both,

maybe related, here is another type of weird result (empty):

https://rest.variantvalidator.org/VariantValidator/variantvalidator/GRCh38/NM_018188.4:c.895-4G>A/all?content-type=application/json

{'flag': 'warning', 'metadata': {'variantvalidator_hgvs_version': '2.0.1', 'variantvalidator_version': '2.0.1.dev50+g4572981', 'vvdb_version': 'vvdb_2021_4', 'vvseqrepo_db': 'VV_SR_2021_2/master', 'vvta_version': 'vvta_2021_2'}, 'validation_warning_1': {'alt_genomic_loci': [], 'annotations': {}, 'gene_ids': {}, 'gene_symbol': '', 'genome_context_intronic_sequence': '', 'hgvs_lrg_transcript_variant': '', 'hgvs_lrg_variant': '', 'hgvs_predicted_protein_consequence': {'lrg_slr': '', 'lrg_tlr': '', 'slr': '', 'tlr': ''}, 'hgvs_refseqgene_variant': '', 'hgvs_transcript_variant': '', 'primary_assembly_loci': {}, 'reference_sequence_records': '', 'refseqgene_context_intronic_sequence': '', 'selected_assembly': 'GRCh38', 'submitted_variant': 'NM_001010867.3:c.*19G>A', 'transcript_description': '', 'validation_warnings': [], 'variant_exonic_positions': None}}
i3hsInnovation commented 2 years ago

@beboche @ifokkema . I have tweaked the server. The main thing I'm trying is using the worker-event MPM in apache.

Please let me know if you see any more of these events @beboche

vidboda commented 2 years ago

Hi both, nice! I'll be sure to let you know, as they occur on a daily basis. Cheers

ifokkema commented 2 years ago

Hi Pete, right now I'm getting lots of HTTP500s when using LRGs. Is this related? The same call with NCs works just fine.

i3hsInnovation commented 2 years ago

It might be that the databases are out of date. I'm making an update as we speak. Let me test locally and feed back. I will look at what is causing the error and send back a better error message, then fix the error if the LRG is in the database, or once it is added in the next few days.

i3hsInnovation commented 2 years ago

FYI, that means I think this is an unrelated issue, but thanks for the heads up. What I'm looking for here is instances where you ask about variant A but get info about a different variant (B)

ifokkema commented 2 years ago

> FYI, that means I think this is an unrelated issue, but thanks for the heads up. What I'm looking for here is instances where you ask about variant A but get info about a different variant (B)

Sure, but I was thinking that maybe it's caused by the updating of the server or so?

Now, the HTTP500 has disappeared and I'm getting a "LRG_199:g.1000del is an unsupported format: For assistance, submit variant description to https://rest.variantvalidator.org/". Should I create a new issue?

Peter-J-Freeman commented 2 years ago

Yes please. New issue for that one. It'll be a bug I need to kill quick

Peter-J-Freeman commented 2 years ago

Please provide some examples :)

vidboda commented 2 years ago

> The main thing I'm trying is using the worker-event MPM in apache.

You used to use pre-fork? I'm currently also testing the event MPM on my side.

Peter-J-Freeman commented 2 years ago

Just started using the event MPM https://httpd.apache.org/docs/2.4/mod/event.html. So far it seems to have potentially sped things up and at least reduced if not stopped mixed up results. I was on Worker MPM before

Peter-J-Freeman commented 2 years ago

VV is handling the LRG_199 variant

{
        "alt_genomic_loci": [],
        "annotations": {},
        "gene_ids": {},
        "gene_symbol": "",
        "genome_context_intronic_sequence": "",
        "hgvs_lrg_transcript_variant": "",
        "hgvs_lrg_variant": "",
        "hgvs_predicted_protein_consequence": {
            "lrg_slr": "",
            "lrg_tlr": "",
            "slr": "",
            "tlr": ""
        },
        "hgvs_refseqgene_variant": "",
        "hgvs_transcript_variant": "",
        "primary_assembly_loci": {},
        "reference_sequence_records": "",
        "refseqgene_context_intronic_sequence": "",
        "selected_assembly": "GRCh37",
        "submitted_variant": "LRG_199:g.1000del",
        "transcript_description": "",
        "validation_warnings": [
            "LRG_199:g.1000del automapped to equivalent RefSeq record NG_012232.1:g.1000del",
            "NG_012232.1:g.1000delT automapped to genome position NC_000023.10:g.33361728delA",
            "Removing redundant reference bases from variant description",
            "No transcripts found that fully overlap the described variation in the genomic sequence"
        ],
        "variant_exonic_positions": null
    },
    "flag": "intergenic",
    "intergenic_variant_1": {
        "alt_genomic_loci": [],
        "annotations": {},
        "gene_ids": {},
        "gene_symbol": "",
        "genome_context_intronic_sequence": "",
        "hgvs_lrg_transcript_variant": "",
        "hgvs_lrg_variant": "LRG_199:g.1001del",
        "hgvs_predicted_protein_consequence": {
            "lrg_slr": "",
            "lrg_tlr": "",
            "slr": "",
            "tlr": ""
        },
        "hgvs_refseqgene_variant": "NG_012232.1:g.1001del",
        "hgvs_transcript_variant": "",
        "primary_assembly_loci": {
            "grch37": {
                "hgvs_genomic_description": "NC_000023.10:g.33361728del",
                "vcf": {
                    "alt": "G",
                    "chr": "X",
                    "pos": "33361725",
                    "ref": "GA"
                }
            },
            "grch38": {
                "hgvs_genomic_description": "NC_000023.11:g.33343611del",
                "vcf": {
                    "alt": "G",
                    "chr": "X",
                    "pos": "33343608",
                    "ref": "GA"
                }
            },
            "hg19": {
                "hgvs_genomic_description": "NC_000023.10:g.33361728del",
                "vcf": {
                    "alt": "G",
                    "chr": "chrX",
                    "pos": "33361725",
                    "ref": "GA"
                }
            },
            "hg38": {
                "hgvs_genomic_description": "NC_000023.11:g.33343611del",
                "vcf": {
                    "alt": "G",
                    "chr": "chrX",
                    "pos": "33343608",
                    "ref": "GA"
                }
            }
        },
        "reference_sequence_records": {
            "lrg": "http://ftp.ebi.ac.uk/pub/databases/lrgex/LRG_199.xml",
            "refseqgene": "https://www.ncbi.nlm.nih.gov/nuccore/NG_012232.1"
        },
        "refseqgene_context_intronic_sequence": "",
        "selected_assembly": "GRCh37",
        "submitted_variant": "LRG_199:g.1000del",
        "transcript_description": "",
        "validation_warnings": [
            "LRG_199:g.1000del automapped to equivalent RefSeq record NG_012232.1:g.1000del",
            "NG_012232.1:g.1000delT automapped to genome position NC_000023.10:g.33361728delA",
            "Removing redundant reference bases from variant description",
            "No transcripts found that fully overlap the described variation in the genomic sequence"
        ],
        "variant_exonic_positions": null
    },
    "metadata": {
        "variantvalidator_hgvs_version": "2.0.1.dev2+g58fc52a",
        "variantvalidator_version": "1.0.5.dev228+gee3fee4.d20211116",
        "vvdb_version": "vvdb_2022_4",
        "vvseqrepo_db": "VV_SR_2022_02/master",
        "vvta_version": "vvta_2022_02"
    }
}
Peter-J-Freeman commented 2 years ago

> Now, the HTTP500 has disappeared and I'm getting a "LRG_199:g.1000del is an unsupported format: For assistance, submit variant description to https://rest.variantvalidator.org/". Should I create a new issue?

@ifokkema , which endpoint is this? The LOVD endpoint does not handle gene sequence variants. This is currently a VV request, e.g. LRG and RefSeqGene go to VV endpoint

vidboda commented 2 years ago

So far, the Apache trick seems to be doing the job. Please keep the issue open for about 10 days from now and I'll let you know if everything is all right.

Peter-J-Freeman commented 2 years ago

It may have fixed a few issues. Happy to discuss it with you too. Event MPM might be useful for your web sites too. I'm trialing it in the API and live interactive site

vidboda commented 2 years ago

Right, I switched two days ago, after some testing, to the event MPM in the live MobiDetails and API, and so far so good. What else did you choose? Could you send me your event MPM config by email, so that I can compare it with mine?

Peter-J-Freeman commented 2 years ago

I'm using Python with mod_wsgi. Here is my config (with full paths removed):

# Please see https://wiki.lamp.le.ac.uk/lampdoc/index.php/Python for more details
LoadModule wsgi_module /<path>mod_wsgi-py36.cpython-36m-x86_64-linux-gnu.so

WSGIProcessGroup vvweb
WSGIApplicationGroup %{GLOBAL}
WSGIDaemonProcess vvweb python-path=/path/rest_variantValidator:/path/envs/vvweb:/path/site-packages processes=4 threads=40
WSGIPythonHome /path/vvweb
WSGIScriptAlias / /path/wsgi.py
WSGISocketPrefix /path/run
WSGIPassAuthorization On

# VVrest settings and configs
<Directory "<path>">
  Require all granted
</Directory>

<Directory "<path>">
    <Files wsgi.py>
        Require all granted
    </Files>
    <Files rest_VariantValidator.log>
        Require all granted
    </Files> 
</Directory>

<Directory "<path>">
    <Files .vv_errorlog>
        Require all granted
    </Files>
</Directory>

I used Locust to hammer the server, and switching from the worker MPM to the event MPM dropped the failure rate (strict time set to complete jobs, with some jobs known to be long) from ~20% to <<10%.
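The Locust setup itself is not shown in this thread. As a rough standard-library stand-in for the same kind of experiment, the sketch below fires queries concurrently and reports the fraction that fail. Everything in it is an assumption made for illustration: `fetch` is a hypothetical callable that performs one API request, and a failure is defined as an exception, a response slower than `time_limit` seconds, or a record whose `submitted_variant` differs from the query.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Sequence


def failure_rate(
    queries: Sequence[str],
    fetch: Callable[[str], Dict[str, Any]],  # hypothetical single API request
    time_limit: float = 10.0,
    workers: int = 20,
) -> float:
    """Fraction of queries that raised, exceeded time_limit, or came back mixed up."""

    def failed(query: str) -> bool:
        start = time.monotonic()
        try:
            response = fetch(query)
        except Exception:
            return True
        if time.monotonic() - start > time_limit:
            return True
        # A record whose submitted_variant differs from the query is a mixed-up result.
        return any(
            isinstance(rec, dict) and rec.get("submitted_variant", query) != query
            for rec in response.values()
        )

    if not queries:
        return 0.0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(failed, queries))
    return sum(results) / len(results)
```

Running the same query list before and after a server configuration change gives two rates that can be compared directly, provided the load and time limit are kept the same.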

vidboda commented 2 years ago

ok for wsgi, thanks. what drove you to this choice? processes=4 threads=40 - maybe we should discuss this by email, including Ivo. Didn't you also configure event MPM, which will handle all the static files, e.g. with ServerLimit 10 StartServers 2 MaxClients 100 MinSpareThreads 25 MaxSpareThreads 75 ThreadsPerChild 20 (copied from here)
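To compare the two sets of numbers quoted in this thread, a little arithmetic may help. The sketch assumes the usual meaning of the directives: for the threaded MPMs, MaxClients (named MaxRequestWorkers in Apache 2.4) caps the total number of worker threads and cannot exceed ServerLimit × ThreadsPerChild, while a mod_wsgi daemon process group can handle up to processes × threads requests at once. The function names are made up for this example.

```python
def mpm_capacity(server_limit: int, threads_per_child: int, max_clients: int) -> dict:
    """Summarise the thread budget implied by threaded-MPM settings."""
    ceiling = server_limit * threads_per_child
    return {
        "thread_ceiling": ceiling,  # ServerLimit x ThreadsPerChild
        "max_child_processes": max_clients // threads_per_child,
        "max_clients_within_ceiling": max_clients <= ceiling,
    }


def wsgi_capacity(processes: int, threads: int) -> int:
    """Concurrent requests a mod_wsgi daemon process group can handle."""
    return processes * threads


# Values quoted in the thread.
print(mpm_capacity(server_limit=10, threads_per_child=20, max_clients=100))
print(wsgi_capacity(processes=4, threads=40))
```

With the quoted values, the MPM settings allow up to 100 simultaneous connections across 5 child processes, and the daemon process group has 160 request threads.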

ifokkema commented 2 years ago

> Now, the HTTP500 has disappeared and I'm getting a "LRG_199:g.1000del is an unsupported format: For assistance, submit variant description to https://rest.variantvalidator.org/". Should I create a new issue?

> @ifokkema , which endpoint is this? The LOVD endpoint does not handle gene sequence variants. This is currently a VV request, e.g. LRG and RefSeqGene go to VV endpoint

That explains things, thanks!

Peter-J-Freeman commented 2 years ago

It could potentially be upgraded to handle LRG and RefSeqGene. I can consider building it into a future release if it's needed.

ifokkema commented 2 years ago

Nah, don't worry! We will rarely use it. It's just something we should know considering our HGVS syntax validator. If users submit LRGs or NGs and they want full validation of their variants, then we ought to use the VV endpoint and not the LOVD endpoint. No biggie!