Open cassimons opened 2 years ago
I just checked the contents of the Elasticsearch index for this variant:
GET /validation-genome-2022_0810_2358_474tt/_search
{"query": {"bool": {"filter": [{"term": {"variantId": "16-70252728-T-A"}}]}}}
The sortedTranscriptConsequences do indeed contain the new gene symbols (AARS1):
"sortedTranscriptConsequences" : [
{
"biotype" : "protein_coding",
"canonical" : 1,
"cdna_start" : 3007,
"cdna_end" : 3007,
"codons" : "aAg/aTg",
"gene_id" : "ENSG00000090861",
"gene_symbol" : "AARS1",
"hgvsc" : "ENST00000261772.13:c.2900A>T",
"hgvsp" : "ENSP00000261772.8:p.Lys967Met",
"transcript_id" : "ENST00000261772",
"amino_acids" : "K/M",
"lof" : null,
"lof_filter" : null,
"lof_flags" : null,
"lof_info" : null,
"polyphen_prediction" : "possibly_damaging",
"protein_id" : "ENSP00000261772",
"protein_start" : 967,
"sift_prediction" : "deleterious_low_confidence",
"consequence_terms" : [
"missense_variant"
],
"domains" : null,
"major_consequence" : "missense_variant",
"category" : "missense",
"hgvs" : "p.Lys967Met",
"major_consequence_rank" : 11,
"transcript_rank" : 0
},
...
Similarly, the "mainTranscript_gene_symbol" : "AARS1" also looks good.
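For anyone repeating this check on another variant, here is a minimal Python sketch of the same idea. It only builds the query body shown above and pulls the gene symbols out of a returned document. The helper names (build_variant_query, gene_symbols_from_hit) and the trimmed example document are made up for illustration and are not part of seqr. Sending the query to the cluster is left out because the connection details depend on the deployment.

```python
import json


def build_variant_query(variant_id):
    """Build the term-filter query body used above for a single variantId."""
    return {"query": {"bool": {"filter": [{"term": {"variantId": variant_id}}]}}}


def gene_symbols_from_hit(source):
    """Return the distinct gene_symbol values in one document's _source, in order.

    Hypothetical helper: it assumes the document layout shown in the output above.
    """
    symbols = []
    # The field may be missing or null, so fall back to an empty list.
    for consequence in source.get("sortedTranscriptConsequences") or []:
        symbol = consequence.get("gene_symbol")
        if symbol and symbol not in symbols:
            symbols.append(symbol)
    return symbols


# Trimmed-down document based on the output pasted above (most fields omitted).
example_source = {
    "mainTranscript_gene_symbol": "AARS1",
    "sortedTranscriptConsequences": [
        {
            "gene_id": "ENSG00000090861",
            "gene_symbol": "AARS1",
            "transcript_id": "ENST00000261772",
        },
    ],
}

print(json.dumps(build_variant_query("16-70252728-T-A")))
print(gene_symbols_from_hit(example_source))
```

If the index were still carrying the old annotation, the second call would list AARS instead of AARS1, which is a quick way to tell an index problem from a reference-data problem.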
So maybe it's indeed coming from the Postgres table.
Maybe we need to run update_gencode.py? (Note that currently the version is limited to 32 -- not sure why.)
It gets called for a list of versions in update_all_reference_data.py.
@illusional Not sure how adventurous you're feeling, but you could try increasing that limit, adding Gencode 39 to that list above, and running ./manage.py update_all_reference_data --use-cached-omim?
My guess (like yours) would be that the limit is there to tie the Gencode version to the relevant VEP version? If so, then Gencode 39 is what we want if we are still on VEP 105. It would be great if we could give this a go.
Hey @cassimons, can you confirm that this gene symbol has been updated in seqr-staging:validation?
I can't search for AARS anymore, but can for AARS1. If you're happy with this, I can push to seqr-prod.
Thanks @illusional! Yes this seems to be working as expected to me. Go for Prod 🚀
CPG seqr currently displays gene symbols that are several years out of date.
Good examples are any of the *ARS genes, eg OLD>NEW: AARS > AARS1, YARS > YARS1
Taking the AARS1 example, Ensembl 105 uses the modern version of the symbol, AARS1, while Ensembl 95 uses the older form, AARS. CPG seqr currently displays the old version, AARS, and does not allow the new version in gene lists.
To my knowledge, several months ago we updated the seqr loading pipeline to use an up-to-date version of VEP (>= 105). Am I misunderstanding what/when we updated VEP, or are the gene symbols being sourced from a different place that we have failed to update (e.g. is this from the gene/transcript info tables in Postgres)?