statgen / bravo_api

Server side data processing and retrieval endpoints for BRAVO
MIT License

Add `other_names` functionality to `variants.get_genes` #4

Open grosscol opened 3 years ago

grosscol commented 3 years ago

Issue or current state

Discovered this comment regarding adding more sorting options for the mongo aggregate pipeline of get_genes:

TODO: add other_names (need to use aggregae https://stackoverflow.com/questions/28889240/mongodb-sort-documents-by-array-elements

From the context of the Stack Overflow post, it appears that this comment is about sorting on a field that is not part of the match.

Resolved when

pjvandehaar commented 3 years ago

"FURIN" used to be called "PCSK3". If you search Bravo for "PCSK3", you get nothing. .other_names should be used more like this:

[image]

grosscol commented 3 years ago

@pjvandehaar Thanks for the illustration. That makes sense.

Per @dtaliun

Each gene has a unique identifier (the so-called Ensembl ID), which starts with "ENSG" and is stored in the gene_id field. A gene also has a name (e.g. "PCSK9"), which is stored in the gene_name field. Many genes also have so-called "synonyms" or "aliases" (names that were used previously), which are stored in the other_names field (a list of all other names). For example, PCSK9 has the synonym "NARC1". Currently, the search for variants by gene uses only the gene_id and gene_name fields, not other_names. So if somebody searches for "NARC1", no results are returned.

The intended functionality was to also search the other_names field, in addition to gene_name and gene_id.
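
A minimal sketch of what that lookup could look like, assuming gene documents shaped as described above (gene_id, gene_name, and a list-valued other_names). The function names `build_gene_filter` and `gene_matches` are hypothetical and not taken from bravo_api. The filter relies on MongoDB's standard behavior that an equality condition on an array field matches when any element equals the value.

```python
def build_gene_filter(term):
    """Return a MongoDB filter matching a gene by ID, name, or alias."""
    return {
        "$or": [
            {"gene_id": term},
            {"gene_name": term},
            # Equality against an array field matches if any element equals term.
            {"other_names": term},
        ]
    }


def gene_matches(gene, term):
    """Pure-Python equivalent of the filter, for checking without a database."""
    return (
        gene.get("gene_id") == term
        or gene.get("gene_name") == term
        or term in gene.get("other_names", [])
    )


# Hypothetical document; the gene_id value is a placeholder, not a real Ensembl ID.
pcsk9 = {
    "gene_id": "ENSG_PLACEHOLDER",
    "gene_name": "PCSK9",
    "other_names": ["NARC1"],
}

print(gene_matches(pcsk9, "NARC1"))  # True
print(gene_matches(pcsk9, "PCSK3"))  # False
```

With pymongo, the filter could be passed to `find` or placed in a `$match` stage of the existing aggregate pipeline. Note that this is an exact, case-sensitive match, so any normalization of the search term would have to happen before the filter is built.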