iobio / gene.iobio

Gene.iobio vue
MIT License

Order gene list on similarity to entered term (not alphabetically) #1074

Open AlistairNWard opened 3 weeks ago

AlistairNWard commented 3 weeks ago

Search for the gene F5 and so many genes containing "F5" are returned that the dropdown is cut off well before genes beginning with "F" are displayed. This makes it impossible to search for F5 and select it.

Instead of ordering genes in the search dropdown alphabetically, they should be ordered according to their similarity to the entered term. As an exact match, F5 (or genes with a synonym of exactly F5) would then be first in the list when F5 is entered.
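
To make the proposed ordering concrete, here is a minimal sketch of one way to rank hits by similarity: exact matches first, then symbols that start with the term, then symbols that merely contain it, with ties broken alphabetically. The function name `rank_by_similarity` and the short symbol list are illustrative assumptions, not gene.iobio code, and synonyms are left out for brevity.

```python
def rank_by_similarity(term, symbols):
    """Order the symbols containing `term`: exact, then prefix, then substring."""
    query = term.upper()

    def tier(symbol):
        name = symbol.upper()
        if name == query:
            return 0  # exact match
        if name.startswith(query):
            return 1  # term is a prefix of the symbol
        return 2      # term appears elsewhere in the symbol

    # Keep only symbols that contain the term at all (case-insensitive).
    hits = [s for s in symbols if query in s.upper()]
    # Sort by tier first, then alphabetically within a tier.
    return sorted(hits, key=lambda s: (tier(s), s.upper()))


# Small illustrative list, not the real gene database.
symbols = ["ATF5", "E2F5", "F5", "KIF5A", "TAF5"]
print(rank_by_similarity("F5", symbols))
# ['F5', 'ATF5', 'E2F5', 'KIF5A', 'TAF5']
```

With this ordering, F5 stays at the top of the dropdown no matter how many other symbols contain the term.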

tonydisera commented 1 week ago

Agreed. I have reworked the gene search typeahead functionality in release 4.11. This new behavior will try to match on the gene symbol first. If there are no matches, it will search the gene aliases. In your example, the current behavior results in this long list of 'hits':

Screenshot 2024-06-17 at 5 19 00 PM

In gene.iobio 4.11, the search on F5 will return only that gene:

Screenshot 2024-06-17 at 5 18 54 PM

Here is a more complicated example. If we type in 'MA', we get a long list of genes starting with 'MA'. Type in 'MAY' and we only get one hit, and it is for the gene alias 'MAYA'. In other words, no gene names starting with 'MAY' were found in the database, based on the names populated from RefSeq and Gencode, but there is an alias starting with 'MAY' that points to the Gencode gene MNX1-AS1.

Screenshot 2024-06-17 at 5 24 06 PM
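
The symbol-first, alias-fallback behavior described above could be sketched roughly as follows. This is not the actual 4.11 implementation: the function name `search_genes` and the tiny `genes` mapping are assumptions for illustration, and only the MAYA alias for MNX1-AS1 is taken from this thread.

```python
def search_genes(term, genes):
    """Prefix-match on gene symbols; search aliases only if no symbol matches.

    `genes` maps each gene symbol to a list of its aliases.
    """
    query = term.upper()

    # First pass: symbols that start with the term.
    symbol_hits = sorted(s for s in genes if s.upper().startswith(query))
    if symbol_hits:
        return symbol_hits

    # Fallback: genes with at least one alias that starts with the term.
    return sorted(
        symbol
        for symbol, aliases in genes.items()
        if any(alias.upper().startswith(query) for alias in aliases)
    )


# Illustrative subset, not the real database contents.
genes = {"MAPK1": ["ERK2"], "MAX": [], "MNX1-AS1": ["MAYA"]}
print(search_genes("MA", genes))   # ['MAPK1', 'MAX']
print(search_genes("MAY", genes))  # ['MNX1-AS1']
```

Note that 'MA' never reaches the alias pass because symbol matches exist, which is why MNX1-AS1 only shows up once the term narrows to 'MAY'.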

tonydisera commented 1 week ago

@AlistairNWard, you bring up an interesting point about the order of the gene names returned. In the new release, the search looks for a match based on the beginning of the gene name. Hopefully, this isn't too restrictive. If we returned all genes that match the term anywhere in the gene name, the order of the returned genes would matter more. I agree that the user would want to see the 'closest' matches first, and the new behavior does satisfy this. For example, if the user searches on gene TAT, that exact match appears first in the list:

Screenshot 2024-06-17 at 5 39 44 PM

On a related note, gene list order is dictated by the gene name, not the gene alias. For example, if the user enters MGC445, there are no genes with this name, but there are gene aliases that start with this term. Notice that the genes are ordered alphabetically by the gene name (the name designated by RefSeq or Gencode).

Screenshot 2024-06-17 at 5 43 45 PM
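
As a rough illustration of that ordering rule, the sketch below collects alias hits and sorts them by the gene name they point to rather than by the alias text. The function name `order_alias_hits` and every symbol and alias in the sample mapping are hypothetical placeholders, not entries from the real database.

```python
def order_alias_hits(term, alias_to_symbol):
    """Return (symbol, alias) pairs for aliases starting with `term`,
    ordered alphabetically by gene symbol rather than by alias."""
    query = term.upper()
    hits = [
        (symbol, alias)
        for alias, symbol in alias_to_symbol.items()
        if alias.upper().startswith(query)
    ]
    # Sort on the gene symbol; the alias is only a tie-breaker.
    return sorted(hits, key=lambda hit: (hit[0].upper(), hit[1].upper()))


# Hypothetical aliases and symbols, for illustration only.
alias_to_symbol = {
    "MGC44501": "GENE_C",
    "MGC44502": "GENE_A",
    "MGC99999": "GENE_B",
}
print(order_alias_hits("MGC445", alias_to_symbol))
# [('GENE_A', 'MGC44502'), ('GENE_C', 'MGC44501')]
```

Here the alias MGC44501 sorts before MGC44502, yet its gene appears second because the list is keyed on the gene name.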

There are many nuances to the gene search, so please feel free @AlistairNWard to play around with the new functionality on https://stage.gene.iobio.io. Overall, I'm happy with the new behavior, but my guess is that it may still need some refinement. Hopefully, this gets us closer to a solid gene search.