monarch-initiative / monarch-ui

The previous version of the Monarch Initiative website
https://previous.monarchinitiative.org/
BSD 3-Clause "New" or "Revised" License

replace 'Multicystic kidney dysplasia' as an autocomplete example #196

Closed: pnrobinson closed this 3 years ago

pnrobinson commented 5 years ago

I find the autocomplete for Multicystic kidney dysplasia to be a little confusing. Too many different diseases get shown.

kshefchek commented 5 years ago

Also related: https://github.com/monarch-initiative/monarch-app/issues/1534. @kltm is our Solr guru.

kshefchek commented 5 years ago

Reviewing my Solr notes: the query "Multicystic kidney dysplasia" searches our documents for four tokens:

  1. Multicystic
  2. kidney
  3. dysplasia
  4. Multicystic kidney dysplasia
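To make the token expansion concrete, here is a minimal Python sketch. It assumes plain whitespace tokenization and an OR match over the four terms; Monarch's actual Solr analyzers, fields, and boosts are not reproduced, and the labels are hypothetical. It shows how a single shared word such as "kidney" is enough to pull a label into the results.

```python
def expand_query(query: str) -> list[str]:
    """Return each whitespace-separated token, plus the full phrase if multi-word."""
    tokens = query.split()
    if len(tokens) > 1:
        tokens.append(query)
    return tokens


def or_match(query: str, labels: list[str]) -> list[str]:
    """Return labels that match at least one expanded term (case-insensitive)."""
    terms = [t.lower() for t in expand_query(query)]
    hits = []
    for label in labels:
        text = label.lower()
        words = text.split()
        # Single-word terms must match a whole word; the phrase must appear as a substring.
        if any((t in words) if " " not in t else (t in text) for t in terms):
            hits.append(label)
    return hits


# Hypothetical labels, for illustration only.
labels = [
    "Multicystic kidney dysplasia",
    "Polycystic kidney disease",
    "Renal dysplasia",
    "Noonan syndrome",
]
print(expand_query("Multicystic kidney dysplasia"))
# ['Multicystic', 'kidney', 'dysplasia', 'Multicystic kidney dysplasia']
print(or_match("Multicystic kidney dysplasia", labels))
# ['Multicystic kidney dysplasia', 'Polycystic kidney disease', 'Renal dysplasia']
```

In a real Solr setup the phrase term can still push the intended hit to the top of the ranking, but every label that shares one token remains in the list.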

This typically ranks the correct hit highly, but we also end up with many false positives. We see this especially for diseases with "syndrome" or other common words in the name.

We can experiment with settings, but I think @jmcmurry and Jeremy did a lot of testing and decided on the current configuration.

pnrobinson commented 5 years ago

I would say that if we have an exact match with a phrase like "Multicystic kidney dysplasia" then we should only display that (and exact synonyms). It is different if the user is in the process of entering a phrase. It basically just feels like a mistake to me. Can we reconsider or re-discuss this?

jmcmurry commented 5 years ago

I'm inclined to agree that in this edge case, with such a specific three-token query, just an exact match could be returned (with the tokens in any order, but all exactly matched), perhaps with a button to see other results? I don't feel super strongly.
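A minimal sketch of the exact-match idea, assuming matching is done on the lower-cased whitespace tokens of each label (or exact synonym): a label counts as exact when its token set equals the query's, in any order. `split_results` is a hypothetical helper, and the remaining labels are what a "see other results" button could reveal.

```python
def normalize(text: str) -> frozenset[str]:
    """Lower-case the text and return its set of whitespace-separated tokens."""
    return frozenset(text.lower().split())


def split_results(query: str, labels: list[str]) -> tuple[list[str], list[str]]:
    """Split labels into exact matches (same token set, any order) and everything else."""
    wanted = normalize(query)
    exact = [label for label in labels if normalize(label) == wanted]
    others = [label for label in labels if normalize(label) != wanted]
    return exact, others


# Hypothetical labels; the query tokens are deliberately out of order.
exact, others = split_results(
    "dysplasia kidney Multicystic",
    ["Multicystic kidney dysplasia", "Renal dysplasia", "Polycystic kidney disease"],
)
print(exact)   # ['Multicystic kidney dysplasia']
print(others)  # ['Renal dysplasia', 'Polycystic kidney disease']
```

Using a token set ignores word order, as suggested, but also ignores repeated words; a multiset comparison would be needed if that ever mattered.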

kshefchek commented 5 years ago

Is the confusion that there's a phenotype and a disease with nearly the same label, or that there are too many unrelated results (or both)?

monicacecilia commented 5 years ago

@pnrobinson - there is a question for you in this ticket. Ping!

pnrobinson commented 5 years ago

The list of suggestions is weird. Whether or not we want to fix this feature right now, "Multicystic kidney dysplasia" should not be shown as one of our examples on the landing page. "Noonan syndrome" works a lot better, for instance.

kshefchek commented 5 years ago

A potential solution here is to only search on the entire string when a user limits a query to phenotypes, and potentially other categories. This would avoid the matches where only "kidney" has matched.

Rereading https://github.com/monarch-initiative/monarch-app/issues/1383, I assume the reason we do this is to support queries like "{taxon} {gene_symbol}", for example "Human SHH". Sending "Human" and "SHH" as distinct tokens allows us to match different fields in the Solr doc, in this case the primary label and the taxon_label fields.
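The trade-off can be sketched in Python. This is a simplified model, not Monarch's query code: the documents are hypothetical, only the field names `label` and `taxon_label` come from the comment above, and a split-token query here requires every token to appear in some field. Phrase-only matching (the option floated for category-limited searches) works for a phenotype label but misses "Human SHH", because no single field contains that phrase.

```python
from typing import Dict, List, Optional

# Hypothetical documents; only the field names come from the discussion.
DOCS: List[Dict[str, str]] = [
    {"id": "gene-1", "category": "gene", "label": "SHH", "taxon_label": "Human"},
    {"id": "gene-2", "category": "gene", "label": "Shh", "taxon_label": "Mouse"},
    {"id": "phenotype-1", "category": "phenotype",
     "label": "Multicystic kidney dysplasia", "taxon_label": ""},
]
FIELDS = ("label", "taxon_label")


def matches(query: str, doc: Dict[str, str], split_tokens: bool) -> bool:
    q = query.lower()
    if split_tokens:
        # Each token may be satisfied by a different field.
        doc_tokens = {t for f in FIELDS for t in doc.get(f, "").lower().split()}
        return all(t in doc_tokens for t in q.split())
    # Phrase mode: the whole query must occur inside a single field.
    return any(q in doc.get(f, "").lower() for f in FIELDS)


def search(query: str, docs: List[Dict[str, str]],
           category: Optional[str] = None) -> List[str]:
    # Hypothetical policy: phrase-only matching when the user limits the category.
    split_tokens = category is None
    return [
        d["id"]
        for d in docs
        if (category is None or d["category"] == category)
        and matches(query, d, split_tokens)
    ]


print(search("Human SHH", DOCS))                                           # ['gene-1']
print(search("Human SHH", DOCS, category="gene"))                          # []
print(search("Multicystic kidney dysplasia", DOCS, category="phenotype"))  # ['phenotype-1']
```

This is why the phrase-only idea seems safest for categories such as phenotypes, where queries rarely combine values from different fields.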

> I would say that if we have an exact match with a phrase like "Multicystic kidney dysplasia" then we should only display that (and exact synonyms).

I think this could get tricky, since the Solr score is calculated over the entire document rather than per field.

kshefchek commented 5 years ago

The minimum should match (mm) parameter is another option here: https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html#TheDisMaxQueryParser-Themm_MinimumShouldMatch_Parameter

kshefchek commented 5 years ago

Testing this out on https://kshefchek.github.io/monarch-ui/: even when requiring 2 out of 3 tokens in [Multicystic, kidney, dysplasia], we still get these extra hits. I think the best approach is to use a different example.
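The mm behavior can be approximated as below. `required_clauses` is a simplified, hypothetical reading of the mm rules in the Solr guide linked above: plain integers and percentages only, percentages rounded down, negative values meaning "this many may be missing", and the result clamped between 1 and the number of clauses. Conditional specs such as `2<75%` are not handled. The labels are hypothetical; a label sharing two of three tokens still passes `mm=2`, which is consistent with the extra hits reported above.

```python
import math


def required_clauses(mm: str, n_clauses: int) -> int:
    """Simplified mm evaluation: integers and percentages, no conditional specs."""
    spec = mm.strip()
    if spec.endswith("%"):
        pct = int(spec[:-1])
        if pct >= 0:
            need = math.floor(n_clauses * pct / 100)
        else:
            # Negative percentage: that share may be missing (rounded down).
            need = n_clauses - math.floor(n_clauses * -pct / 100)
    else:
        value = int(spec)
        # Negative integer: that many clauses may be missing.
        need = value if value >= 0 else n_clauses + value
    return max(1, min(need, n_clauses))


def passes_mm(query: str, label: str, mm: str) -> bool:
    """True if enough query tokens appear as whole words in the label."""
    terms = query.lower().split()
    words = set(label.lower().split())
    matched = sum(term in words for term in terms)
    return matched >= required_clauses(mm, len(terms))


query = "Multicystic kidney dysplasia"
for label in ["Multicystic kidney dysplasia", "Cystic kidney dysplasia", "Polycystic kidney disease"]:
    print(label, passes_mm(query, label, "2"))
# Multicystic kidney dysplasia True
# Cystic kidney dysplasia True
# Polycystic kidney disease False
```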

kshefchek commented 3 years ago

@pnrobinson do you have an idea for a better autocomplete example, or is 'Multicystic kidney dysplasia' working well enough?

pnrobinson commented 3 years ago

@kshefchek My main wish would be that the site shows at least the children of the disease terms. E.g., https://monarchinitiative.org/disease/MONDO:0015231 Bartter syndrome should show the fact that there are several forms of this disease. I guess this does not make sense if there are "too many" children though...

kshefchek commented 3 years ago

It does! But it's hidden under Neighbors (in the Overview section of the vertical nav bar). We should make this a new ticket or discussion (or use this one).

For this ticket, do the results for the autocomplete example 'Multicystic kidney dysplasia' look better?

pnrobinson commented 3 years ago

I see what you mean. Neighbors is not an obvious way to get there! Multicystic kidney dysplasia pulls in stuff that is relatively far off, but also not bad/wrong, and so I do not think we need to change, as long as we are ok with having a broad search (which works for me).

kshefchek commented 3 years ago

Perfect, thanks!