monarch-initiative / monarch-legacy

Monarch web application and API
BSD 3-Clause "New" or "Revised" License

Better handling of hyphens, punctuation in search #1282

Closed jmcmurry closed 6 years ago

jmcmurry commented 8 years ago

"Pallister-Hall Syndrome" is not returned at all when searching for "Pallister Hall Syndrome" (no hyphen).

Same for "Smith-Magenis Syndrome". Moreover, in this case we have two records with identical labels but wildly different data: OMIM and Decipher.

The hyphen issue is important to fix, because searches for many long or complicated disease names are likely to be done by "paste-and-search" rather than by typing with autocomplete.

jmcmurry commented 8 years ago

Another use case to test is searching for strings that contain dots. The following is pasted from a duplicate ticket.

Searching Monarch for any string that contains a dot returns an error, for example: 15q13.3 homozygous microdeletion syndrome

{"statusCode":500,"error":"Internal Server Error","message":"An internal server error occurred"}

Using OSX Chrome

jmcmurry commented 8 years ago

I would like to flag this as the top priority to fix once Solr is upgraded. cc: @kltm @jnguyenx @cborromeo @harryhoch

jmcmurry commented 8 years ago

@jnguyenx The hyphen issue has been solved, but on beta `15q13.3 homozygous microdeletion syndrome` is failing and `Tg(SMN2)2Hung` doesn't return the relevant result. Thoughts?

jnguyenx commented 8 years ago

There was an existing bug with searches that contain dots; I've just fixed that.

Concerning the other issue, I think that it's just a data source issue. Neither MGI:3514027 nor MGI:3056903 is on the ontology graph.

jmcmurry commented 8 years ago

@jnguyenx Regarding punctuation, on beta now, the autocomplete works more or less, but the site search does not. For instance, https://beta.monarchinitiative.org/search/ncbigene:6622 redirects to https://beta.monarchinitiative.org/false/ncbigene:6622

Underscores get you at least to a non-error, but don't include the right results: https://beta.monarchinitiative.org/search/ncbigene_6622

What I think makes the most sense is to treat all punctuation in the query as if it were a space. Is this a terrible idea?

https://beta.monarchinitiative.org/search/ncbigene%206622
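To make the proposal concrete, here is a minimal sketch in Python of treating punctuation as a space before the query reaches the search backend. It is not the Monarch implementation; `normalize_query` and `search_path` are hypothetical names, and the choice to treat the underscore as punctuation is an assumption.

```python
import re
from urllib.parse import quote

# Matches any single character that is neither a word character nor
# whitespace, plus the underscore (which \w would otherwise keep).
_PUNCT = re.compile(r"[^\w\s]|_")


def normalize_query(query: str) -> str:
    """Replace each punctuation character with a space, then collapse whitespace."""
    spaced = _PUNCT.sub(" ", query)
    return " ".join(spaced.split())


def search_path(query: str) -> str:
    """Build a /search/ path (hypothetical helper) from the normalized query."""
    return "/search/" + quote(normalize_query(query))


if __name__ == "__main__":
    for q in ("ncbigene:6622", "ncbigene_6622", "Pallister-Hall Syndrome"):
        print(q, "->", search_path(q))
```

One consequence worth noting: a dotted term such as `15q13.3` becomes two tokens (`15q13` and `3`), so the index side would probably need the same treatment for such labels to keep matching.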

jnguyenx commented 8 years ago

The first use case is a redirect derived from SciGraph. I left it in but was tempted to remove it. It is case sensitive, so you have to search for NCBIGene:6622, which is quite wacky. Since nobody has reported that before, I guess that nobody uses the search box this way, so I'll remove it.
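For illustration only, this is roughly what a case-insensitive version of such an identifier lookup could look like. It is a sketch under assumptions, not the SciGraph redirect logic: the prefix table and the `canonicalize_curie` function are hypothetical.

```python
from typing import Optional

# Hypothetical table mapping lowercased prefixes to their canonical spelling.
# A real deployment would load this from its own prefix registry.
CANONICAL_PREFIXES = {
    "ncbigene": "NCBIGene",
    "mgi": "MGI",
    "omim": "OMIM",
}


def canonicalize_curie(text: str) -> Optional[str]:
    """Return the CURIE with a canonical prefix, or None if text is not a known CURIE."""
    prefix, sep, local_id = text.partition(":")
    if not sep or not local_id.strip():
        return None  # no colon, or nothing after it
    canonical = CANONICAL_PREFIXES.get(prefix.strip().lower())
    if canonical is None:
        return None  # unknown prefix: treat the input as free text
    return f"{canonical}:{local_id.strip()}"
```

With this, `ncbigene:6622` and `NCBIGene:6622` would resolve to the same identifier, while ordinary text falls through to a normal search.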

Concerning the , I thought that it was taken care of by Solr's StandardTokenizer. I added a pattern to process .

kshefchek commented 8 years ago

@jmcmurry is there a reason we want to autocomplete on fragments and full IRIs? In my opinion we should just enable autocompletion on CURIEs.

kltm commented 8 years ago

Hm, I'd think that IRIs and the like are more for identifiers rather than anything that should be targeted for autocomplete, etc.; CURIEs are what most users would try to search on. (Note that if one tokenized on ":" or "/" you'd still get useful results.) It's not something that has ever come up on the GO side of things. @cmungall any thoughts?
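As a rough illustration of the tokenization remark, splitting an identifier on ":" and "/" leaves the prefix and the local ID as separate searchable tokens. This is a sketch in Python rather than a Solr tokenizer configuration; `identifier_tokens` is a hypothetical name and the example.org IRI is made up.

```python
import re
from typing import List


def identifier_tokens(identifier: str) -> List[str]:
    """Split a CURIE or IRI on ':' and '/' and drop the empty pieces."""
    return [token for token in re.split(r"[:/]", identifier) if token]


if __name__ == "__main__":
    print(identifier_tokens("NCBIGene:6622"))
    print(identifier_tokens("http://example.org/gene/6622"))
```

Both inputs yield a `6622` token, so a query for the bare local ID would match either form.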

jmcmurry commented 8 years ago

Autocomplete need not support identifier searches; however, site search should.

cmungall commented 8 years ago

I agree with @kltm. I personally find it useful to paste in an ID and see what comes up. Many AmiGO users paste in IDs or ID fragments and get upset if nothing comes up. There seems to be no harm in adding this.

kshefchek commented 6 years ago

I think this is fixed now? https://monarchinitiative.org/search/ncbigene:6622 https://monarchinitiative.org/search/ncbigene_6622