huridocs / uwazi

Uwazi is a web-based, open-source solution for building and sharing document collections
http://www.uwazi.io
MIT License
235 stars 80 forks

Search tips don't explain exact term search limitations adequately #4936

Open llfinch opened 2 years ago

llfinch commented 2 years ago

Describe the bug Currently, the Search Tips pop-out window in the Uwazi Library says the following about exact term search:

What this short explanation leaves out is that when the quotation marks contain only one word, the search does not behave as one might expect for an exact term match (and it certainly behaves differently from a search with two or more words between quotation marks). With a single quoted word, the search appears to behave as if the quotation marks weren't there: it returns exact matches (if there are any) but also near matches. For example, a search for "provide" returns provide but also provider, provides, provided, etc.

We need to update the Search Tips to adequately explain this to avoid user confusion (which has been reported around this issue). I suggest the following:

As well, it would be good to take this opportunity to correct the tips that incorrectly use "i.e." They should use "e.g.", which means "for example" and introduces examples, not "i.e.", which means "that is" or "in other words"; the two are often confused. So replace all instances of "i.e." with "e.g."

(We'll be updating the user documentation to adequately explain how all this works, too.)

Screenshots Search tips as they currently appear:

image

RafaPolit commented 2 years ago

It may be worth a quick look into a possible option to actually "enforce" the quotes on ES so that quotes don't actually try to fuzzyfind the term? Not sure if this is what we want, but at least knowing it is an option could be interesting.

cc @fnocetti @LaszloKecskes @txau
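
As a rough sketch of what "enforcing" the quotes on the Elasticsearch side could look like (not Uwazi's actual query): the `query_string` query accepts a `quote_field_suffix` parameter, which sends quoted text to a differently analyzed sub-field. The field name `fullText`, the `.exact` sub-field (which would have to be mapped with a non-stemming analyzer), and the helper `buildQuery` are all assumptions made up for illustration.

```typescript
// Hypothetical request body; only the parts used in this sketch are typed.
interface QueryStringBody {
  query: {
    query_string: {
      query: string;
      fields: string[];
      quote_field_suffix: string;
    };
  };
}

// Assumption: "fullText" is analyzed with stemming, while "fullText.exact"
// is a sub-field mapped with an analyzer that does no stemming.
function buildQuery(userInput: string): QueryStringBody {
  return {
    query: {
      query_string: {
        query: userInput,
        fields: ['fullText'],
        // Quoted parts of the input are matched against "fullText.exact".
        quote_field_suffix: '.exact',
      },
    },
  };
}

console.log(JSON.stringify(buildQuery('"provide"'), null, 2));
```

Whether this fits depends on how the index is mapped today; adding the sub-field would presumably require a reindex.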

llfinch commented 2 years ago

If that's an option, I would 110% agree that we should look into it, and I would push for us to adopt it because it would be added value to how Uwazi works currently: a way to do clean exact term search for one-word searches.

txau commented 2 years ago

Taking a look at the documentation, I guess the behavior of search needs to be controlled at the application level. It is up to us to decide whether to fuzzy search or not, or what causes it to be triggered.

The solution would be a search string parser that, e.g., doesn't use fuzzy search if it detects the quotes. Another option is to never do fuzzy searches, but offer users a link or button to expand the search to include fuzzy results.
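
A minimal sketch of such a parser, assuming the whole input is either quoted or not (mixed input such as `foo "bar"` is not handled). The names `ParsedSearch` and `parseSearchTerm` are hypothetical and do not come from the Uwazi codebase.

```typescript
interface ParsedSearch {
  term: string; // search text with the surrounding quotes removed
  exact: boolean; // true when the whole input was wrapped in double quotes
}

function parseSearchTerm(raw: string): ParsedSearch {
  const trimmed = raw.trim();
  const quoted =
    trimmed.length >= 2 &&
    trimmed.charAt(0) === '"' &&
    trimmed.charAt(trimmed.length - 1) === '"';

  if (quoted) {
    // Strip the quotes and flag the search so the caller can skip fuzzy matching.
    return { term: trimmed.slice(1, -1).trim(), exact: true };
  }
  return { term: trimmed, exact: false };
}

console.log(parseSearchTerm('"provide"'));
console.log(parseSearchTerm('provide'));
```

The caller would then decide, based on `exact`, which kind of query to send, so the same check could also back the "expand the search" button idea.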

llfinch commented 1 year ago

Hi all, I was under the impression that a one-word exact search, although it includes fuzzy matches in the results, would front-load the exact matches. However, one of our partners has flagged that the search isn't working like this. When they search for the term "género" in quotation marks, 80 entities are returned; results for the exact term "género" are scattered throughout them, mixed in with the fuzzy results. (If you want to explore this specific example yourself, message me and I'll point you to the instance.)
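
To make the expected ordering concrete, here is a small illustration of "exact matches first" as a stable partition over an already returned result list. This is only a sketch of the expectation, not how Uwazi or Elasticsearch ranks results; the `Result` shape and both function names are hypothetical, and the tokenizer is deliberately naive.

```typescript
interface Result {
  title: string;
  text: string;
}

// Naive whole-word check: split on whitespace and common punctuation,
// then compare lowercased tokens. Accents are kept, so "género" does not
// match "genero" or "géneros".
function containsExactWord(text: string, word: string): boolean {
  const tokens = text.toLowerCase().split(/[\s.,;:!?()"']+/);
  return tokens.indexOf(word.toLowerCase()) !== -1;
}

// Results with an exact word match come first; relative order within
// each group is preserved.
function exactFirst(results: Result[], word: string): Result[] {
  const exact = results.filter((r) => containsExactWord(r.text, word));
  const rest = results.filter((r) => !containsExactWord(r.text, word));
  return exact.concat(rest);
}

const sample: Result[] = [
  { title: 'a', text: 'Los géneros literarios' },
  { title: 'b', text: 'Igualdad de género.' },
  { title: 'c', text: 'Generar informes' },
];

console.log(exactFirst(sample, 'género').map((r) => r.title));
```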