sciencehistory / chf-sufia

sufia-based hydra app

Search Results for multi-word queries includes hits that don't have all words #841

Closed by MDiMeo 6 years ago

MDiMeo commented 7 years ago

When I search for "women in science", I get hits, such as this John Lawes item, that don't have "women" anywhere in the record. Searching for "women science" brings up different results that appear more accurate, and the John Lawes portrait disappears. The problem is easier to see when you check the "public domain" box, but that filter is not the cause. We need to investigate. One possible cause is that the search allows results where only 2 of the 3 words match.

jrochkind commented 7 years ago

That's not exactly a "phrase". A phrase search is when you actually put your query in double quotes. If you search for "women in science" with the quotes, you'll get different results, where all of those words need to be present, in that order, exactly as typed. Users can put double quotes in their query to do that, just like in Google.
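To make the distinction concrete, here is a minimal sketch of the two behaviors. It is not how the search engine implements them; the function names, the whitespace tokenizer, and the sample records are all made up for illustration.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a crude stand-in for real text analysis)."""
    return text.lower().split()


def contains_all_words(query, record):
    """True if every query word appears somewhere in the record, in any order."""
    record_tokens = set(tokenize(record))
    return all(term in record_tokens for term in tokenize(query))


def contains_phrase(phrase, record):
    """True if the query words appear next to each other, in the same order."""
    terms = tokenize(phrase)
    tokens = tokenize(record)
    n = len(terms)
    return any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1))


# Hypothetical record text, not taken from the actual collection.
record_a = "Pioneering women in science and medicine"
record_b = "Science education in colleges for women"

print(contains_phrase("women in science", record_a))     # words adjacent and in order
print(contains_phrase("women in science", record_b))     # words present, wrong order
print(contains_all_words("women in science", record_b))  # order does not matter here
```

The second record contains all three words but not as a phrase, so only the quoted search would exclude it.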

For the search you entered, what did you expect from the results? That all words must be present? Were you thinking they would also have to appear in order, as a phrase?

MDiMeo commented 7 years ago

Feel free to reword the title to something that makes more sense to you if "phrase" isn't helpful. "Women in science" is something users typed into the search box, and they were confused that many of the results did not contain the word "women". I know that "women in science" is a subject term, so when I searched for it I assumed that all of the records with that subject would come up (I didn't check whether they did). Since I didn't put quotes around those three words, I assumed it would search for all three words anywhere in the record, not necessarily in order as a phrase. My thought was that I'd capture all the objects with "women in science" as a subject, plus anything else that had those three words anywhere in the record.

jrochkind commented 6 years ago

@MDiMeo just to make sure I understand: you want all words entered in the query to be 'required', so that hits are only included if they contain all of the words?

I can do that. The default sufia configuration allows hits where only some of the words are present, while still ranking hits that contain all of the words higher, which is more Google-like behavior. But I can easily change it to require all of them.
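As a rough illustration of the difference, the sketch below models a "minimum number of words that must match" threshold. It is an assumption-laden toy, not the actual sufia or search engine logic: the `min_match` parameter, the function names, and the sample record text are hypothetical.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a crude stand-in for real text analysis)."""
    return text.lower().split()


def count_matching_words(query, record):
    """Number of query words that appear anywhere in the record."""
    record_tokens = set(tokenize(record))
    return sum(1 for term in tokenize(query) if term in record_tokens)


def is_hit(query, record, min_match=None):
    """Decide whether a record is included in the results.

    min_match=None means every query word is required (the proposed change).
    A smaller min_match lets records through with only some of the words.
    """
    total = len(tokenize(query))
    required = total if min_match is None else min(min_match, total)
    return count_matching_words(query, record) >= required


# Hypothetical record text, not taken from the actual collection.
record = "Portrait of a chemist working in agricultural science"
query = "women in science"

print(count_matching_words(query, record))  # "in" and "science" match, "women" does not
print(is_hit(query, record, min_match=2))   # lenient: 2 of 3 words is enough
print(is_hit(query, record))                # strict: all 3 words required
```

Under the lenient threshold the record is returned even though it never mentions "women", which matches the confusing results described above; requiring all words drops it. Sorting lenient hits by `count_matching_words` would approximate the "all words rank higher" behavior.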

MDiMeo commented 6 years ago

When a search engine drops a word from a multi-word query, I'm used to seeing it say something like "showing results for x and y (not including z)". I'm pretty sure that's what Google does. If I know what it's doing, then I'm not confused. What happened here, and in the user testing for this task, was that users didn't understand what came up and why because it said: "You searched for: xyz", with all three words in one box, but the results didn't seem to have all three words in the record.

Let's discuss. This hasn't come up again, so maybe it's worth revisiting whether this is still a priority.

jrochkind commented 6 years ago

It'll only take 10 minutes to do; it's not an expensive change.

I don't believe the "showing results for x and y (not including z)" feature is available, though.

MDiMeo commented 6 years ago

Yeah, I think the issue is that the UI doesn't reflect what the search is actually doing. If we can't do the "x and y (not including z)" option in the UI, then I think what you propose, requiring all of the terms in the query, would be best.