lucaong / minisearch

Tiny and powerful JavaScript full-text search engine for browser and Node
https://lucaong.github.io/minisearch/
MIT License

making autosuggest results useful #157

Closed joyously closed 2 years ago

joyously commented 2 years ago

I'm struggling to get search suggestions that are useful. It makes sense to me for the search to use AND, so I have set that for the search and tried both OR and AND for the suggestions. It seems to work okay in the demo, but my data is not simple title and artist like the demo. It is long articles of text.

I've tried various combinations of prefix and fuzzy (mostly with AND), but the suggestions are not helpful to the user, because they have the first word followed by a bunch of possible matches for the second word. I can see how these terms are all found in one document, but the user is not helped by that suggestion. Even your example in the docs is confusing, where you call autosuggest for "zen ar" and get "zen art archery" as a suggestion. It makes sense once you know the parameters, but as a suggestion, it's not something you would click on. I think the user would be helped by showing "zen art" separate from "zen archery".

Do I need to make an elaborate filter to get suggestions that make sense?

lucaong commented 2 years ago

Hello @joyously, the auto suggestion engine in MiniSearch has pros and cons. The big pro is that it comes at no additional effort, and that the suggestions are computed on the actual data, so they are scored by relevance and guaranteed to yield results. The con is that the relevance score may sometimes produce queries that are relevant, but not something a user is likely to type, due to “over expanding” terms (like the “zen ar” example, where “ar” is expanded as both “art” and “archery”, since one of the titles is “zen and the art of archery”).

There is definitely room to improve it, but I need actionable ideas. What specific kind of improvements would you suggest?

> Do I need to make an elaborate filter to get suggestions that make sense?

If you do make an elaborate filter that produces better results, please post it here because it could help me improve the standard algorithm. I am all for making it better.

joyously commented 2 years ago

> What specific kind of improvements would you suggest?

Like I said,

> I think the user would be helped by showing "zen art" separate from "zen archery".

I don't know how you calculate the score, but splitting the terms should be done internally. I would expect something like what you get at Google: [image: Google-45]

It may be that this is only needed for AND suggestions.

joyously commented 2 years ago

For my use case of a static blog search, I chose to use AND and prefix: true with no fuzzy on the autosuggest. As I am using a <datalist> for the suggestions, I need the part already typed to exist in each suggestion for it to show, but the same concern would apply to a user choosing one of the suggestions. This is my event handler (I commented out the reduction of the list for testing):

theform.q.addEventListener('input', function(event) {
  let keyword = theform.q.value.trim();
  if (keyword.length > 1) {
    // Split off the word currently being typed from the words already entered.
    let words = keyword.split(' '),
      last = words.pop(),
      results = idx.autoSuggest(keyword, {
        prefix: true,
        fuzzy: null,
        combineWith: 'AND'
      });
    // Reduction of the list, disabled for testing:
    //   .filter(({ suggestion, score }, _, [first]) => score > first.score / 4)
    //   .slice(0, 5);
    suggester.innerHTML = results.map(function(item, i) {
      // Keep the typed words, then append only the terms that complete the last word.
      let suggest = words.join(' ') + ' ' + item.terms
        .filter(term => term.startsWith(last) && !words.includes(term))
        .join(' ');
      return '<option class="item" value="' + suggest.trim() + '">';
    }).join('');
  }
}, false);

[image: suggest]

Basically, I discard all the terms for the first words and leave the ones for the last word being typed.
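The idea of keeping the typed words and varying only the last one can be taken one step further, so that "zen ar" yields "zen art" and "zen archery" as separate entries. The following is a minimal sketch of that variation, not the handler above and not anything built into MiniSearch. The helper name `splitSuggestions` and the sample data are hypothetical; the input is assumed to have the `{ suggestion, terms, score }` shape that `autoSuggest` returns, and the typed text is lowercased on the assumption that indexed terms are lowercase.

```javascript
// Hypothetical helper, not part of MiniSearch: emits one candidate per term
// that completes the last typed word, instead of joining all of them.
function splitSuggestions(typed, suggestions) {
  const words = typed.trim().toLowerCase().split(/\s+/);
  const last = words.pop();
  const seen = new Set();
  const candidates = [];
  for (const { terms } of suggestions) {
    for (const term of terms) {
      // Skip terms that do not complete the last word, or that were already typed.
      if (!term.startsWith(last) || words.includes(term)) continue;
      const candidate = [...words, term].join(' ');
      if (!seen.has(candidate)) {
        seen.add(candidate);
        candidates.push(candidate);
      }
    }
  }
  return candidates;
}

// Sample data shaped like an autoSuggest result; the score is made up.
const sample = [
  { suggestion: 'zen art archery', terms: ['zen', 'art', 'archery'], score: 1 }
];
console.log(splitSuggestions('zen ar', sample)); // → [ 'zen art', 'zen archery' ]
```

Because every candidate starts with the words already typed, this shape also suits a <datalist>, at the cost of losing the per-suggestion score ordering within a single result.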

lucaong commented 2 years ago

Thank you @joyously, this is useful!

I’ll try to take inspiration and come up with some improvements to the auto suggestions for the next release.

lucaong commented 2 years ago

@joyously I think that this configuration would achieve the same effect (minus the filtering for results with a score higher than 1/4 of the top score, which makes total sense but could be implementation dependent): combine with AND, no fuzzy match, prefix match only on the last term of the query.

In code:

miniSearch.autoSuggest(query, {
  combineWith: 'AND',
  fuzzy: false,
  prefix: (term, i, terms) => i === terms.length - 1
})

If this really achieves the same effect, it could become the new default for auto suggestions. After all, combining with AND by default does make sense.

What do you think?
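To make the two pieces of this proposal concrete, here is a small sketch that runs without MiniSearch. It shows what the prefix predicate returns for each term of "zen ar", and a standalone version of the 1/4-of-top-score cutoff from the earlier handler. The names `prefixLastOnly` and `keepStrong`, the default `limit` of 5, and the sample scores are assumptions made for illustration.

```javascript
// The predicate from the configuration above: prefix-match only the last term.
const prefixLastOnly = (term, i, terms) => i === terms.length - 1;

// Hypothetical helper mirroring the commented-out filter: keep results scoring
// above a fraction of the top score, then cap the list length.
// Assumes results are already sorted by descending score.
function keepStrong(results, ratio = 0.25, limit = 5) {
  if (results.length === 0) return [];
  const top = results[0].score;
  return results.filter(r => r.score > top * ratio).slice(0, limit);
}

const flags = ['zen', 'ar'].map(prefixLastOnly);
console.log(flags); // → [ false, true ]

// Made-up scores: only those above 8 / 4 = 2 survive.
const strong = keepStrong([{ score: 8 }, { score: 3 }, { score: 2 }, { score: 1 }]);
console.log(strong.map(r => r.score)); // → [ 8, 3 ]
```

With this predicate, "zen" must match exactly while "ar" may expand to any term starting with "ar".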

joyously commented 2 years ago

I think that is the prefix function used in your songs example. Because it turns off prefix for the already typed words, there are no suggestions after entering a partial word. (I didn't put the Dev Tools in the frame, but there are no suggestions after the second space.) [image: suggest2]

I changed it to use prefix: true and it does work better (like my original), however the results look the same because of my use of <datalist>, which won't show unless everything typed matches an entry in the list. (The suggestions are there, just not showing.) I was thinking that only <datalist> would need to match what is typed, but other elements would have the same problem if the intent is to help the user type. Other code could execute the search once a suggestion is chosen, but I see suggestions as helping to narrow the search by adding to the input field. [image: suggest3]

I do think that AND is a better default than OR, because the more words you type the more you want to narrow it down (not broaden the search). But the autosuggest can have different prefix and fuzzy than the actual search.

lucaong commented 2 years ago

One problem with making AND the default for auto suggestions is that the behavior might be surprising when searching across multiple fields. Imagine searching for "beatles lucy di" in the demo app: with OR, the suggestion is "beatles lucy diamonds", because of "Lucy in the Sky with Diamonds" by the Beatles. With AND, no suggestion is returned, because while the author matches "beatles" and the title matches "lucy di", no single field matches all terms.

In sum, I am still undecided whether to use AND or OR by default, but definitely this issue shows that more documentation is needed about how to configure autoSuggest to meet one's use case.

joyously commented 2 years ago

Ah, I was unaware of that. So I guess the documentation does need to mention which options affect the entire process and which are specific to the term, using examples for all the ways a user might configure it (like one field or multiple). My own use case is a single field, so I will use AND. I suppose you could use OR for multiple fields and AND for a single field, as defaults, as long as it's documented.

lucaong commented 2 years ago

Yep I agree, I think the best course of action here is to clearly document how to achieve different behaviors.

lucaong commented 2 years ago

Oh, forget what I said... it turns out that the "beatles lucy di" example is confusing, as the demo dataset only includes "Lucy in the Sky with Diamonds" by Elton John, not the original by The Beatles.

Using combineWith: 'AND' does include results where all terms are present, even in different fields. Therefore, it seems a good default for the auto suggestion.
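The corrected behavior can be illustrated with a toy model. This is not MiniSearch's implementation, only a sketch of the rule stated above: with AND, every term must be found somewhere in the document, in any field. The function name `matches` is hypothetical, prefix matching is applied to every term for simplicity, and the `title` and `artist` field names are assumed from the demo.

```javascript
// Toy model only, not MiniSearch internals: a term "hits" a document if any
// word in any field starts with it.
function matches(doc, terms, combineWith) {
  const words = Object.values(doc).join(' ').toLowerCase().split(/\W+/);
  const hit = term => words.some(word => word.startsWith(term));
  // AND requires every term to hit; OR requires at least one.
  return combineWith === 'AND' ? terms.every(hit) : terms.some(hit);
}

// The only version of the song in the demo dataset, per the comment above.
const song = { title: 'Lucy in the Sky with Diamonds', artist: 'Elton John' };

console.log(matches(song, ['elton', 'lucy', 'di'], 'AND'));   // → true (terms spread over two fields)
console.log(matches(song, ['beatles', 'lucy', 'di'], 'AND')); // → false ("beatles" appears nowhere)
console.log(matches(song, ['beatles', 'lucy', 'di'], 'OR'));  // → true
```

In this model the "beatles lucy di" query fails under AND because no field contains "beatles", not because the terms sit in different fields.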

The point about improving documentation is still valid though.

lucaong commented 2 years ago

Closing after changing the default to combineWith: 'AND' in #161 and adding some more documentation.

Ideally, in the future more examples should be provided in a "how to" section, which would be useful also for other features.

The new default is released in v5.0.0-beta3.