vuejs / vitepress

Vite & Vue powered static site generator.
https://vitepress.dev
MIT License

Exact match local search #2731

Open MetRonnie opened 1 year ago

MetRonnie commented 1 year ago

Is your feature request related to a problem? Please describe.

You currently cannot perform an exact match search.

Describe the solution you'd like

The ability to do an exact match search when surrounding the string with quotes for example, or having a checkbox.

Describe alternatives you've considered

I've had a look at the minisearch options but it doesn't appear to be possible to achieve this there. See https://github.com/lucaong/minisearch/issues/216

Additional context

No response

brc-dd commented 1 year ago

The workaround mentioned there could work for you: use .includes() inside the filter function. We aren't planning to move away from minisearch though, so there probably won't be a direct way.

MetRonnie commented 1 year ago

I'm not sure what you mean by the filter function. I tried entering it in themeConfig.search.options.miniSearch.searchOptions.filter, as suggested by IntelliSense, but the function only takes a single argument called result, with no way to compare it to the search query.

brc-dd commented 1 year ago

ah sorry, you need boostDocument. you'll get the search term as second param. returning any falsy value skips the search result.

MetRonnie commented 1 year ago

Unfortunately not, the "term" seems to be the search result not the query.

I also tried fuzzy but that doesn't preserve any quotes around the query.

I think any filtering would have to be done where VitePress calls miniSearch.search(), in order to access both the query and the result.
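To illustrate the point above: the filter search option only receives the result, but the code that calls miniSearch.search() can build the options per query and let the filter close over the query string. What follows is a minimal sketch, not VitePress code. prepareSearch is a hypothetical helper, and it assumes each result carries a stored text field, which is only true if text is listed in storeFields.

```javascript
// Hypothetical helper (not part of VitePress or MiniSearch).
// For a query wrapped in double quotes, it returns the unquoted phrase
// plus search options whose filter closes over that phrase.
function prepareSearch(rawQuery) {
  const quoted = /^"(.+)"$/.exec(rawQuery.trim())
  if (!quoted) {
    // Not a quoted query: search as usual, with no extra options.
    return { query: rawQuery, options: {} }
  }

  const phrase = quoted[1]
  const needle = phrase.toLowerCase()
  return {
    query: phrase,
    options: {
      combineWith: 'AND',
      // Assumption: `text` is a stored field. Results without it are dropped.
      filter: (result) =>
        typeof result.text === 'string' &&
        result.text.toLowerCase().includes(needle)
    }
  }
}

// Usage at the call site, given an existing MiniSearch instance:
// const { query, options } = prepareSearch(userInput)
// const results = miniSearch.search(query, options)
```

The comparison is case-insensitive here, which is a design choice rather than a requirement.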

brc-dd commented 1 year ago

I think this can be implemented in the core. I have hit an issue though. If detailed view is not enabled, there will be no text in the search results and one can only check the exact match against title, which will limit the results 👀

kwesterfeld2 commented 1 year ago

I ran across this issue while trawling minisearch GitHub issues to see if I could figure out how to configure it to handle quoted searching. Our docs are huge (~1000 pages), and doing a simple search for "create table" in our database reference turns up many things NOT about tables, but rather documents about creating things. This is confusing and a bad UX.

I agree VitePress should support quoted terms in the local search plugin, as Algolia does. I also agree that minisearch doesn't seem to be able to support it. However, the local search plugin could be taught to handle this case pretty easily, I think.

Oh and thanks for vitepress! It is amazing/awesome.

kwesterfeld2 commented 1 year ago

Also, I found that adding this approximates what you'd really want, although I'm not sold on the prefix: true...but worth playing with:

  themeConfig: {
    search: {
      provider: 'local',
      options: {
        miniSearch: {
          searchOptions: {
            fuzzy: 0.1,
            prefix: true,
            boost: {
              title: 4,
              text: 2,
              titles: 1
            },
            combineWith: 'AND'
          }
        }
      }
    }
  }

lucaong commented 10 months ago

Hello, author of MiniSearch here, pitching in just to say that exact match is the default in MiniSearch.

That said, I understand that in this thread "exact match" means one of these two things:

  1. to require that all words in the search query appear in each result. For example, a query for "create table" should return only results that contain both "create" and "table"
  2. to require that the results contain the exact phrase in the query. For example, a query for "create table" should return only results that contain the phrase "create table" (as in, the words "create" and "table" next to each other in this order)

Point 1 is easily achievable in MiniSearch by specifying the search option combineWith: "AND", which combines each query term with an AND operation (as opposed to the default OR). This makes sure that a search for "create table" returns only results containing both the term "create" and the term "table".

Point 2 is generally not achievable in MiniSearch, and for a good reason: local search has to work with the constraint that the index must fit comfortably in the process memory. In order to perform phrase match, the index would have to contain also positional information of each term in each document. This would make the index vastly bigger, and consume a lot more memory. Server-side search does not usually have such limitations, as it can rely on large memory and disk space.
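To make the memory argument concrete, here is a toy comparison of an index without positions and one with positions. This is only an illustration and does not reflect MiniSearch's internal data structure; tokenize, buildPlainIndex, buildPositionalIndex and matchPhrase are all hypothetical names.

```javascript
// Toy inverted indexes for illustration only (NOT MiniSearch internals).
const documents = [
  { id: 1, text: 'the SQL create table command' },
  { id: 2, text: 'how to create a website' },
  { id: 3, text: 'how to create a wooden table' }
]

const tokenize = (text) => text.toLowerCase().split(/\s+/).filter(Boolean)

// term -> Set of document ids. Enough to answer AND / OR queries.
function buildPlainIndex(docs) {
  const index = new Map()
  for (const doc of docs) {
    for (const term of tokenize(doc.text)) {
      if (!index.has(term)) index.set(term, new Set())
      index.get(term).add(doc.id)
    }
  }
  return index
}

// term -> Map of document id -> list of positions.
// Needs one extra number for every occurrence of every term.
function buildPositionalIndex(docs) {
  const index = new Map()
  for (const doc of docs) {
    tokenize(doc.text).forEach((term, position) => {
      if (!index.has(term)) index.set(term, new Map())
      const postings = index.get(term)
      if (!postings.has(doc.id)) postings.set(doc.id, [])
      postings.get(doc.id).push(position)
    })
  }
  return index
}

// Two-word phrase match: `second` must appear right after `first`.
function matchPhrase(index, first, second) {
  const a = index.get(first) ?? new Map()
  const b = index.get(second) ?? new Map()
  const ids = []
  for (const [id, positions] of a) {
    const next = b.get(id) ?? []
    if (positions.some((p) => next.includes(p + 1))) ids.push(id)
  }
  return ids
}

console.log(matchPhrase(buildPositionalIndex(documents), 'create', 'table')) // [ 1 ]
```

The plain index can tell that documents 1 and 3 contain both words, but only the positional one can tell that they are adjacent in document 1. The price is a stored position per term occurrence, which grows with the total length of the indexed text.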

There is a workaround for point 2, in case it's ok to store the original document fields, which is to apply the same strategy as point 1, and on top of it filter the results to only those containing the phrase. Note that this might not be reasonable for large documents, where keeping in memory the whole text of each document is not possible. Here's a quick example:

import MiniSearch from 'minisearch'

const documents = [
  { id: 1, text: "the SQL create table command" },
  { id: 2, text: "how to create a website" },
  { id: 3, text: "how to create a wooden table" }
]

const miniSearch = new MiniSearch({
  fields: ['text'],
  storeFields: ['text']
})

miniSearch.addAll(documents)

// Search for documents containing the phrase "create table":
miniSearch.search("create table", {
  combineWith: 'AND',
  filter: (result) => result.text && result.text.includes("create table")
})
/* =>
[
  {
    id: 1,
    score: 1.840114344910601,
    terms: [ 'create', 'table' ],
    queryTerms: [ 'create', 'table' ],
    match: { create: [Array], table: [Array] },
    text: 'the SQL create table command'
  }
]
*/

My opinion is that the best option would be to show a checkbox in the UI to optionally require all terms in the query. When checked, this would pass combineWith: 'AND' in the search options. You can see an example of that in the MiniSearch demo, among the advanced options. The advanced search options in the UI could also enable toggling prefix and fuzzy match, again like in the MiniSearch demo.

Introducing a query syntax, such as requiring all terms to match when they are surrounded with quotes, would also be possible, but more complex. One would have to write a query parser, and pass the resulting query to MiniSearch as a query combination instead of a standard string query. The query combination API gives complete freedom, but requires a parser to translate the query syntax to a query combination, and might be more complex for users who don't know the exact syntax.
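As a rough idea of what such a parser could look like, here is a minimal sketch. parseQuery and containsAllPhrases are hypothetical helpers, and the semantics are just one possible choice: every quoted phrase is required (its words combined with AND), unquoted words are combined with OR, and the groups are joined with AND. It assumes a MiniSearch version that accepts query combinations, and the phrase check again assumes a stored text field.

```javascript
// Hypothetical parser: turns `sql "create table" index` into a query
// combination object plus the list of quoted phrases.
function parseQuery(input) {
  const phrases = []

  // Pull out "quoted phrases"; whatever remains is treated as loose words.
  const rest = input.replace(/"([^"]+)"/g, (_, phrase) => {
    const cleaned = phrase.trim()
    if (cleaned) phrases.push(cleaned)
    return ' '
  })
  // Stray, unbalanced quotes are simply ignored.
  const words = rest.replace(/"/g, ' ').split(/\s+/).filter(Boolean)

  // Each phrase becomes an AND group of its words.
  const queries = phrases.map((phrase) => ({
    combineWith: 'AND',
    queries: phrase.split(/\s+/)
  }))
  // Loose words form a single OR group.
  if (words.length > 0) queries.push({ combineWith: 'OR', queries: words })

  return { phrases, query: { combineWith: 'AND', queries } }
}

// Post-filter helper: true only if the text contains every phrase verbatim
// (case-insensitive).
function containsAllPhrases(text, phrases) {
  const haystack = text.toLowerCase()
  return phrases.every((phrase) => haystack.includes(phrase.toLowerCase()))
}

// Usage, given an existing MiniSearch instance with `text` in storeFields:
// const { query, phrases } = parseQuery(userInput)
// const results = miniSearch.search(query, {
//   filter: (r) => typeof r.text === 'string' && containsAllPhrases(r.text, phrases)
// })
```

An input with no usable terms yields an empty queries array, so a caller would probably want to skip the search in that case.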

I hope this helps!