lucaong / minisearch

Tiny and powerful JavaScript full-text search engine for browser and Node
https://lucaong.github.io/minisearch/
MIT License

Feature request: Result highlighting #23

Closed flunderpero closed 4 years ago

flunderpero commented 4 years ago

Are there plans to support highlighting of matched characters in the results? Other search engines do this by returning match indexes in the result.

lucaong commented 4 years ago

Hi @flunderpero, thanks for your question. Highlighting of matched characters is definitely a useful feature.

MiniSearch does not currently return the offsets of matched terms, and at the moment there are no plans to do so. The reason is that, in order to return these offsets efficiently, they would have to be stored as part of the index, making it substantially bigger. As MiniSearch focuses on memory-constrained use cases, that would go against the main project goal.

That said, it is possible to obtain the position of the matched terms by leveraging another feature: each search result contains a match field that indicates which terms were matched and in which fields.

const ms = new MiniSearch({ fields: ['title', 'text'] })
ms.addAll([
  { id: 1, title: 'Divina Commedia', text: 'Nel mezzo del cammin di nostra vita' },
  { id: 2, title: 'I Promessi Sposi', text: 'Quel ramo del lago di Como' },
  { id: 3, title: 'Vita Nova', text: 'In quella parte del libro della mia memoria ... vita' }
])

// Search results contain a match field:
const results = ms.search('vita nova')
// => [
//   { id: 3, score: ..., match: { vita: ['title',  'text'], nova: ['title'] } },
//   { id: 1, score: ..., match: { vita: ['text'] } }
// ]

// In this case, the first result contains the term `vita` in the
// `title` and `text` fields, and the term `nova` in the `title`.
// The second result only contains `vita` in the `text` field.

By using this information, it is possible to obtain the index of each matched term in each matched field using String.prototype.indexOf. This is a perfectly valid strategy when the document size is not too large.
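As a minimal sketch of that strategy, the helper below walks a result's match field and looks up each term in the original document with String.prototype.indexOf. The name findMatchOffsets and the shape of the returned objects are hypothetical and not part of MiniSearch. It assumes the default term processing (lowercasing only), so the field is lowercased before searching, and it takes the match object as plain data so the snippet runs without the library:

```javascript
// Hypothetical helper, not part of MiniSearch: find where each matched
// term occurs in each matched field of the original document.
function findMatchOffsets(doc, match) {
  const offsets = []
  for (const [term, fields] of Object.entries(match)) {
    for (const field of fields) {
      // Assumption: terms were indexed lowercased, so compare against a
      // lowercased copy of the field text.
      const text = String(doc[field] ?? '').toLowerCase()
      let start = text.indexOf(term)
      while (start !== -1) {
        offsets.push({ field, term, start, end: start + term.length })
        // Continue after this occurrence to collect all of them.
        start = text.indexOf(term, start + term.length)
      }
    }
  }
  return offsets
}

// Document 3 and its match field from the example above.
const doc = { id: 3, title: 'Vita Nova', text: 'In quella parte del libro della mia memoria ... vita' }
const match = { vita: ['title', 'text'], nova: ['title'] }
const offsets = findMatchOffsets(doc, match)
```

Each returned entry gives a start and end offset that can be used to wrap the matched substring in a highlight tag. Note that indexOf matches substrings rather than whole tokens, and lowercasing can change the length of some Unicode strings, so a stricter implementation may need to tokenize the field the same way the index does.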

I personally think that the trade-off between a smaller inverted index and pre-computed match offsets favors the former for typical MiniSearch use cases. That said, if a strong case is made for it, this is something that could be implemented.

lucaong commented 4 years ago

I will close this issue for now, but I'm willing to reopen it if there is a need for further discussion.

flunderpero commented 4 years ago

Ok, I got your point(s). Keep up the good work!

lucaong commented 4 years ago

Thanks 🙂