olivernn / lunr.js

A bit like Solr, but much smaller and not as bright
http://lunrjs.com
MIT License

Find similar documents #434

Open stupkad opened 4 years ago

stupkad commented 4 years ago

Hi there,

I am using lunr for a private wiki and love the library. Is there a way to find similar documents, like discussed in this article?

https://stackoverflow.com/questions/7657673/how-to-find-similar-documents

Regards, Dietmar

olivernn commented 4 years ago

Finding similar documents is not currently supported.

Almost everything that would be required to implement this feature is already supported, though. At index time all documents are converted into term vectors, and these are stored in the index. When querying, the search query is also converted into a similar vector. The similarity between the query and a document is calculated by comparing these two vectors.
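To make the vector comparison concrete, here is a minimal TypeScript sketch. It assumes each document is a sparse map from term to weight and uses cosine similarity, a common way to compare such vectors. The `TermVector` type, the function names and the sample data are hypothetical; this is not lunr's internal `lunr.Vector` code, and lunr's exact scoring may differ.

```typescript
// Hypothetical sparse term vector: term -> weight (e.g. a tf-idf style score).
// A stand-in for the vectors lunr stores, not lunr's own vector class.
type TermVector = Map<string, number>;

interface Scored {
  ref: string;
  score: number;
}

// Dot product: only terms present in both vectors contribute.
function dot(a: TermVector, b: TermVector): number {
  let sum = 0;
  a.forEach((weight, term) => {
    const other = b.get(term);
    if (other !== undefined) sum += weight * other;
  });
  return sum;
}

// Euclidean length of the vector.
function magnitude(v: TermVector): number {
  let sumSq = 0;
  v.forEach((w) => {
    sumSq += w * w;
  });
  return Math.sqrt(sumSq);
}

// Cosine similarity: 1 when the vectors point the same way, 0 when no terms overlap.
function cosineSimilarity(a: TermVector, b: TermVector): number {
  const denom = magnitude(a) * magnitude(b);
  return denom === 0 ? 0 : dot(a, b) / denom;
}

// Score every document against the query vector and sort best match first.
function rankDocuments(query: TermVector, docs: Map<string, TermVector>): Scored[] {
  const results: Scored[] = [];
  docs.forEach((vec, ref) => {
    const score = cosineSimilarity(query, vec);
    if (score > 0) results.push({ ref, score });
  });
  return results.sort((x, y) => y.score - x.score);
}

// Usage with made-up documents and weights.
const docs = new Map<string, TermVector>([
  ["setup", new Map([["install", 1.2], ["config", 0.8]])],
  ["faq", new Map([["config", 0.5], ["error", 1.0]])],
  ["intro", new Map([["overview", 1.0]])],
]);
const query: TermVector = new Map([["config", 1.0]]);
const ranked = rankDocuments(query, docs);
// "setup" ranks above "faq"; "intro" shares no terms with the query and is dropped.
console.log(ranked.map((r) => r.ref));
```

Documents that share no terms with the query score 0 and are filtered out, which mirrors how a search only returns documents that match at least one query term.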

So, since all the documents are already represented as vectors, getting a list of similar documents is just a matter of looking up the vector for the given document ID and then doing a similarity search against the vectors of all the other documents.
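A minimal sketch of that lookup-then-compare approach, assuming the document vectors are available as a map from document ref to a sparse term vector and compared with cosine similarity. `findSimilar`, `TermVector` and the sample wiki pages are hypothetical; this is not an existing lunr API.

```typescript
// Hypothetical sparse term vector: term -> weight. Stand-in for a stored document vector.
type TermVector = Map<string, number>;

interface Scored {
  ref: string;
  score: number;
}

// Cosine similarity between two sparse vectors (0 if either is empty).
function cosineSimilarity(a: TermVector, b: TermVector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((w, term) => {
    normA += w * w;
    const other = b.get(term);
    if (other !== undefined) dot += w * other;
  });
  b.forEach((w) => {
    normB += w * w;
  });
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Hypothetical helper, not part of lunr: look up the stored vector for `ref`,
// compare it with every other document's vector, and return the closest ones.
function findSimilar(ref: string, docs: Map<string, TermVector>, limit = 5): Scored[] {
  const target = docs.get(ref);
  if (target === undefined) throw new Error(`Unknown document ref: ${ref}`);
  const results: Scored[] = [];
  docs.forEach((vec, otherRef) => {
    if (otherRef === ref) return; // a document is trivially similar to itself
    const score = cosineSimilarity(target, vec);
    if (score > 0) results.push({ ref: otherRef, score });
  });
  results.sort((x, y) => y.score - x.score);
  return results.slice(0, limit);
}

// Usage with made-up wiki pages and weights.
const wiki = new Map<string, TermVector>([
  ["install-guide", new Map([["install", 1.0], ["linux", 0.5], ["config", 0.5]])],
  ["config-reference", new Map([["config", 1.0], ["options", 0.7]])],
  ["linux-tips", new Map([["linux", 1.0], ["shell", 0.3]])],
  ["changelog", new Map([["release", 1.0]])],
]);
const similar = findSimilar("install-guide", wiki);
console.log(similar.map((s) => s.ref));
```

This brute-force version compares the target against every document, so its cost grows linearly with the number of documents. That is usually fine for a small private wiki; larger collections are where optimisations would start to matter.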

There might be ways to optimise this, but those are the basics of how to implement it.

olivernn commented 4 years ago

The code that computes the similarity between vectors when querying is here.

rjurney commented 2 years ago

Hmm, we need to present a deep neural encoding of the search query and use it as a feature in the search, either as the only feature or as part of the query. This doesn't look too hard to implement, @rflow?