zopefoundation / zope.index

Indices for using with catalog like text, field, etc.

Using zope.index to find similar documents efficiently #2

Closed: fgregg closed this issue 8 years ago

fgregg commented 10 years ago

Hi! Thanks for this awesome code. I have a question about using zope.index's TextIndex to find similar documents: given a query document, I want to find the indexed document that is most similar to it.

I can do this by turning the query document into one big OR query string (see below). Is there a more efficient way to find the most similar documents when the tokens in the query document are not a subset of the tokens in the indexed document?

>>> from zope.index.text.textindex import TextIndex
>>> index = TextIndex()
>>> index.index_doc(1, "silver pearl splitter")
>>> index.apply("silver pearl splayer")  # terms are ANDed; "splayer" is not indexed, so nothing matches
BTrees.IFBTree.IFBucket([])
>>> index.apply(" OR ".join("silver pearl splayer".split()))  # "silver OR pearl OR splayer" matches doc 1
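To illustrate why the OR query is already close to the efficient approach, here is a minimal, self-contained sketch of an inverted index that ranks documents against a query document with OR semantics. It is a hypothetical illustration, not zope.index's API: `TinyIndex`, `tokenize`, and `most_similar` are invented names, the tokenizer is a naive whitespace splitter, and the score is a simple TF-IDF-weighted term overlap rather than the Okapi BM25 ranking that zope.index's OkapiIndex implements. The point it shows is that only the posting lists of the query's terms are read, so documents sharing no term with the query are never touched, and unknown terms such as "splayer" are simply skipped.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical naive tokenizer: lowercase and split on whitespace.
    # zope.index uses its own lexicon pipeline instead.
    return text.lower().split()


class TinyIndex:
    """Hypothetical in-memory inverted index (not part of zope.index)."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {docid: term frequency}
        self.doc_lengths = {}              # docid -> number of tokens

    def index_doc(self, docid, text):
        counts = Counter(tokenize(text))
        for term, tf in counts.items():
            self.postings[term][docid] = tf
        self.doc_lengths[docid] = sum(counts.values())

    def most_similar(self, text, n=5):
        # OR semantics: any document sharing at least one query term is a
        # candidate; only the posting lists of the query terms are visited.
        n_docs = len(self.doc_lengths)
        scores = defaultdict(float)
        for term in set(tokenize(text)):
            docs = self.postings.get(term)  # .get() does not create entries
            if not docs:
                continue  # unknown terms are skipped instead of killing the query
            idf = math.log(1 + n_docs / len(docs))  # rarer terms weigh more
            for docid, tf in docs.items():
                scores[docid] += tf * idf
        # Highest score first; ties broken by docid for a stable order.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        return ranked[:n]


index = TinyIndex()
index.index_doc(1, "silver pearl splitter")
index.index_doc(2, "gold pearl necklace")
index.index_doc(3, "iron anvil")
ranked = index.most_similar("silver pearl splayer")
# Doc 1 shares two query terms and ranks first; doc 2 shares only "pearl";
# doc 3 shares none, so it is never scored or returned.
print(ranked)
```

The cost of a query grows with the total length of the posting lists for the query's terms, not with the number of indexed documents, which is the same property a large OR query gets from an inverted index.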