Hi! Thanks for this awesome code. I have a question about using zope.index's TextIndex to find the indexed document that is most similar to a query document.
I can do this by turning the query document into one big OR query (see below). Is there a more efficient way to find the most similar documents when the query document's tokens are not a strict subset of the indexed document's tokens?
>>> from zope.index.text.textindex import TextIndex
>>> index = TextIndex()
>>> index.index_doc(1, "silver pearl splitter")
>>> index.apply("silver pearl splayer") # this doesn't work
BTrees.IFBTree.IFBucket([])
>>> index.apply(" OR ".join("silver pearl splayer".split())) # This does
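For reference, here is a slightly more careful version of the OR workaround, as a rough sketch. The helper name `build_or_query` is made up for this post and is not part of zope.index. It assumes that lowercasing, de-duplicating tokens, and dropping the words AND, OR, and NOT (so they are not read as query operators) is acceptable for my data.

```python
import re

# Words the text query parser treats as operators. Dropping them from the
# query document is an assumption made for this sketch.
_OPERATORS = {"and", "or", "not"}


def build_or_query(text):
    """Turn free text into a single 'a OR b OR c' query string.

    Hypothetical helper: tokens are lowercased, de-duplicated in order of
    first appearance, and operator words are skipped.
    """
    seen = set()
    terms = []
    for token in re.findall(r"\w+", text.lower()):
        if token in _OPERATORS or token in seen:
            continue
        seen.add(token)
        terms.append(token)
    return " OR ".join(terms)
```

With the index from the session above, I would call `index.apply(build_or_query("silver pearl splayer"))` and sort the returned doc id/score pairs by score, highest first. The helper returns an empty string for empty input, so that case needs a check before calling `apply`.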