Problem
I am trying to implement behaviour similar to how the Sphinx search engine handles phrases with wildcards. For this I use the Whoosh library. But when I use sequence queries with short words (2 characters long) and wildcards, I get an error:
289 return [Span(pos) for pos in self.value_as("positions")]
290 else:
--> 291 raise Exception("Field does not support spans")
Exception: Field does not support spans
I noticed this happens when I add a lot of documents to the index; it doesn't happen with a small number of documents.
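Since the error depends on how many documents are indexed, one way to narrow it down is to binary-search for the smallest number of leading documents that still triggers it. The following is only a sketch under stated assumptions: `smallest_failing_prefix` and the `triggers_error` callback are hypothetical names (not part of Whoosh), and the search assumes that once a prefix of the document list fails, every longer prefix fails too. In practice `triggers_error` would rebuild the index from the given documents, run the query, and return True if the exception is raised; here it is replaced by a stand-in so the sketch runs on its own.

```python
from typing import Callable, Sequence


def smallest_failing_prefix(
    docs: Sequence[dict],
    triggers_error: Callable[[Sequence[dict]], bool],
) -> int:
    """Return the smallest n such that triggers_error(docs[:n]) is True.

    Assumes failures are monotone: once a prefix fails, longer ones fail too.
    Returns -1 if even the full list does not trigger the error.
    """
    if not triggers_error(docs):
        return -1
    lo, hi = 0, len(docs)  # invariant: docs[:lo] passes, docs[:hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if triggers_error(docs[:mid]):
            hi = mid
        else:
            lo = mid
    return hi


# Stand-in predicate for illustration only: pretend the document with id 7
# is the one that makes the query fail.
docs = [{"id": i} for i in range(20)]
n = smallest_failing_prefix(docs, lambda subset: any(d["id"] == 7 for d in subset))
print(n)
```

The returned count gives a much smaller document set to inspect or to post as a reproducible example.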
I want to be able to search with queries like:
"найденный" AND ("по мест проживания" OR "рядом с домом")
Here "по мест проживания" causes the error. When I reduce it to "по" it runs fine, but if I change it to "по дороге" I still get the same error.
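To make the intended matching semantics concrete, here is a minimal pure-Python sketch of a "phrase with wildcards": each pattern must match one token, and the matched tokens must be adjacent and in order. This is not Whoosh code and not how Whoosh or Sphinx implement it; `tokenize` and `phrase_matches` are hypothetical helpers, and the tokenizer is only a rough stand-in for an analyzer.

```python
import re
from fnmatch import fnmatchcase


def tokenize(text: str) -> list[str]:
    # Rough stand-in for an analyzer: lowercase, then split into word tokens.
    return re.findall(r"\w+", text.lower())


def phrase_matches(patterns: list[str], text: str) -> bool:
    """True if some run of adjacent tokens matches the patterns in order."""
    tokens = tokenize(text)
    n = len(patterns)
    return any(
        all(fnmatchcase(tok, pat) for tok, pat in zip(tokens[i:i + n], patterns))
        for i in range(len(tokens) - n + 1)
    )


def query_matches(text: str) -> bool:
    # "найденный" AND ("по* мест* проживания" OR "рядом с домом")
    return phrase_matches(["найденный"], text) and (
        phrase_matches(["по*", "мест*", "проживания"], text)
        or phrase_matches(["рядом", "с", "домом"], text)
    )


print(query_matches("Найденный по месту проживания"))
print(query_matches("Найденный по дороге к месту проживания"))
```

The first text should match because "по", "месту" and "проживания" are adjacent; the second should not, because other words sit between them.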
Code and expected results
from pprint import pprint

from whoosh import analysis
from whoosh.fields import Schema, TEXT, NUMERIC
from whoosh.filedb.filestore import RamStorage
from whoosh.qparser import QueryParser, PhrasePlugin, SequencePlugin, OperatorsPlugin

# Intended to keep every token: no minimum length and no stop words.
analyzer = analysis.StandardAnalyzer(minsize=None, stoplist=None)
schema = Schema(
    item_id=NUMERIC(stored=True, bits=64),
    type=NUMERIC(stored=True),
    content=TEXT(analyzer=analyzer, stored=True, phrase=True),
)

storage = RamStorage()
ix = storage.create_index(schema)

writer = ix.writer()
# get_db() is defined elsewhere (not shown); it provides the items to index.
with get_db() as db:
    for item in db["items"][0:500]:
        writer.add_document(
            item_id=item["id"], type=item["type"], content=item["content"]
        )
writer.commit(optimize=True)

parser = QueryParser("content", schema=schema)
op = OperatorsPlugin(
    And="AND", Or="OR", AndNot="ANT", Not=None, AndMaybe=None, Require=None
)
# Swap phrase parsing for sequence queries so quoted phrases can contain wildcards.
parser.remove_plugin_class(PhrasePlugin)
parser.add_plugin(SequencePlugin())
parser.replace_plugin(op)

with ix.searcher() as searcher:
    query = '"найденный" AND ("по* мест* проживания" OR "рядом с домом")'
    query = parser.parse(query, debug=True)
    hits = searcher.search(query, terms=True, limit=None)
    pprint(list(hits))
I expect to get a list of hits, but I get "Exception: Field does not support spans" instead.
My content is text of variable length in different languages. Queries may also be in different languages.