import os
from whoosh import fields, index

schema = fields.Schema(type=fields.ID, text=fields.TEXT(stored=True), paragraph=fields.TEXT(stored=True))
os.makedirs("indexdir", exist_ok=True)  # create_in() needs an existing directory
ix = index.create_in("indexdir", schema)
with ix.writer() as w:
    # Store each chapter and each paragraph as its own document.
    # Documents added inside one w.group() block are indexed together
    # as a parent (the chapter) followed by its children (the paragraphs).
    with w.group():
        w.add_document(type="chap", text="Chapter 1")
        w.add_document(type="p", text="This is the first paragraph of chapter 1.", paragraph="chapter 1.")
        w.add_document(type="p", text="This is the second paragraph of chapter 1.", paragraph="chapter 1.")
    with w.group():
        w.add_document(type="chap", text="Chapter 2")
        w.add_document(type="p", text="This is the first paragraph of chapter 2.", paragraph="chapter 2.")
Whoosh has built-in support for nested (hierarchical) content: https://whoosh.readthedocs.io/en/latest/api/query.html#special-queries https://whoosh.readthedocs.io/en/latest/nested.html#
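In Whoosh, a TEXT field records term positions when phrase=True (the default) and a per-document term vector when vector=True. Those only tell you where a term occurs, though, not what the surrounding text is. To get the text back, add a stored field holding the paragraph's full content when you add each document, and read that field from the hit whenever you need it.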
Then, when fetching the best-matching sentence, retrieve the whole paragraph as well: