Quickly fetch the "next" sentence

Leizhenpeng commented 1 year ago

https://whoosh.readthedocs.io/en/latest/searching.html#convenience-methods

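When searching documents with Whoosh, consider giving each document a sequential number and storing it in one of the document's fields (for example, an id field), which establishes a global order. You can then fetch the next document from the number of the previous one, which makes the text easier to process. The steps and examples below illustrate the idea: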

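Number the documents: First, number the documents, using incrementing integers as unique identifiers. Store the number in the document's id field, or create a new field for it.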
```
# Example documents
doc1 = {"id": 1, "content": "This is the first sentence."}
doc2 = {"id": 2, "content": "This is the second sentence."}
doc3 = {"id": 3, "content": "This is the third sentence."}
```
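If the source is one block of raw text rather than ready-made documents, the numbering step could be sketched roughly as follows. The helper name `number_sentences` and the naive punctuation-based splitting rule are assumptions made for illustration; real text (abbreviations, decimals, quotations) would need a more careful splitter.

```python
import re


def number_sentences(text):
    """Split text into sentences and give each one an incrementing id.

    Hypothetical helper: splits right after ., !, ? (or their full-width
    forms), which is only a rough approximation of sentence boundaries.
    """
    parts = re.split(r"(?<=[.!?。！？])\s*", text)
    sentences = [part for part in parts if part]  # drop empty trailing pieces
    return [
        {"id": number, "content": sentence}
        for number, sentence in enumerate(sentences, start=1)
    ]


docs = number_sentences("This is the first sentence. This is the second sentence.")
for doc in docs:
    print(doc["id"], doc["content"])
```

Starting the numbering at 1 keeps the ids consistent with the example documents above.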

Query the documents: Use Whoosh to run a query and retrieve the documents that match.

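from whoosh.index import open_dir
from whoosh.qparser import QueryParser

# Open the existing index
index = open_dir("your_index_directory")

# Create a query parser for the "content" field
query_parser = QueryParser("content", schema=index.schema)

# Build the query
query = query_parser.parse("keyword query")

# Run the query; the results are only usable while the searcher is open
with index.searcher() as searcher:
    results = searcher.search(query)
    for hit in results:
        # Assumes both fields were indexed with stored=True
        print(hit["id"], hit["content"])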

Get the next sentence: Once you have a matching document, you can get the next one by taking the current document's number and adding one.

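current_doc_id = 2  # assume you already have the current document's id

# Compute the id of the next document
next_doc_id = current_doc_id + 1

# Fetch the next document from the index
# (assumes id is a stored NUMERIC field; document() returns None when nothing matches)
with index.searcher() as searcher:
    next_doc = searcher.document(id=next_doc_id)

# Print the content of the next sentence
if next_doc:
    print(next_doc["content"])
else:
    print("There is no next sentence.")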

This way, you can process text in order and fetch the next sentence whenever you need it. This is useful for structured text such as novels, articles, or dialogue.

Leizhenpeng commented 1 year ago

Here is another approach:

w.add_document(type="p", text="This is the first paragraph of chapter 1.", paragraph="chapter 1.", next="This is the second paragraph of chapter 1.")
w.add_document(type="p", text="This is the second paragraph of chapter 1.", paragraph="chapter 1.", next="This is the third paragraph of chapter 1.")
w.add_document(type="p", text="This is the third paragraph of chapter 1.", paragraph="chapter 1.", next="")
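
Writing the `next` values by hand is error-prone once there are more than a few paragraphs. One way to derive them from an ordered list is sketched below; the helper name `link_paragraphs` is hypothetical, and the field names simply mirror the calls above.

```python
def link_paragraphs(paragraphs, chapter):
    """Build add_document keyword arguments, each carrying its successor's text.

    The last paragraph gets an empty string as its "next" value.
    """
    followers = list(paragraphs[1:]) + [""]
    return [
        {"type": "p", "text": text, "paragraph": chapter, "next": follower}
        for text, follower in zip(paragraphs, followers)
    ]


chapter_1 = [
    "This is the first paragraph of chapter 1.",
    "This is the second paragraph of chapter 1.",
    "This is the third paragraph of chapter 1.",
]

for fields in link_paragraphs(chapter_1, "chapter 1."):
    # With a Whoosh writer w, this would become: w.add_document(**fields)
    print(fields)
```

This approach duplicates each paragraph's text in its predecessor, which trades index size for being able to return the next paragraph without a second lookup.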

jingfelix commented 1 year ago

A new commit implements fetching a line by a specified book and line_id.

The JSON returned by the existing API still needs to include the id of the corresponding line.
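
As a rough illustration of what including the line id could look like, the sketch below builds a JSON payload in which every hit carries its id. The key names (`results`, `book`, `line_id`, `text`) and the helper `build_response` are assumptions, not the project's actual response format.

```python
import json


def build_response(hits):
    """Serialize (book, line_id, text) tuples into a JSON string.

    Hypothetical response shape: each result exposes its line_id so a client
    can request the following line by asking for line_id + 1.
    """
    results = [
        {"book": book, "line_id": line_id, "text": text}
        for book, line_id, text in hits
    ]
    return json.dumps({"results": results}, ensure_ascii=False)


payload = build_response([("example_book", 2, "This is the second sentence.")])
print(payload)
```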

jingfelix / EasySearch

Quickly fetch the "next" sentence #4