labring / FastGPT

FastGPT is a knowledge-based platform built on LLMs. It offers a comprehensive suite of out-of-the-box capabilities such as data processing, RAG retrieval, and visual AI workflow orchestration, letting you develop and deploy complex question-answering systems without extensive setup or configuration.
https://tryfastgpt.ai
Other
17.2k stars 4.61k forks

Knowledge base search cannot accurately identify the number #733

Open lijiajun1997 opened 8 months ago

lijiajun1997 commented 8 months ago

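Routine checks

Your version

Problem description
When searching the knowledge base by identifier number, neither semantic retrieval nor full-text retrieval finds the relevant number. In the common scenarios of a future commercial deployment, this problem would cause the bot to give answers that miss the question.

Steps to reproduce

  1. Import a knowledge base that contains serial number / model number data.
  2. I have already tried both Q&A training and document upload, to make sure the identifier keywords are covered many times in the knowledge base.
  3. Neither semantic retrieval nor full-text retrieval accurately points to the corresponding knowledge.

Expected result
Passages containing the relevant serial number / model number are retrieved and passed to the AI to answer the question. I suggest adding a dedicated detection function and retrieval mode for identifier-type queries.

Related screenshots
[image] [image] [image]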

c121914yu commented 8 months ago

Hybrid retrieval plus reranking should be enough.
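For readers unfamiliar with hybrid retrieval, here is a minimal sketch of one common way to merge a semantic ranking and a full-text ranking: reciprocal rank fusion (RRF). The thread does not say how FastGPT fuses results, so the function name `rrf_fuse`, the constant `k=60`, and the chunk ids are all assumptions for illustration. A reranker, if used, would then rescore the top fused results; that step is not shown.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from the two retrievers (best match first).
semantic = ["chunk_a", "chunk_b", "chunk_c"]
full_text = ["chunk_c", "chunk_a", "chunk_d"]

fused = rrf_fuse([semantic, full_text])
print(fused)
```

A chunk that appears in both lists accumulates score from each, so an exact identifier hit from full-text search can surface even when its embedding similarity is mediocre.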

c121914yu commented 7 months ago

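Marking this for follow-up. The BM25 tokenization is not very good; I will look into whether there is a better tokenization method later.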
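To illustrate why tokenization matters for identifiers, the sketch below contrasts a naive tokenizer with one that keeps identifier-like strings whole. It is not FastGPT's tokenizer; `naive_tokenize`, `id_aware_tokenize`, and `ID_PATTERN` are hypothetical names, the pattern is only one plausible shape for a model number, and the example handles ASCII text only.

```python
import re


def naive_tokenize(text):
    # Splits on every character that is not a-z or 0-9,
    # so "KX-2040B" falls apart into "kx" and "2040b".
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


# Assumed identifier shape: alphanumeric runs joined by "-", "_", "/" or ".".
ID_PATTERN = re.compile(r"[0-9a-z]+(?:[-_/.][0-9a-z]+)+")


def id_aware_tokenize(text):
    """Keep identifier-like spans as single tokens; tokenize the rest naively."""
    text = text.lower()
    tokens = []
    pos = 0
    for match in ID_PATTERN.finditer(text):
        tokens.extend(naive_tokenize(text[pos:match.start()]))
        tokens.append(match.group())
        pos = match.end()
    tokens.extend(naive_tokenize(text[pos:]))
    return tokens


sample = "Model KX-2040B manual"
print(naive_tokenize(sample))
print(id_aware_tokenize(sample))
```

With the naive version, a BM25 index only sees the fragments `kx` and `2040b`, which also occur in unrelated model numbers. Keeping the identifier as one token lets a term-based scorer treat it as a rare, highly discriminative term.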

lijiajun1997 commented 5 months ago

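> Marking this for follow-up. The BM25 tokenization is not very good; I will look into whether there is a better tokenization method later.

I suggest adding traditional exact matching to the knowledge base matching options, for Q&A scenarios involving model numbers and serial numbers. Users could either define a regular expression for their identifiers or use an extraction feature.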
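The exact-matching suggestion could look roughly like the sketch below: a user-supplied regular expression extracts identifiers from the query, and only chunks containing exactly the same identifier are returned. This is a hypothetical illustration, not an existing FastGPT option; `exact_id_search`, `ID_REGEX`, and the sample chunks are made up for the example.

```python
import re


def exact_id_search(query, chunks, id_regex):
    """Return chunks that contain any identifier found in the query.

    Matching is case-insensitive but otherwise exact: "KX-2040" does not
    match a chunk that only mentions "KX-2040B".
    """
    pattern = re.compile(id_regex, re.IGNORECASE)
    wanted = {m.group().upper() for m in pattern.finditer(query)}
    if not wanted:
        return []  # No identifier in the query: fall back to other retrieval.
    hits = []
    for chunk in chunks:
        found = {m.group().upper() for m in pattern.finditer(chunk)}
        if wanted & found:
            hits.append(chunk)
    return hits


# Made-up knowledge base chunks and a made-up identifier format.
chunks = [
    "KX-2040B: rated voltage 24 V, max current 2 A.",
    "KX-2040: rated voltage 12 V, max current 1 A.",
    "General installation notes for the KX series.",
]
ID_REGEX = r"\b[A-Z]{2}-\d{4}[A-Z]?\b"

hits = exact_id_search("What is the rated voltage of kx-2040b?", chunks, ID_REGEX)
print(hits)
```

Comparing whole extracted identifiers, instead of doing a substring test, is what keeps `KX-2040` and `KX-2040B` apart. When the query contains no identifier, the function returns an empty list so the caller can fall back to semantic or full-text retrieval.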