Each document is too long. See this answer for reference: https://github.com/liaokongVFX/LangChain-Chinese-Getting-Started-Guide/issues/7
Do you mean reducing the chunk_size value? It is already chunk_size=100 at the moment:

```python
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=100, chunk_overlap=0)
split_docs = text_splitter.split_documents(documents)
```
Right, or reduce the value of k used when searching. The total content length = length of the built-in prompt template + length of the question + (length of each retrieved document × number of retrieved results k), so you can check each part in turn. (Also, RecursiveCharacterTextSplitter is the recommended splitter; the docs it splits out have stronger semantic coherence.)
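A minimal sketch combining both suggestions (switching to RecursiveCharacterTextSplitter and capping k at retrieval time), assuming the bot is a standard LangChain RetrievalQA pipeline; the loader, file name, vector store, chunk sizes, and query here are illustrative, not taken from the original code:

```python
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

# Hypothetical PDF; substitute the file actually uploaded to the bot.
documents = PyPDFLoader("knowledge.pdf").load()

# RecursiveCharacterTextSplitter splits on paragraph/sentence boundaries
# first, so chunks stay more semantically coherent than with
# CharacterTextSplitter.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
split_docs = text_splitter.split_documents(documents)

vectorstore = Chroma.from_documents(split_docs, OpenAIEmbeddings())

# k limits how many chunks are stuffed into the prompt:
# total length ≈ prompt template + question + (chunk length × k),
# which must stay under the model's context window.
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(max_tokens=256),
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 2}),
)

print(qa.run("What is this document about?"))
```

With chunk_size=500 and k=2, the retrieved context is capped at roughly 1,000 characters, which together with the template and the question should fit comfortably within the 4097-token limit from the error below.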
Hello, using your "build a local knowledge-base QA bot" code, I uploaded a PDF file and then got the following error when asking a question: This model's maximum context length is 4097 tokens, however you requested 10844 tokens (10588 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.