linjungz / chat-with-your-doc

Chat with your docs in PDF/PPTX/DOCX format, using LangChain and GPT4/ChatGPT from both Azure OpenAI Service and OpenAI

Page info is not found in the referenced document #15

Closed: linjungz closed this issue 1 year ago.

linjungz commented 1 year ago

Finished chain.

Finished chain.
Traceback (most recent call last):
  File "/data/.venv/lib/python3.10/site-packages/gradio/routes.py", line 414, in run_predict
    output = await app.get_blocks().process_api(
  File "/data/.venv/lib/python3.10/site-packages/gradio/blocks.py", line 1320, in process_api
    result = await self.call_function(
  File "/data/.venv/lib/python3.10/site-packages/gradio/blocks.py", line 1048, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/data/.venv/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/data/.venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/data/.venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/data/chat-with-your-doc/chat_web.py", line 89, in get_answer
    reference_html = f"""
Reference [{i+1}] {os.path.basename(doc.metadata["source"])} P{doc.metadata['page']+1} \n"""
KeyError: 'page'

Originally posted by @duronxx in https://github.com/linjungz/chat-with-your-doc/issues/14#issuecomment-1614545522
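The traceback shows that line 89 of chat_web.py indexes `doc.metadata['page']` directly, so any document whose metadata lacks a `page` key raises `KeyError`. A minimal sketch of a more defensive label builder follows. It assumes `metadata` is a dict with a `source` path and an optional, 0-based `page` (hence the `+1`, as in the original f-string). `format_reference` is a hypothetical helper, not code from the repository, and the surrounding HTML of `reference_html` is omitted.

```python
import os
from typing import Any, Mapping


def format_reference(i: int, metadata: Mapping[str, Any]) -> str:
    """Build the 'Reference [n] file Pm' label without assuming a 'page' key.

    Hypothetical helper: `metadata` is assumed to be a LangChain-style
    Document.metadata dict with a 'source' path and an optional,
    0-based 'page' number.
    """
    # Fall back to a placeholder name if the loader recorded no source.
    source = os.path.basename(metadata.get("source", "unknown"))
    page = metadata.get("page")
    # Only append the page label when a page number is actually present.
    page_label = f" P{page + 1}" if isinstance(page, int) else ""
    return f"Reference [{i + 1}] {source}{page_label}"


if __name__ == "__main__":
    print(format_reference(0, {"source": "/tmp/1216118477.PDF", "page": 2}))
    print(format_reference(1, {"source": "/tmp/1216118477.PDF"}))
```

With this approach a missing page number degrades to a reference without a `P…` suffix instead of crashing the Gradio callback; it hides the symptom but does not explain why the page metadata is missing.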

duronxx commented 1 year ago

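I tried to email you the source PDF, but it failed because the file exceeded the size limit. Here is a link you can download it from: http://static.cninfo.com.cn/finalpage/2023-03-15/1216118477.PDF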

linjungz commented 1 year ago

OK, I tried it and the page is indeed not displayed. I suspect that PyPDFLoader did not extract the page information into the metadata when processing this PDF.
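One way to check this suspicion is to inspect the metadata of each loaded document before it reaches the chain. The sketch below is a minimal, assumption-laden version: `Doc` is a hypothetical stand-in for LangChain's `Document` (only its `metadata` dict matters here), and `missing_page_metadata` is a hypothetical helper, not part of the repository or of LangChain.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Doc:
    # Hypothetical stand-in for a LangChain Document; only `metadata` is used.
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def missing_page_metadata(docs: List[Doc]) -> List[int]:
    """Return the indices of documents whose metadata has no 'page' key."""
    return [i for i, d in enumerate(docs) if "page" not in d.metadata]


if __name__ == "__main__":
    docs = [
        Doc("first page text", {"source": "1216118477.PDF", "page": 0}),
        Doc("second page text", {"source": "1216118477.PDF"}),
    ]
    print(missing_page_metadata(docs))
```

In the app, the list passed in would be the documents produced by the loader (for example the list returned by `PyPDFLoader(path).load()`); a non-empty result for this PDF would support the diagnosis, while an empty one would point to a later step dropping the key.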