modelscope / agentscope

Start building LLM-empowered multi-agent applications in an easier way.
https://doc.agentscope.io/
Apache License 2.0

[Bug]: [RAG] It keeps telling me it cannot find the information; please help me see what I am doing wrong #340

Closed NeilWang079 closed 1 day ago

NeilWang079 commented 1 month ago

AgentScope is an open-source project. To involve a broader community, we recommend asking your questions in English.

Describe the bug

The agent's reply: "Please provide me with the text from which you'd like me to extract information about the three brothers in Juul County. I need the context to answer your question!"

[Screenshot 2024-07-15 182516] [Screenshot 2024-07-15 182735]

ZiTao-Li commented 1 month ago

It seems you replaced the data sources of the RAG example and the program does not report any runtime error. Can you provide 1) the text chunks retrieved for this question from your prepared data, and 2) which model you are using?

NeilWang079 commented 1 month ago

Embedding: gte-qwen2-7b-instruct-embed-f16
LLM: gemma2:27b
File type: PDF

[Screenshot 2024-07-16 015254]

ZiTao-Li commented 1 month ago

My conjecture is that LlamaIndex's default SimpleDirectoryReader fails to parse this PDF file. You can set "log_retrieval" to true in the agent config to print the chunks retrieved for this question. If those are not human-readable, then that's probably the issue.
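As a rough sketch of the config change suggested above, the snippet below flips "log_retrieval" on in an agent config held in memory. Only the "log_retrieval" key and the "Book-Assistant" name come from this thread; the surrounding layout (a list of entries with "class" and "args") and the "LlamaIndexAgent" class name are assumptions about how the RAG example's agent config is organized, so adjust them to match your own file.

```python
import json


def enable_retrieval_logging(agent_configs):
    """Return a deep copy of the configs with args.log_retrieval set to True."""
    # Round-trip through JSON to copy, so the caller's config is not mutated.
    updated = json.loads(json.dumps(agent_configs))
    for cfg in updated:
        cfg.setdefault("args", {})["log_retrieval"] = True
    return updated


# Hypothetical config entry; the layout and class name are assumptions.
example_configs = [
    {
        "class": "LlamaIndexAgent",
        "args": {"name": "Book-Assistant", "log_retrieval": False},
    }
]

new_configs = enable_retrieval_logging(example_configs)
print(json.dumps(new_configs, indent=2))
```

Editing the JSON file by hand achieves the same thing; the helper is only useful if the config is generated or patched from a script.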

NeilWang079 commented 1 month ago

The logging.log file contains the following:

content:
score:0.18777823325891718
source:page_label: 329 file_name: 3.pdf file_path: O:\agentscope\docs\fiction\3.pdf file_type: application/pdf file_size: 18620341 creation_date: 2024-07-15 last_modified_date: 2024-07-15
content:
score:0.18734724511061887
source:page_label: 321 file_name: 3.pdf file_path: O:\agentscope\docs\fiction\3.pdf file_type: application/pdf file_size: 18620341 creation_date: 2024-07-15 last_modified_date: 2024-07-15
content:
score:0.18725543754810822
source:page_label: 328 file_name: 3.pdf file_path: O:\agentscope\docs\fiction\3.pdf file_type: application/pdf file_size: 18620341 creation_date: 2024-07-15 last_modified_date: 2024-07-15
content:
score:0.18567446110272431
source:page_label: 326 file_name: 3.pdf file_path: O:\agentscope\docs\fiction\3.pdf file_type: application/pdf file_size: 18620341 creation_date: 2024-07-15 last_modified_date: 2024-07-15
content:
2024-07-16 10:53:00.559 | INFO | agentscope.agents.rag_agent:reply:161 - {
  "text": "NO \n\nThe provided content appears to be metadata and excerpts from a PDF file, but doesn't contain information about the names of the three brothers from Julu County. \n",
  "embedding": null,
  "image_urls": null,
  "parsed": null,
  "raw": {
    "model": "gemma2:27b",
    "created_at": "2024-07-16T02:53:00.545041Z",
    "message": {
      "role": "assistant",
      "content": "NO \n\nThe provided content appears to be metadata and excerpts from a PDF file, but doesn't contain information about the names of the three brothers from Julu County. \n"
    },
    "done_reason": "stop",
    "done": true,
    "total_duration": 12650800300,
    "load_duration": 10303735000,
    "prompt_eval_count": 662,
    "prompt_eval_duration": 912066000,
    "eval_count": 38,
    "eval_duration": 1433258000
  }
}
2024-07-16 10:53:02.303 | WARNING | agentscope.message:init:150 - A new field role is newly added to the message. Please specify the role of the message. Currently we use a default "assistant" value.
Book-Assistant: The names of the three brothers from Julu County are not mentioned in the provided context. Please provide me with the text where you encountered these characters so I can assist you further.
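In the log above, every retrieved chunk carries a score and file metadata but no visible text. A small heuristic like the one below can make that check explicit once the chunk texts have been copied out of the log. This is only a sketch: the function names and the 0.5 threshold are made up for illustration and are not part of AgentScope or LlamaIndex.

```python
def looks_unreadable(chunk_text, min_ratio=0.5):
    """Heuristic: True if the chunk is empty or mostly non-alphanumeric."""
    # Drop all whitespace so that blank or padded chunks count as empty.
    stripped = "".join(chunk_text.split())
    if not stripped:
        return True
    # str.isalnum() is also True for CJK characters, so Chinese text passes.
    readable = sum(ch.isalnum() for ch in stripped)
    return readable / len(stripped) < min_ratio


def count_unreadable(chunks):
    """Count how many chunk texts fail the readability heuristic."""
    return sum(looks_unreadable(chunk) for chunk in chunks)


# The four retrieved chunks in the log had empty content fields.
retrieved_chunks = ["", "", "", ""]
print(count_unreadable(retrieved_chunks), "of", len(retrieved_chunks), "chunks look unreadable")
```

If all top-ranked chunks fail the check, the problem lies in document parsing rather than in the embedding model or the LLM.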

ZiTao-Li commented 1 day ago

It seems this is because your PDF file can't be parsed by the default LlamaIndex PDF parser, so the retrieved chunks contain no useful text. You can either convert the PDF into plain text first, or try some other files.
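One possible way to try the plain-text route is sketched below. It assumes the third-party pypdf package is installed, which is a choice made for this example and not something the thread prescribes; the helper names are hypothetical. Note that if the PDF is a scan (images of pages with no text layer), a text extractor like this will most likely return empty pages as well, and OCR or a different source file would be needed.

```python
from pathlib import Path


def extract_page_texts(pdf_path):
    """Return the extracted text of each page (empty string if none)."""
    # Imported lazily so the other helpers work without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(str(pdf_path))
    return [(page.extract_text() or "") for page in reader.pages]


def empty_page_count(page_texts):
    """Count pages whose extracted text is empty or whitespace only."""
    return sum(1 for text in page_texts if not text.strip())


def save_as_text(page_texts, out_path):
    """Write all pages to one UTF-8 text file, separated by blank lines."""
    out_path = Path(out_path)
    out_path.write_text("\n\n".join(page_texts), encoding="utf-8")
    return out_path


# Example usage (file names are placeholders):
# texts = extract_page_texts("3.pdf")
# print(empty_page_count(texts), "of", len(texts), "pages have no text")
# save_as_text(texts, "3.txt")
```

If most pages come back empty, converting with this approach will not help; if they contain readable text, the resulting .txt file can replace the PDF in the knowledge source directory.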

I will close this issue for now, as it is not an issue with AgentScope. But please let us know if you have any further comments.