microsoft / graphrag

A modular graph-based Retrieval-Augmented Generation (RAG) system
https://microsoft.github.io/graphrag/
MIT License
16.86k stars 1.58k forks

[Bug]: system prompt seems to have no effect with OpenAI #786

Open TTTnlp opened 1 month ago

TTTnlp commented 1 month ago

Do you need to file an issue?

Describe the bug

When I use the default system prompt, the model always answers "I apologize, but I don't have any information about...". When the prompt is shortened, the model answers normally. Is this an OpenAI problem, or is the system prompt being truncated somewhere in later processing?

Steps to reproduce

No response

Expected Behavior

No response

GraphRAG Config Used

# Paste your config here

Logs and screenshots

No response

Additional Information

ps-rock commented 1 month ago

I ran into this problem too; I'm running llama3.1 with Ollama.

I tried defining the role as "You are a helpful assistant of company ABC".

Then when I asked "Who are you?" via local search, it answered directly: I'm an artificial intelligence model known as Llama.

But after shortening LOCAL_SEARCH_SYSTEM_PROMPT (for example, deleting the Data tables section outright), it was able to answer "I am a helpful assistant of company ABC".

I also tried extracting graphrag's local search system prompt and feeding it directly to llama3.1 and gemma2; both likewise ignored the role I defined and would not look for answers in the data tables.

So it appears that a system prompt that is too long prevents the LLM from functioning properly.

ps-rock commented 1 month ago

If you're using Ollama, you can try the following approach.

I found that Ollama limits the token length to 2000; you can raise the num_ctx limit via a Modelfile or through the API (e.g. 12800), so that the LLM receives the complete system prompt.
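A minimal sketch of the Modelfile approach (the file and model names below are placeholders; only `FROM` and `PARAMETER num_ctx` matter):

```
# Modelfile.graphrag (hypothetical filename)
FROM llama3.1
# Raise the context window so the full GraphRAG system prompt fits.
PARAMETER num_ctx 12800
```

Build and use the variant with `ollama create llama3.1-graphrag -f Modelfile.graphrag`, then point GraphRAG's model config at `llama3.1-graphrag`. Alternatively, Ollama's HTTP API accepts a per-request `"options": {"num_ctx": 12800}` field.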

natoverse commented 1 month ago

Is this happening when running global search? We have checks in place to filter out any community responses that the LLM deems low-relevance, which can sometimes mean no relevant summaries are collected together and the end result is that it can't answer the question. This is a somewhat cautious approach to avoid hallucination. You may be able to get better results if you tune the prompt to your domain for how the LLM assesses relevance and assigns the "Importance Score" here.
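The filtering step described above can be sketched roughly as follows. This is a hypothetical illustration, not GraphRAG's actual code: the field names (`answer`, `score`) and the zero-score cutoff are assumptions about how an LLM-assigned importance score might gate which community summaries reach the final answer.

```python
# Hypothetical sketch of relevance filtering in a map-reduce style
# global search. Each "map" response is a community summary plus an
# importance score the LLM assigned (0 = deemed irrelevant).

def filter_map_responses(responses, min_score=1):
    """Drop community answers scored below min_score.

    responses: list of dicts like {"answer": str, "score": int}.
    Returns surviving responses, highest score first. If every
    response is filtered out, the reduce step has no evidence and
    the final answer degrades to "I don't have any information...".
    """
    kept = [r for r in responses if r.get("score", 0) >= min_score]
    return sorted(kept, key=lambda r: r["score"], reverse=True)

responses = [
    {"answer": "Community A summary...", "score": 80},
    {"answer": "Community B summary...", "score": 0},
    {"answer": "Community C summary...", "score": 35},
]
print(filter_map_responses(responses))
```

Under this sketch, tuning the prompt changes the scores the LLM assigns, which in turn changes how many summaries survive the cut.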

github-actions[bot] commented 3 weeks ago

This issue has been marked stale due to inactivity after repo maintainer or community member responses that request more information or suggest a solution. It will be closed after five additional days.