Closed John-Nuos closed 2 months ago
Please note: I have tried auto tuning using the two following commands, but it didn't work.
python -m graphrag.prompt_tune --root /path/to/project --domain "Microbiology" --method random --limit 10 --language Myanmar --max-tokens 2048 --chunk-size 256 --no-entity-types --output /path/to/output
python -m graphrag.prompt_tune --root /path/to/project --domain "Microbiology" --method random --limit 10 --language Burmese --max-tokens 2048 --chunk-size 256 --no-entity-types --output /path/to/output
It looks like the --language flag doesn't work; try running without it instead.
Hi @John-Nuos, what version were you using? We just released 0.2.0, which includes the --language parameter. If it was 0.1.1, that would've failed.
Thanks for your reply. I will check tomorrow whether I am using the latest version. I just pip installed it into my virtual environment.
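A quick way to check which version is installed in the virtual environment (a minimal sketch; it only reads package metadata and works whether or not graphrag is installed):

```python
# Check the installed graphrag version; the --language flag shipped in 0.2.0.
from importlib import metadata

try:
    version = metadata.version("graphrag")
except metadata.PackageNotFoundError:
    version = None

print(version)
# If this prints 0.1.x, upgrade with:
#   pip install --upgrade graphrag
```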
Hey, you can add the system prompt to your Local Search function, e.g.:
search_engine = LocalSearch(
    llm=llm,
    context_builder=context_builder,
    system_prompt=LOCAL_SEARCH_SYSTEM_PROMPT,
    token_encoder=token_encoder,
    llm_params=llm_params,
    context_builder_params=local_context_params,
    response_type="multiple paragraphs",
)
The default prompt looks like this, but you can simply alter it to produce answers in your preferred language:
`"""Local search system prompts."""
LOCAL_SEARCH_SYSTEM_PROMPT = """ ---Role---
You are a helpful assistant responding to questions about data in the tables provided.
---Goal---
Generate a response of the target length and format that responds to the user's question, summarizing all information in the input data tables appropriate for the response length and format, and incorporating any relevant general knowledge.
If you don't know the answer, just say so. Do not make anything up.
Points supported by data should list their data references as follows:
"This is an example sentence supported by multiple data references [Data: (record ids); (record ids)]."
Do not list more than 5 record ids in a single reference. Instead, list the top 5 most relevant record ids and add "+more" to indicate that there are more.
For example:
"Person X is the owner of Company Y and subject to many allegations of wrongdoing [Data: Sources (15, 16), Reports (1), Entities (5, 7); Relationships (23); Claims (2, 7, 34, 46, 64, +more)]."
where 15, 16, 1, 5, 7, 23, 2, 7, 34, 46, and 64 represent the id (not the index) of the relevant data record.
Do not include information where the supporting evidence for it is not provided.
---Target response length and format---
{response_type}
---Data tables---
{context_data}
---Goal---
Generate a response of the target length and format that responds to the user's question, summarizing all information in the input data tables appropriate for the response length and format, and incorporating any relevant general knowledge.
If you don't know the answer, just say so. Do not make anything up.
Points supported by data should list their data references as follows:
"This is an example sentence supported by multiple data references [Data: (record ids); (record ids)]."
Do not list more than 5 record ids in a single reference. Instead, list the top 5 most relevant record ids and add "+more" to indicate that there are more.
For example:
"Person X is the owner of Company Y and subject to many allegations of wrongdoing [Data: Sources (15, 16), Reports (1), Entities (5, 7); Relationships (23); Claims (2, 7, 34, 46, 64, +more)]."
where 15, 16, 1, 5, 7, 23, 2, 7, 34, 46, and 64 represent the id (not the index) of the relevant data record.
Do not include information where the supporting evidence for it is not provided.
---Target response length and format---
{response_type}
Add sections and commentary to the response as appropriate for the length and format. Style the response in markdown. """`
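Building on the snippet above, one way to get answers in a specific language (a minimal sketch; the short string below merely stands in for the full default prompt shown here) is to append a hard language instruction and pass the result as `system_prompt`:

```python
# Sketch: force the answer language by extending the default system prompt.
# LOCAL_SEARCH_SYSTEM_PROMPT is an abbreviated stand-in for the string that
# ships with graphrag; in practice you would copy or import the real one.
LOCAL_SEARCH_SYSTEM_PROMPT = (
    "---Role---\n"
    "You are a helpful assistant responding to questions about data in the "
    "tables provided.\n"
    "...\n"
    "Style the response in markdown."
)

# Append an explicit constraint so the model does not fall back to English.
BURMESE_SYSTEM_PROMPT = (
    LOCAL_SEARCH_SYSTEM_PROMPT
    + "\n\nAlways write the final response in Burmese (Myanmar language), "
    "regardless of the language of the question or of the source data."
)

# Then pass it in, e.g.:
# search_engine = LocalSearch(..., system_prompt=BURMESE_SYSTEM_PROMPT, ...)
```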
Does GraphRAG support adding a system prompt during the graphrag.index process? When I import documents from different companies, their information gets mixed up. For example, when querying employees or related information for Company A, it returns information about Company B. So I am wondering whether it's possible to add some metadata through a system prompt.
There are many different LLM calls in the indexing process, so it would take quite a bit of work to make an injectable system prompt for each.
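As a workaround, one option is to keep companies from mixing at the data level rather than the prompt level: write each company's documents into its own GraphRAG root and index them separately. This is a sketch under assumptions; the folder layout and the `(company, filename, text)` shape are illustrative, not a graphrag API.

```python
import os

def split_by_company(docs, out_root):
    """docs: iterable of (company, filename, text) triples.

    Writes one GraphRAG input folder per company, e.g.
    out_root/company_a/input/report.txt, so each company's graph is
    built and queried in isolation and entities never mix.
    """
    for company, name, text in docs:
        input_dir = os.path.join(out_root, company, "input")
        os.makedirs(input_dir, exist_ok=True)
        with open(os.path.join(input_dir, name), "w", encoding="utf-8") as f:
            f.write(text)

# Each root is then indexed on its own:
#   python -m graphrag.index --root out_root/company_a
```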
I have the same request. It would be nice to be able to modify LOCAL_SEARCH_SYSTEM_PROMPT, MAP_SYSTEM_PROMPT, and REDUCE_SYSTEM_PROMPT the same way I can modify the prompts in the prompts folder that appears after the --init command. This is necessary because my main prompts are not in English, and since the system prompts described above are in English, the LLM sometimes responds in English, which I don't need.
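Until those prompt strings are exposed as editable files, a possible stopgap (a sketch; check whether your installed version's search classes accept these prompt arguments, the same way LocalSearch accepts `system_prompt` above) is to pass translated copies of the strings directly, keeping the template placeholders intact:

```python
import string

# Illustrative translated prompt; replace the angle-bracket text with your
# own language. The {context_data} and {response_type} fields must survive
# translation because graphrag fills them in at query time.
MAP_SYSTEM_PROMPT_TRANSLATED = (
    "---Role---\n"
    "<your translated role text>\n"
    "---Data tables---\n"
    "{context_data}\n"
    "---Target response length and format---\n"
    "{response_type}\n"
)

def placeholders_kept(original: str, translated: str) -> bool:
    """True if every {field} in the original prompt also appears in the
    translated one, so str.format() will not break at query time."""
    def fields(s):
        return {f for _, f, _, _ in string.Formatter().parse(s) if f}
    return fields(original) <= fields(translated)

# e.g. GlobalSearch(..., map_system_prompt=MAP_SYSTEM_PROMPT_TRANSLATED, ...)
```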
This issue has been marked stale due to inactivity after repo maintainer or community member responses that request more information or suggest a solution. It will be closed after five additional days.
This issue has been closed after being marked as stale for five days. Please reopen if needed.
Is there an existing issue for this?
Describe the issue
Where can I edit the overall system prompt? I want to customize GraphRAG's final output. For example, I want the final output to be only in the Burmese language.
Steps to reproduce
No response
GraphRAG Config Used
Logs and screenshots
No response
Additional Information