Closed. Amitabh-Priyadarshi-Bayer closed the issue 4 months ago.
I also encountered this error. I solved it by changing the default summary report prompt: I replaced "{{" and "}}" with "{" and "}".
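One possible reason the doubled braces matter: in Python, `{{` and `}}` are escape sequences that `str.format` collapses to single braces. If a prompt template reaches the model without that formatting step (an assumption about what happens here, not something confirmed in this thread), the model sees literal `{{` and may copy it into its answer, which `json.loads` then rejects. A minimal sketch with a made-up template, not the real GraphRAG prompt:

```python
import json

# Hypothetical miniature template; the real prompt is much longer.
TEMPLATE = 'Return JSON like {{"title": <report_title>}} for: {input_text}'

# Passed through str.format, the doubled braces collapse to single ones.
formatted = TEMPLATE.format(input_text="some text")

# Used verbatim (no formatting step), the doubled braces stay as written.
unformatted = TEMPLATE


def parses(text: str) -> bool:
    """Return True if text is valid JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


single = '{"title": "Example"}'    # what a well-formed reply looks like
double = '{{"title": "Example"}}'  # what the logs in this issue show

print(formatted)
print(unformatted)
print(parses(single), parses(double))
```

The doubled form fails at the second character, which matches the "line 1 column 2 (char 1)" position reported in the traceback of this issue.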
@WeiminLee I did that, but GraphRAG is still fetching the old summary_report from inside the code instead of the locally changed summary_report.txt file that is specified in my settings.yaml.
It's a bug, so you should change the prompt inside the package (graphrag/index/graph/extractors/graph/
Actually, it should probably be "graphrag/index/graph/extractors/community_reports/". Anyway, thank you!
I am using: python -m graphrag.index --init --root GraphRAG/
So GraphRAG is my local folder. It contains prompts/community_reports.txt, which you can edit to customize the system message.
I took your advice and changed the file under graphrag/index/graph/extractors/community_reports/ on my system, at /opt/conda/lib/python3.10/site-packages/graphrag/index/graph/extractors/community_reports.
Now it's working for me.
Is there any chance you could show the original prompt you had and what you then changed it to?
@Archdiner In graphrag/index/graph/extractors/community_reports/, change all occurrences of {{ to { and }} to }.
For example, change:
{{
"title": <report_title>,
"summary": <executive_summary>,
"rating": <impact_severity_rating>,
"rating_explanation": <rating_explanation>,
"findings": [
{{
"explanation": <insight_1_explanation>
}},
{{
"explanation": <insight_2_explanation>
}}
]
}}
to the following:
{
"title": <report_title>,
"summary": <executive_summary>,
"rating": <impact_severity_rating>,
"rating_explanation": <rating_explanation>,
"findings": [
{
"explanation": <insight_1_explanation>
},
{
"explanation": <insight_2_explanation>
}
]
}
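The brace replacement described above can be scripted rather than done by hand. This is only a sketch: `collapse_double_braces` and `patch_prompt_file` are hypothetical helper names, and the path you pass in is whichever prompt file your installation actually reads. Note that if the prompt is later passed through `str.format`, single-brace JSON would be interpreted as placeholders, so this only makes sense when the template is used verbatim.

```python
from pathlib import Path


def collapse_double_braces(text: str) -> str:
    """Replace every '{{' with '{' and every '}}' with '}'."""
    return text.replace("{{", "{").replace("}}", "}")


def patch_prompt_file(path: Path) -> bool:
    """Rewrite the prompt file in place; return True if anything changed."""
    original = path.read_text(encoding="utf-8")
    patched = collapse_double_braces(original)
    if patched == original:
        return False
    path.write_text(patched, encoding="utf-8")
    return True
```

Keeping a backup copy of the original file before patching makes it easy to revert after a package upgrade or if the change does not help.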
13:56:11,978 datashaper.workflow.workflow INFO executing verb create_community_reports
13:56:33,992 httpx INFO HTTP Request: POST "HTTP/1.1 200 OK"
13:56:33,998 graphrag.llm.openai.utils ERROR error loading json, json=Here is the output in JSON format:```{ "title": "Baidu Community", "summary": "The Baidu community revolves around Robin Li, the founder and CEO of Baidu, a Chinese technology company focused on AI applications and research. The community's dynamics are shaped by Robin Li's concerns about AI development risks and his leadership role in Baidu.", "rating": 6.0, "rating_explanation": "The impact severity rating is moderate due to the potential influence of Robin Li's views on AI development and Baidu's prominent position in the tech industry.", "findings": [ { "summary": "Robin Li's leadership role in Baidu", "explanation": "Robin Li is the founder, chairman, and CEO of Baidu, indicating his significant influence over the company's direction and decisions. This leadership role is crucial in understanding the community's dynamics [Data: Entities (21); Relationships (24)]." }, { "summary": "Baidu's focus on AI applications and research", "explanation": "Baidu is a Chinese technology company focused on both AI applications and research, suggesting its significant contribution to the development of AI in China. This focus could have implications for the community's dynamics [Data: Entities (22)]." }, { "summary": "Robin Li's concerns about AI development risks", "explanation": "Robin Li is concerned about the risks associated with AI development, which could impact society. This concern suggests that he may be cautious in his approach to AI development and deployment [Data: Relationships (5)]." } ]}```Let me know if you need any further assistance!
Traceback (most recent call last):
File "/data/galen_guo/workspace/LLM-Research/graphrag/graphrag/llm/openai/", line 93, in try_parse_json_object
result = json.loads(input)
File "/home/galen.guo/miniforge3/envs/rag_env/lib/python3.11/json/", line 346, in loads
return _default_decoder.decode(s)
File "/home/galen.guo/miniforge3/envs/rag_env/lib/python3.11/json/", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/home/galen.guo/miniforge3/envs/rag_env/lib/python3.11/json/", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
13:56:33,999 graphrag.llm.openai.openai_chat_llm WARNING error parsing llm json, retrying
I am encountering the same issue. I modified prompts/community_report.txt and graphrag/index/graph/extractors/community_reports/, but the changes did not take effect. Is this related to the problem discussed earlier? The command I ran was: poetry run poe index --root .
@WeiminLee @Archdiner @Amitabh-Priyadarshi-Bayer Could you please help confirm this? I would be very grateful!
@WeiminLee @Archdiner @Amitabh-Priyadarshi-Bayer
I am having this error:
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 327 (char 326)
What should I do?
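A first step for an error like this is to look at the text around the reported position. `json.JSONDecodeError` exposes `msg` and `pos`, so a small helper (hypothetical name `describe_json_error`) can print the surrounding characters. A sketch, assuming you can capture the raw model output as a string:

```python
import json


def describe_json_error(text: str, width: int = 40) -> str:
    """Return a one-line description of where JSON parsing of text fails."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        lo = max(0, err.pos - width)
        hi = min(len(text), err.pos + width)
        # '>>>' marks the exact character the parser stopped at.
        return f"{err.msg} at char {err.pos}: ...{text[lo:err.pos]}>>>{text[err.pos:hi]}..."
    return "valid JSON"


# A deliberately broken sample: the comma between the two members is missing.
print(describe_json_error('{"a": 1 "b": 2}'))
```

Whatever appears right after the `>>>` marker is where the parser expected a comma; an unescaped quote inside a string value or a truncated reply are two things worth checking for there.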
Describe the bug
json={{ "title": "Product Team: Mansz and Jrman"
The "{{" causes an error. I tried to fix the system message for the community report, but the error persists. When I looked into it, the community report prompt is configured in settings.yaml, where the prompt filename is "prompts/community_report.txt". I changed the double braces "{{" to single "{" in community_report.txt, but the JSON is still created with double "{{".
Also, in indexing-engine.log, the "community_reports" section shows "prompt": null rather than the filename "prompts/community_report.txt" that is set in settings.yaml.
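To check which sections of the resolved configuration ended up without a prompt, the JSON that the indexer writes to the log can be inspected directly. The sketch below assumes you have copied that JSON object out of indexing-engine.log into a string; `sections_without_prompt` is a hypothetical helper and the sample is a heavily trimmed stand-in for the real dump.

```python
import json


def sections_without_prompt(config: dict) -> list[str]:
    """Names of config sections that have a 'prompt' key set to null."""
    return sorted(
        name
        for name, section in config.items()
        if isinstance(section, dict)
        and "prompt" in section
        and section["prompt"] is None
    )


# Trimmed sample shaped like the logged configuration in this issue.
logged = json.loads(
    """
    {
      "entity_extraction": {"prompt": "prompts/entity_extraction.txt"},
      "summarize_descriptions": {"prompt": "prompts/summarize_descriptions.txt"},
      "community_reports": {"prompt": null, "max_length": 2000},
      "claim_extraction": {"prompt": "prompts/claim_extraction.txt"},
      "cluster_graph": {"max_cluster_size": 10}
    }
    """
)
print(sections_without_prompt(logged))
```

A section listed here is not using the file named in settings.yaml, which would be consistent with edits to that file having no effect.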
Steps to reproduce
No response
Expected Behavior
No response
GraphRAG Config Used
encoding_model: cl100k_base skip_workflows: [] llm: api_key: ${GRAPHRAG_API_KEY} type: azure_openai_chat # or azure_openai_chat model: gpt-4-32k (0613) model_supports_json: false
max_tokens: 4000
request_timeout: 180.0
api_base: -removed because of security purpose api_version: '2023-05-15'
deployment_name: gpt-4-32k
tokens_per_minute: 150_000 # set a leaky bucket throttle
requests_per_minute: 10_000 # set a leaky bucket throttle
max_retries: 10
max_retry_wait: 10.0
sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
concurrent_requests: 25 # the number of parallel inflight requests that may be made
parallelization: stagger: 0.3
num_threads: 50 # the number of threads to use for parallel processing
async_mode: threaded # or asyncio
parallelization: override the global parallelization settings for embeddings
async_mode: threaded # or asyncio llm: api_key: ${GRAPHRAG_API_KEY} type: azure_openai_embedding model: text-embedding-ada-002 api_base: removed because of security purpose api_version: '2023-05-15'
chunks: size: 300 overlap: 100 group_by_columns: [id] # by default, we don't allow chunks to cross documents
input: type: file # or blob file_type: text # or csv base_dir: "input" file_encoding: utf-8 file_pattern: ".*\.txt$"
cache: type: file # or blob base_dir: "cache"
storage: type: file # or blob base_dir: "output/${timestamp}/artifacts"
reporting: type: file # or console, blob base_dir: "output/${timestamp}/reports"
llm: override the global llm settings for this task
parallelization: override the global parallelization settings for this task
async_mode: override the global async_mode settings for this task
prompt: "prompts/entity_extraction.txt" entity_types: [organization,person,geo,event] max_gleanings: 0
llm: override the global llm settings for this task
parallelization: override the global parallelization settings for this task
async_mode: override the global async_mode settings for this task
prompt: "prompts/summarize_descriptions.txt" max_length: 500
llm: override the global llm settings for this task
parallelization: override the global parallelization settings for this task
async_mode: override the global async_mode settings for this task
enabled: true prompt: "prompts/claim_extraction.txt" description: "Any claims or facts that could be relevant to information discovery." max_gleanings: 0
llm: override the global llm settings for this task
parallelization: override the global parallelization settings for this task
async_mode: override the global async_mode settings for this task
prompt: "prompts/community_report.txt" max_length: 4000 max_input_length: 12000
cluster_graph: max_cluster_size: 10
embed_graph: enabled: false # if true, will generate node2vec embeddings for nodes
num_walks: 10
walk_length: 40
window_size: 2
iterations: 3
random_seed: 597832
umap: enabled: false # if true, will generate UMAP embeddings for nodes
snapshots: graphml: false raw_entities: false top_level_nodes: false
text_unit_prop: 0.5
community_prop: 0.1
conversation_history_max_turns: 5
top_k_mapped_entities: 10
top_k_relationships: 10
max_tokens: 12000
max_tokens: 12000
data_max_tokens: 12000
map_max_tokens: 1000
reduce_max_tokens: 2000
concurrency: 32
Logs and screenshots
20:19:16,982 graphrag.config.read_dotenv INFO Loading pipeline .env file 20:19:16,988 graphrag.index.cli INFO using default configuration: { "llm": { "api_key": "REDACTED, length 32", "type": "azure_openai_chat", "model": "gpt-4-32k (0613)", "max_tokens": 4000, "request_timeout": 180.0, "api_base": "removed because of security purpose", "api_version": "2023-05-15", "proxy": null, "cognitive_services_endpoint": null, "deployment_name": "gpt-4-32k", "model_supports_json": false, "tokens_per_minute": 0, "requests_per_minute": 0, "max_retries": 10, "max_retry_wait": 10.0, "sleep_on_rate_limit_recommendation": true, "concurrent_requests": 25 }, "parallelization": { "stagger": 0.3, "num_threads": 50 }, "async_mode": "threaded", "root_dir": "GraphRAG/", "reporting": { "type": "file", "base_dir": "output/${timestamp}/reports", "storage_account_blob_url": null }, "storage": { "type": "file", "base_dir": "output/${timestamp}/artifacts", "storage_account_blob_url": null }, "cache": { "type": "file", "base_dir": "cache", "storage_account_blob_url": null }, "input": { "type": "file", "file_type": "text", "base_dir": "input", "storage_account_blob_url": null, "encoding": "utf-8", "file_pattern": ".*\.txt$", "file_filter": null, "source_column": null, "timestamp_column": null, "timestamp_format": null, "text_column": "text", "title_column": null, "document_attribute_columns": [] }, "embed_graph": { "enabled": false, "num_walks": 10, "walk_length": 40, "window_size": 2, "iterations": 3, "random_seed": 597832, "strategy": null }, "embeddings": { "llm": { "api_key": "REDACTED, length 32", "type": "azure_openai_embedding", "model": "text-embedding-ada-002", "max_tokens": 4000, "request_timeout": 180.0, "api_base": "removed because of security purpose", "api_version": "2023-05-15", "proxy": null, "cognitive_services_endpoint": null, "deployment_name": "embedding", "model_supports_json": null, "tokens_per_minute": 0, "requests_per_minute": 0, "max_retries": 10, "max_retry_wait": 10.0, 
"sleep_on_rate_limit_recommendation": true, "concurrent_requests": 25 }, "parallelization": { "stagger": 0.3, "num_threads": 50 }, "async_mode": "threaded", "batch_size": 16, "batch_max_tokens": 8191, "target": "required", "skip": [], "vector_store": null, "strategy": null }, "chunks": { "size": 300, "overlap": 100, "group_by_columns": [ "id" ], "strategy": null }, "snapshots": { "graphml": false, "raw_entities": false, "top_level_nodes": false }, "entity_extraction": { "llm": { "api_key": "REDACTED, length 32", "type": "azure_openai_chat", "model": "gpt-4-32k (0613)", "max_tokens": 4000, "request_timeout": 180.0, "api_base": "removed because of security purpose", "api_version": "2023-05-15", "proxy": null, "cognitive_services_endpoint": null, "deployment_name": "gpt-4-32k", "model_supports_json": false, "tokens_per_minute": 0, "requests_per_minute": 0, "max_retries": 10, "max_retry_wait": 10.0, "sleep_on_rate_limit_recommendation": true, "concurrent_requests": 25 }, "parallelization": { "stagger": 0.3, "num_threads": 50 }, "async_mode": "threaded", "prompt": "prompts/entity_extraction.txt", "entity_types": [ "organization", "person", "geo", "event" ], "max_gleanings": 0, "strategy": null }, "summarize_descriptions": { "llm": { "api_key": "REDACTED, length 32", "type": "azure_openai_chat", "model": "gpt-4-32k (0613)", "max_tokens": 4000, "request_timeout": 180.0, "api_base": "removed because of security purpose", "api_version": "2023-05-15", "proxy": null, "cognitive_services_endpoint": null, "deployment_name": "gpt-4-32k", "model_supports_json": false, "tokens_per_minute": 0, "requests_per_minute": 0, "max_retries": 10, "max_retry_wait": 10.0, "sleep_on_rate_limit_recommendation": true, "concurrent_requests": 25 }, "parallelization": { "stagger": 0.3, "num_threads": 50 }, "async_mode": "threaded", "prompt": "prompts/summarize_descriptions.txt", "max_length": 500, "strategy": null }, "community_reports": { "llm": { "api_key": "REDACTED, length 32", "type": 
"azure_openai_chat", "model": "gpt-4-32k (0613)", "max_tokens": 4000, "request_timeout": 180.0, "api_base": "removed because of security purpose", "api_version": "2023-05-15", "proxy": null, "cognitive_services_endpoint": null, "deployment_name": "gpt-4-32k", "model_supports_json": false, "tokens_per_minute": 0, "requests_per_minute": 0, "max_retries": 10, "max_retry_wait": 10.0, "sleep_on_rate_limit_recommendation": true, "concurrent_requests": 25 }, "parallelization": { "stagger": 0.3, "num_threads": 50 }, "async_mode": "threaded", "prompt": null, "max_length": 2000, "max_input_length": 8000, "strategy": null }, "claim_extraction": { "llm": { "api_key": "REDACTED, length 32", "type": "azure_openai_chat", "model": "gpt-4-32k (0613)", "max_tokens": 4000, "request_timeout": 180.0, "api_base": "removed because of security purpose", "api_version": "2023-05-15", "proxy": null, "cognitive_services_endpoint": null, "deployment_name": "gpt-4-32k", "model_supports_json": false, "tokens_per_minute": 0, "requests_per_minute": 0, "max_retries": 10, "max_retry_wait": 10.0, "sleep_on_rate_limit_recommendation": true, "concurrent_requests": 25 }, "parallelization": { "stagger": 0.3, "num_threads": 50 }, "async_mode": "threaded", "enabled": true, "prompt": "prompts/claim_extraction.txt", "description": "Any claims or facts that could be relevant to information discovery.", "max_gleanings": 0, "strategy": null }, "cluster_graph": { "max_cluster_size": 10, "strategy": null }, "umap": { "enabled": false }, "local_search": { "text_unit_prop": 0.5, "community_prop": 0.1, "conversation_history_max_turns": 5, "top_k_entities": 10, "top_k_relationships": 10, "max_tokens": 12000, "llm_max_tokens": 2000 }, "global_search": { "max_tokens": 12000, "data_max_tokens": 12000, "map_max_tokens": 1000, "reduce_max_tokens": 2000, "concurrency": 32 }, "encoding_model": "cl100k_base", "skip_workflows": [] }
20:20:39,273 graphrag.index.reporting.file_workflow_callbacks INFO Community Report Extraction Error details=None
20:20:39,273 WARNING No report found for community: 0
20:20:39,346 httpx INFO HTTP Request: POST --"
20:20:39,347 graphrag.llm.openai.utils ERROR error loading json, json={{ "title": "Application Support Team and Controlled Environment", "summary": "The community revolves around the Application Support Team, which provides assistance to users experiencing problems with the application. The team interacts with various features of the application, including the Controlled Environment, Admin Tab, In-app Support Ticket System, Statuses File, and Summary View.", "rating": 7.0, "rating_explanation": "The impact severity rating is high due to the critical role of the Application Support Team in ensuring smooth operation of the application.", "findings": [ {{ "summary": "Functionality of the Summary View", "explanation": "The Summary View is a customizable section of the application where users can adjust the display of information. The Application Support Team can provide assistance for customizing the Summary View, indicating its complexity and potential for user customization. [Data: Entities (26), Relationships (37)]" }} ]}}
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/graphrag/llm/openai/", line 93, in try_parse_json_object
result = json.loads(input)
File "/opt/conda/lib/python3.10/json/", line 346, in loads
return _default_decoder.decode(s)
File "/opt/conda/lib/python3.10/json/", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/opt/conda/lib/python3.10/json/", line 353, in raw_decode
obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
20:20:39,349 graphrag.llm.openai.openai_chat_llm WARNING error parsing llm json, retrying
20:20:39,978 httpx INFO HTTP Request: POST "HTTP/1.1 200 OK"
20:20:39,980 graphrag.llm.openai.utils ERROR error loading json, json={output_text}
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/graphrag/llm/openai/", line 124, in _manual_json
json_output = try_parse_json_object(output)
File "/opt/conda/lib/python3.10/site-packages/graphrag/llm/openai/", line 93, in try_parse_json_object
result = json.loads(input)
File "/opt/conda/lib/python3.10/json/", line 346, in loads
return _default_decoder.decode(s)
File "/opt/conda/lib/python3.10/json/", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/opt/conda/lib/python3.10/json/", line 353, in raw_decode
obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
During handling of the above exception, another exception occurred:
Additional Information