Open as1078 opened 3 weeks ago
Hi @as1078
It seems the entity extraction process failed and yielded an empty graph. Could you please share your log file?
Sure. Since my log file is too large to upload, I have uploaded a portion of it here. It seems that all errors other than the clustering one were rate limit errors, which I thought GraphRAG handled by waiting before submitting another API request. I excluded the clustering errors already posted in the issue description. logs.json
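For anyone triaging a large log like this, a small tally by error source can show quickly whether rate limits really dominate. This is a minimal sketch, assuming the log holds one JSON object per line shaped like the records quoted in this issue (`type`, `data`, `stack`, `source`, `details`). The function name `tally_errors` is made up for illustration and is not part of GraphRAG.

```python
import json
from collections import Counter


def tally_errors(log_lines):
    """Count error records by their "source" field.

    Assumption: one JSON object per line, shaped like the records quoted
    in this issue. Lines that fail to parse are counted separately.
    """
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            counts["<unparseable line>"] += 1
            continue
        if isinstance(record, dict) and record.get("type") == "error":
            # Fall back to the "data" message when "source" is missing.
            key = record.get("source") or record.get("data") or "<unknown>"
            counts[key] += 1
    return counts
```

Passing an open file handle for the log works, since iterating a file yields lines; `tally_errors(fh).most_common()` then lists the most frequent error sources first.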
I have this error too. I noticed that my generated prompts were missing a ) in the entity extraction.
Just noticed I had the same issue. Thanks!
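A quick way to look for the missing `)` in a generated prompt is to flag lines whose parentheses do not balance. This is only a heuristic sketch: it assumes each parenthesized example in the prompt sits on a single line, so text that legitimately spans lines would show up as a false positive. The helper name `unbalanced_lines` is hypothetical.

```python
def unbalanced_lines(prompt_text):
    """Return (line_number, line) pairs whose '(' and ')' counts differ."""
    problems = []
    for number, line in enumerate(prompt_text.splitlines(), start=1):
        if line.count("(") != line.count(")"):
            problems.append((number, line))
    return problems
```

It could be run on the contents of `prompts/entity_extraction.txt` (the path used in the config in this issue), then each reported line inspected by hand.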
On less performant models like phi-3, https://github.com/microsoft/graphrag/pull/503 was able to repair the JSON. I did not test it with a prompt rewrite.
I still have the same error after fixing my generated prompts for entity extraction. Does anyone know what might be the cause?
Describe the issue
I got an EmptyNetworkError when running the Leiden clustering step:
{"type": "error", "data": "Error executing verb \"cluster_graph\" in create_base_entity_graph: EmptyNetworkError", "stack": ... leiden.EmptyNetworkError: EmptyNetworkError\n", "source": "EmptyNetworkError", "details": null}
When I open the parquet files for each step in pandas, there is only an entity_graph column with an incomplete graphml URL. I saw in other posts that there should also be a clustered_graph column, but I have none. However, when I look in the cache directory, both entity_extraction and summarize_descriptions have valid JSON results, so I'm not sure how the graph became empty. My data is a set of .txt files of US Congressional hearings, and I previously used the prompt autotune feature to customize the prompts to my data.
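To confirm whether the stored graph is actually empty, the contents of an `entity_graph` cell can be counted directly. The sketch below assumes the cell holds a GraphML document as a string and uses only the standard library. The function name `count_nodes_and_edges` is made up for illustration.

```python
import xml.etree.ElementTree as ET


def count_nodes_and_edges(graphml_text):
    """Return (node_count, edge_count) for a GraphML document string.

    Raises xml.etree.ElementTree.ParseError if the text is truncated
    or otherwise not well-formed XML.
    """
    root = ET.fromstring(graphml_text)
    nodes = edges = 0
    for element in root.iter():
        # Tags look like "{namespace}node"; keep only the local name.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "node":
            nodes += 1
        elif local_name == "edge":
            edges += 1
    return nodes, edges
```

With the parquet file loaded in pandas, the value of one `entity_graph` cell would be passed in. A result of `(0, 0)` would match the empty network that the clustering step reports, and a parse error would suggest the stored GraphML is truncated.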
Steps to reproduce
joint-20240710T193325Z-001.zip
To generate results, I simply ran the init command followed by
!python -m graphrag.prompt_tune --root ./ragtest --domain "US congress hearings"
and then
!python -m graphrag.index --verbose --root ./ragtest
GraphRAG Config Used
encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: gpt-4-turbo-preview
  model_supports_json: true # recommended if this is available for your model.
  # max_tokens: 4000
  # request_timeout: 180.0
  # api_base: https://.openai.azure.com
  # api_version: 2024-02-15-preview
  # organization:
  # deployment_name:
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made
parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing
async_mode: threaded # or asyncio
embeddings:
  # parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: text-embedding-3-small
    # api_base: https://.openai.azure.com
chunks:
  size: 300
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents
input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"
cache:
  type: file # or blob
  base_dir: "cache"
  # connection_string:
  # container_name:
storage:
  type: file # or blob
  base_dir: "output/${timestamp}/artifacts"
  # connection_string:
  # container_name:
reporting:
  type: file # or console, blob
  base_dir: "output/${timestamp}/reports"
  # connection_string:
  # container_name:
entity_extraction:
  # llm: override the global llm settings for this task
  # parallelization: override the global parallelization settings for this task
  # async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 0
summarize_descriptions:
  # llm: override the global llm settings for this task
  # parallelization: override the global parallelization settings for this task
  # async_mode: override the global async_mode settings for this task
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500
claim_extraction:
  # llm: override the global llm settings for this task
  # parallelization: override the global parallelization settings for this task
  # async_mode: override the global async_mode settings for this task
  # enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 0
community_report:
  # llm: override the global llm settings for this task
  # parallelization: override the global parallelization settings for this task
  # async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000
cluster_graph:
  max_cluster_size: 10
embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes
  # num_walks: 10
  # walk_length: 40
  # window_size: 2
  # iterations: 3
  # random_seed: 597832
umap:
  enabled: false # if true, will generate UMAP embeddings for nodes
snapshots:
  graphml: false
  raw_entities: false
  top_level_nodes: false
local_search:
  # text_unit_prop: 0.5
  # community_prop: 0.1
  # conversation_history_max_turns: 5
  # top_k_mapped_entities: 10
  # top_k_relationships: 10
  # max_tokens: 12000
global_search:
  # max_tokens: 12000
  # data_max_tokens: 12000
  # map_max_tokens: 1000
  # reduce_max_tokens: 2000
  # concurrency: 32
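The `max_retries` and `max_retry_wait` settings that appear in the config above suggest bounded retries, which would mean a request can still fail under sustained rate limiting. The sketch below is not GraphRAG's implementation. It is a generic capped-backoff loop, and `RateLimitError`, `call_with_retries`, and the starting wait of one second are all assumptions made for illustration.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error an API client raises on throttling."""


def call_with_retries(request, max_retries=10, max_retry_wait=10.0, sleep=time.sleep):
    """Call request(); on RateLimitError wait and retry, up to max_retries retries.

    The wait starts at 1 second and doubles after each failure, capped at
    max_retry_wait seconds. When retries are exhausted the error propagates.
    """
    wait = 1.0
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimitError:
            if attempt == max_retries:
                raise  # retries exhausted: the caller sees the failure
            sleep(min(wait, max_retry_wait))
            wait *= 2
```

If every attempt is throttled, the final `raise` surfaces the error to the caller. In a pipeline, that would be one way for a step to end up with no results despite retries being configured.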
Logs and screenshots
Logs.json {"type": "error", "data": "Error executing verb \"cluster_graph\" in create_base_entity_graph: EmptyNetworkError", "stack": "Traceback (most recent call last):\n File \"/Users/amansingh/anaconda3/lib/python3.11/site-packages/datashaper/workflow/workflow.py\", line 410, in _execute_verb\n result = node.verb.func(verb_args)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/amansingh/anaconda3/lib/python3.11/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py\", line 61, in cluster_graph\n results = output_df[column].apply(lambda graph: run_layout(strategy, graph))\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/amansingh/anaconda3/lib/python3.11/site-packages/pandas/core/series.py\", line 4924, in apply\n ).apply()\n ^^^^^^^\n File \"/Users/amansingh/anaconda3/lib/python3.11/site-packages/pandas/core/apply.py\", line 1427, in apply\n return self.apply_standard()\n ^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/amansingh/anaconda3/lib/python3.11/site-packages/pandas/core/apply.py\", line 1507, in apply_standard\n mapped = obj._map_values(\n ^^^^^^^^^^^^^^^^\n File \"/Users/amansingh/anaconda3/lib/python3.11/site-packages/pandas/core/base.py\", line 921, in _map_values\n return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/amansingh/anaconda3/lib/python3.11/site-packages/pandas/core/algorithms.py\", line 1743, in map_array\n return lib.map_infer(values, mapper, convert=convert)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"lib.pyx\", line 2972, in pandas._libs.lib.map_infer\n File \"/Users/amansingh/anaconda3/lib/python3.11/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py\", line 61, in\n results = output_df[column].apply(lambda graph: run_layout(strategy, graph))\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File 
\"/Users/amansingh/anaconda3/lib/python3.11/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py\", line 167, in run_layout\n clusters = run_leiden(graph, strategy)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/amansingh/anaconda3/lib/python3.11/site-packages/graphrag/index/verbs/graph/clustering/strategies/leiden.py\", line 26, in run\n node_id_to_community_map = _compute_leiden_communities(\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/amansingh/anaconda3/lib/python3.11/site-packages/graphrag/index/verbs/graph/clustering/strategies/leiden.py\", line 61, in _compute_leiden_communities\n community_mapping = hierarchical_leiden(\n ^^^^^^^^^^^^^^^^^^^^\n File \"<@beartype(graspologic.partition.leiden.hierarchical_leiden) at 0x32b439d00>\", line 304, in hierarchical_leiden\n File \"/Users/amansingh/anaconda3/lib/python3.11/site-packages/graspologic/partition/leiden.py\", line 588, in hierarchical_leiden\n hierarchical_clusters_native = gn.hierarchical_leiden(\n ^^^^^^^^^^^^^^^^^^^^^^^\nleiden.EmptyNetworkError: EmptyNetworkError\n", "source": "EmptyNetworkError", "details": null}
{"type": "error", "data": "Error running pipeline!", "stack": "Traceback (most recent call last):\n File \"/Users/amansingh/anaconda3/lib/python3.11/site-packages/graphrag/index/run.py\", line 323, in run_pipeline\n result = await workflow.run(context, callbacks)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/amansingh/anaconda3/lib/python3.11/site-packages/datashaper/workflow/workflow.py\", line 369, in run\n timing = await self._execute_verb(node, context, callbacks)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/amansingh/anaconda3/lib/python3.11/site-packages/datashaper/workflow/workflow.py\", line 410, in _execute_verb\n result = node.verb.func( verb_args)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/amansingh/anaconda3/lib/python3.11/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py\", line 61, in cluster_graph\n results = output_df[column].apply(lambda graph: run_layout(strategy, graph))\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/amansingh/anaconda3/lib/python3.11/site-packages/pandas/core/series.py\", line 4924, in apply\n ).apply()\n ^^^^^^^\n File \"/Users/amansingh/anaconda3/lib/python3.11/site-packages/pandas/core/apply.py\", line 1427, in apply\n return self.apply_standard()\n ^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/amansingh/anaconda3/lib/python3.11/site-packages/pandas/core/apply.py\", line 1507, in apply_standard\n mapped = obj._map_values(\n ^^^^^^^^^^^^^^^^\n File \"/Users/amansingh/anaconda3/lib/python3.11/site-packages/pandas/core/base.py\", line 921, in _map_values\n return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/amansingh/anaconda3/lib/python3.11/site-packages/pandas/core/algorithms.py\", line 1743, in map_array\n return lib.map_infer(values, mapper, convert=convert)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File 
\"lib.pyx\", line 2972, in pandas._libs.lib.map_infer\n File \"/Users/amansingh/anaconda3/lib/python3.11/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py\", line 61, in \n results = output_df[column].apply(lambda graph: run_layout(strategy, graph))\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/amansingh/anaconda3/lib/python3.11/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py\", line 167, in run_layout\n clusters = run_leiden(graph, strategy)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/amansingh/anaconda3/lib/python3.11/site-packages/graphrag/index/verbs/graph/clustering/strategies/leiden.py\", line 26, in run\n node_id_to_community_map = _compute_leiden_communities(\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/amansingh/anaconda3/lib/python3.11/site-packages/graphrag/index/verbs/graph/clustering/strategies/leiden.py\", line 61, in _compute_leiden_communities\n community_mapping = hierarchical_leiden(\n ^^^^^^^^^^^^^^^^^^^^\n File \"<@beartype(graspologic.partition.leiden.hierarchical_leiden) at 0x32b439d00>\", line 304, in hierarchical_leiden\n File \"/Users/amansingh/anaconda3/lib/python3.11/site-packages/graspologic/partition/leiden.py\", line 588, in hierarchical_leiden\n hierarchical_clusters_native = gn.hierarchical_leiden(\n ^^^^^^^^^^^^^^^^^^^^^^^\nleiden.EmptyNetworkError: EmptyNetworkError\n", "source": "EmptyNetworkError", "details": null}
Additional Information