BovineOverlord commented 2 weeks ago

Describe the bug

{"type": "error", "data": "Error executing verb \"cluster_graph\" in create_base_entity_graph: Columns must be same length as key", "stack": "Traceback (most recent call last):\n File \"C:\Program Files\Python310\lib\site-packages\datashaper\workflow\workflow.py\", line 410, in _execute_verb\n result = node.verb.func(verb_args)\n File \"C:\Program Files\Python310\lib\site-packages\graphrag\index\verbs\graph\clustering\cluster_graph.py\", line 102, in cluster_graph\n output_df[[level_to, to]] = pd.DataFrame(\n File \"C:\Program Files\Python310\lib\site-packages\pandas\core\frame.py\", line 4299, in __setitem__\n self._setitem_array(key, value)\n File \"C:\Program Files\Python310\lib\site-packages\pandas\core\frame.py\", line 4341, in _setitem_array\n check_key_length(self.columns, key, value)\n File \"C:\Program Files\Python310\lib\site-packages\pandas\core\indexers\utils.py\", line 390, in check_key_length\n raise ValueError(\"Columns must be same length as key\")\nValueError: Columns must be same length as key\n", "source": "Columns must be same length as key", "details": null}
{"type": "error", "data": "Error running pipeline!", "stack": "Traceback (most recent call last):\n File \"C:\Program Files\Python310\lib\site-packages\graphrag\index\run.py\", line 323, in run_pipeline\n result = await workflow.run(context, callbacks)\n File \"C:\Program Files\Python310\lib\site-packages\datashaper\workflow\workflow.py\", line 369, in run\n timing = await self._execute_verb(node, context, callbacks)\n File \"C:\Program Files\Python310\lib\site-packages\datashaper\workflow\workflow.py\", line 410, in _execute_verb\n result = node.verb.func(verb_args)\n File \"C:\Program Files\Python310\lib\site-packages\graphrag\index\verbs\graph\clustering\cluster_graph.py\", line 102, in cluster_graph\n output_df[[level_to, to]] = pd.DataFrame(\n File \"C:\Program Files\Python310\lib\site-packages\pandas\core\frame.py\", line 4299, in __setitem__\n self._setitem_array(key, value)\n File \"C:\Program Files\Python310\lib\site-packages\pandas\core\frame.py\", line 4341, in _setitem_array\n check_key_length(self.columns, key, value)\n File \"C:\Program Files\Python310\lib\site-packages\pandas\core\indexers\utils.py\", line 390, in check_key_length\n raise ValueError(\"Columns must be same length as key\")\nValueError: Columns must be same length as key\n", "source": "Columns must be same length as key", "details": null}

Steps to reproduce

I was running the tool with a local Ollama model. It ran fine and loaded the test file before the error occurred.

Expected Behavior

The tool should have proceeded to the next step, "create_base_text_units", rather than stopping. It appears to be a bug in the graphing function.

GraphRAG Config Used

encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: command-r-plus:104b-q4_0
  model_supports_json: true # recommended if this is available for your model.
  max_tokens: 2000
  request_timeout: 180.0
  api_base: http://localhost:11434/v1
  api_version: 2024-02-15-preview
  organization:
  deployment_name:
  tokens_per_minute: 150_000 # set a leaky bucket throttle
  requests_per_minute: 10_000 # set a leaky bucket throttle
  max_retries: 1
  max_retry_wait: 10.0
  sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  concurrent_requests: 1 # the number of parallel inflight requests that may be made
parallelization:
  stagger: 0.3
  num_threads: 50 # the number of threads to use for parallel processing
async_mode: threaded # or asyncio
embeddings:
  # parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: qwen2:7b-instruct
    api_base: http://localhost:11434/api
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 1
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 1 # the number of parallel inflight requests that may be made
    # batch_size: 1 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional

The remainder of the config is unchanged.

Logs and screenshots

Additional Information

GraphRAG Version: Current as of this posting
Operating System: Windows 10
Python Version: 3.10
Related Issues:

AlonsoGuevara commented 2 weeks ago

Hi! My general rule of thumb when facing this issue is:

Check the outputs of the entity extraction; this will show whether the graph is empty.
If the graph is empty, the cause is either faulty (unparseable) LLM responses or LLM call failures.
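
To illustrate why an empty graph surfaces as this particular pandas error, here is a minimal sketch of the assignment pattern visible in the traceback (`output_df[[level_to, to]] = pd.DataFrame(...)`). It is not the actual GraphRAG code: the helper `assign_two_columns` and the column names are hypothetical, chosen only to show that a two-column key cannot be assigned from a frame built out of zero rows.

```python
import pandas as pd


def assign_two_columns(rows):
    """Assign a two-column key from a list of (level, graph) pairs.

    Hypothetical stand-in for the pattern in the traceback; the names
    "level" and "clustered_graph" are illustrative only.
    """
    df = pd.DataFrame({"id": list(range(len(rows)))})
    # With at least one row, the right-hand frame has two columns and the
    # assignment succeeds. With zero rows it has zero columns, so pandas
    # cannot match it against the two-column key and raises ValueError.
    df[["level", "clustered_graph"]] = pd.DataFrame(rows, index=df.index)
    return df


# Non-empty input works as expected.
print(assign_two_columns([(0, "graph-a"), (1, "graph-b")]).columns.tolist())

# Empty input (no entities extracted) reproduces the error from the log.
try:
    assign_two_columns([])
except ValueError as exc:
    print(f"ValueError: {exc}")
```

So the exception is most likely a symptom of nothing reaching the clustering step rather than a problem in the clustering logic itself.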

Can you please check your cache entries for Entity Extraction to check if the LLM is providing faulty responses?
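
A quick way to do that check is a small script like the one below. It is only a sketch under assumptions: it assumes each cache entry is a JSON file with a `result` field holding the raw LLM response, and the directory path in the usage line is a placeholder, so adjust both to match what is actually in your project's cache folder.

```python
import json
from pathlib import Path


def summarize_cache(cache_dir):
    """Classify cache files as ok, empty_result, or unparseable.

    Assumes (not confirmed in this thread) that each entry is a JSON
    object whose "result" key holds the raw LLM response text.
    """
    summary = {"ok": [], "empty_result": [], "unparseable": []}
    for path in sorted(Path(cache_dir).iterdir()):
        if not path.is_file():
            continue
        try:
            entry = json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            summary["unparseable"].append(path.name)
            continue
        result = entry.get("result") if isinstance(entry, dict) else None
        if isinstance(result, str) and result.strip():
            summary["ok"].append(path.name)
        else:
            summary["empty_result"].append(path.name)
    return summary


if __name__ == "__main__":
    # Placeholder path; point this at your own entity extraction cache.
    cache_path = Path("cache/entity_extraction")
    if cache_path.is_dir():
        for label, names in summarize_cache(cache_path).items():
            print(f"{label}: {len(names)} file(s)")
    else:
        print(f"{cache_path} does not exist")
```

If every list comes back empty, the LLM calls never produced a cached response at all, which points to call failures rather than unparseable output.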

BovineOverlord commented 2 weeks ago

The entity extraction directory is empty. I tried two other models and got the same result.