microsoft / graphrag

A modular graph-based Retrieval-Augmented Generation (RAG) system
https://microsoft.github.io/graphrag/
MIT License
12.99k stars 1.09k forks

ValueError(\"Columns must be same length as key\")\nValueError: Columns must be same length as key\n", "source": "Columns must be same length as key", "details": null} #455

Closed BovineOverlord closed 3 days ago

BovineOverlord commented 2 weeks ago

Describe the bug

{"type": "error", "data": "Error executing verb \"cluster_graph\" in create_base_entity_graph: Columns must be same length as key", "stack": "Traceback (most recent call last):\n File \"C:\Program Files\Python310\lib\site-packages\datashaper\workflow\workflow.py\", line 410, in _execute_verb\n result = node.verb.func(**verb_args)\n File \"C:\Program Files\Python310\lib\site-packages\graphrag\index\verbs\graph\clustering\cluster_graph.py\", line 102, in cluster_graph\n output_df[[level_to, to]] = pd.DataFrame(\n File \"C:\Program Files\Python310\lib\site-packages\pandas\core\frame.py\", line 4299, in __setitem__\n self._setitem_array(key, value)\n File \"C:\Program Files\Python310\lib\site-packages\pandas\core\frame.py\", line 4341, in _setitem_array\n check_key_length(self.columns, key, value)\n File \"C:\Program Files\Python310\lib\site-packages\pandas\core\indexers\utils.py\", line 390, in check_key_length\n raise ValueError(\"Columns must be same length as key\")\nValueError: Columns must be same length as key\n", "source": "Columns must be same length as key", "details": null}
{"type": "error", "data": "Error running pipeline!", "stack": "Traceback (most recent call last):\n File \"C:\Program Files\Python310\lib\site-packages\graphrag\index\run.py\", line 323, in run_pipeline\n result = await workflow.run(context, callbacks)\n File \"C:\Program Files\Python310\lib\site-packages\datashaper\workflow\workflow.py\", line 369, in run\n timing = await self._execute_verb(node, context, callbacks)\n File \"C:\Program Files\Python310\lib\site-packages\datashaper\workflow\workflow.py\", line 410, in _execute_verb\n result = node.verb.func(**verb_args)\n File \"C:\Program Files\Python310\lib\site-packages\graphrag\index\verbs\graph\clustering\cluster_graph.py\", line 102, in cluster_graph\n output_df[[level_to, to]] = pd.DataFrame(\n File \"C:\Program Files\Python310\lib\site-packages\pandas\core\frame.py\", line 4299, in __setitem__\n self._setitem_array(key, value)\n File \"C:\Program Files\Python310\lib\site-packages\pandas\core\frame.py\", line 4341, in _setitem_array\n check_key_length(self.columns, key, value)\n File \"C:\Program Files\Python310\lib\site-packages\pandas\core\indexers\utils.py\", line 390, in check_key_length\n raise ValueError(\"Columns must be same length as key\")\nValueError: Columns must be same length as key\n", "source": "Columns must be same length as key", "details": null}

Steps to reproduce

I was using a local ollama model to use the tool. It ran fine and loaded the test file before the error occurred.

Expected Behavior

The tool should have proceeded with the following step "create_base_text_units" rather than cease operation. It appears to be a bug with the graphing function.
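For readers unfamiliar with the message, here is a minimal sketch of how recent pandas versions produce "Columns must be same length as key". It assumes the failing line expands a column of tuples into two target columns, as the traceback suggests. The helper `split_pairs` and the column names are hypothetical, and the empty input is only one plausible trigger, not a confirmed diagnosis of this issue.

```python
import pandas as pd


def split_pairs(df: pd.DataFrame, source: str, targets: list[str]) -> pd.DataFrame:
    """Expand a column of tuples into one column per tuple element."""
    out = df.copy()
    expanded = pd.DataFrame(out[source].tolist(), index=out.index)
    # pandas requires len(expanded.columns) == len(targets) for this assignment.
    out[targets] = expanded
    return out


# Two (level, graph) pairs expand into two columns without trouble.
ok = split_pairs(
    pd.DataFrame({"clustered": [(0, "graph-a"), (1, "graph-b")]}),
    "clustered",
    ["level", "graph"],
)

# Hypothetical failure case: no rows at all, so the expanded frame has
# zero columns and cannot be assigned to two target columns.
try:
    split_pairs(pd.DataFrame({"clustered": []}), "clustered", ["level", "graph"])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

If this is what happens in the pipeline, the error would point to an earlier step producing no usable rows rather than to the assignment itself.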

GraphRAG Config Used

encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: command-r-plus:104b-q4_0
  model_supports_json: true # recommended if this is available for your model.
  max_tokens: 2000
  request_timeout: 180.0
  api_base: http://localhost:11434/v1
  api_version: 2024-02-15-preview
  organization:
  deployment_name:
  tokens_per_minute: 150_000 # set a leaky bucket throttle
  requests_per_minute: 10_000 # set a leaky bucket throttle
  max_retries: 1
  max_retry_wait: 10.0
  sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  concurrent_requests: 1 # the number of parallel inflight requests that may be made

parallelization:
  stagger: 0.3
  num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  # parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: qwen2:7b-instruct
    api_base: http://localhost:11434/api
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 1
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 1 # the number of parallel inflight requests that may be made
    # batch_size: 1 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional

No change to the remainder
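The config points the chat model and the embedding model at different base paths on the same local server. As a rough sketch, assuming an OpenAI-style client that appends a route to `api_base`, the two settings would resolve to the addresses below. The helper `endpoint_url` is hypothetical, and whether the local server accepts OpenAI-format requests at each address is not established anywhere in this thread.

```python
def endpoint_url(api_base: str, route: str) -> str:
    """Join a base URL and a route the way OpenAI-style clients typically do."""
    return api_base.rstrip("/") + "/" + route.lstrip("/")


# Values taken from the llm and embeddings.llm sections of the config above.
chat_url = endpoint_url("http://localhost:11434/v1", "chat/completions")
embedding_url = endpoint_url("http://localhost:11434/api", "embeddings")

print(chat_url)
print(embedding_url)
```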

Logs and screenshots

error

Additional Information

AlonsoGuevara commented 2 weeks ago

Hi! My general rule of thumb when facing these issues is:

Can you please check your cache entries for Entity Extraction to check if the LLM is providing faulty responses?
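One way to run the check suggested above is a small script that walks the cache directory and flags entries that look empty or unparsable. This is only a sketch: it assumes each cache entry is a JSON file with the model's reply under a `result` key, which may not match your GraphRAG version, and the function name `summarize_cache` is made up for illustration.

```python
import json
from pathlib import Path


def summarize_cache(cache_dir: str) -> dict:
    """Count cache entries and flag ones that look empty or unparsable."""
    summary = {"total": 0, "unparsable": [], "empty_result": []}
    root = Path(cache_dir)
    if not root.is_dir():
        # A missing directory is reported the same way as an empty one.
        return summary
    for path in sorted(root.iterdir()):
        if not path.is_file():
            continue
        summary["total"] += 1
        try:
            entry = json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            summary["unparsable"].append(path.name)
            continue
        # Assumption: the model's reply is stored under a "result" key.
        result = entry.get("result") if isinstance(entry, dict) else None
        if not result or not str(result).strip():
            summary["empty_result"].append(path.name)
    return summary


if __name__ == "__main__":
    print(summarize_cache("cache/entity_extraction"))
```

A `total` of 0 would suggest the model never returned a response that got cached, while a non-zero total with many flagged entries would suggest the responses themselves are the problem.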

BovineOverlord commented 2 weeks ago

Entity extraction directory is empty. I attempted with 2 other different models and was met with the same result.

zubu007 commented 2 weeks ago

Facing the same thing. cache/entity_extraction is empty. same exact error in the logs.

huangyuanzhuo-coder commented 2 weeks ago

same error

flikeok commented 2 weeks ago

same error

menghongtao commented 2 weeks ago

same error

CyanMystery commented 1 week ago

same error:

this is my indexing-engine.log: indexing-engine.log

Xls1994 commented 1 week ago

same error: this is my log: indexing-engine.log

The entity_extraction directory is not empty.

image

BochenYIN commented 1 week ago

same error, Entity extraction directory is empty.

chenfujv commented 1 week ago

same error: But entity_extraction directory is not empty. image

chenfujv commented 1 week ago

settings.yaml image

Bai1026 commented 1 week ago

same error lol But entity_extraction and summarize_descriptions directories are also not empty.

yinjianjie commented 1 week ago

same error why

yurochang commented 1 week ago

same problem.

ayanjiushishuai commented 4 days ago

+1

kiljos commented 3 days ago

+1

natoverse commented 3 days ago

Consolidating alternate model issues here: #657