microsoft / graphrag

A modular graph-based Retrieval-Augmented Generation (RAG) system
https://microsoft.github.io/graphrag/
MIT License
13.82k stars 1.19k forks

Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key #414

Closed. 451222664 closed this issue 1 week ago

451222664 commented 3 weeks ago

This is my configuration:

encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: lm-studio
  type: openai_chat # or azure_openai_chat
  model: bartowski/gemma-2-9b-it-GGUF/gemma-2-9b-it-Q6_K-Q8.gguf
  model_supports_json: true # recommended if this is available for your model.
  # max_tokens: 4000
  # request_timeout: 180.0
  api_base: http://localhost:1234/v1

parallelization:
  stagger: 0.3

async_mode: threaded # or asyncio

embeddings:
  async_mode: threaded # or asyncio
  llm:
    api_key: lm-studio
    type: openai_embedding # or azure_openai_embedding
    model: nomic-ai/nomic-embed-text-v1.5-GGUF/nomic-embed-text-v1.5.Q8_0.gguf
    api_base: http://localhost:1234/v1

This is my error log:

00:17:52,468 datashaper.workflow.workflow ERROR Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key
Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.11/site-packages/datashaper/workflow/workflow.py", line 410, in _execute_verb
    result = node.verb.func(**verb_args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py", line 102, in cluster_graph
    output_df[[level_to, to]] = pd.DataFrame(
    ~~~~~~~~~^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/frame.py", line 4299, in __setitem__
    self._setitem_array(key, value)
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/frame.py", line 4341, in _setitem_array
    check_key_length(self.columns, key, value)
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/indexers/utils.py", line 390, in check_key_length
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
00:17:52,469 graphrag.index.reporting.file_workflow_callbacks INFO Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key details=None
00:17:52,469 graphrag.index.run ERROR error running workflow create_base_entity_graph
Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.11/site-packages/graphrag/index/run.py", line 323, in run_pipeline
    result = await workflow.run(context, callbacks)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/datashaper/workflow/workflow.py", line 369, in run
    timing = await self._execute_verb(node, context, callbacks)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/datashaper/workflow/workflow.py", line 410, in _execute_verb
    result = node.verb.func(**verb_args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py", line 102, in cluster_graph
    output_df[[level_to, to]] = pd.DataFrame(
    ~~~~~~~~~^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/frame.py", line 4299, in __setitem__
    self._setitem_array(key, value)
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/frame.py", line 4341, in _setitem_array
    check_key_length(self.columns, key, value)
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/indexers/utils.py", line 390, in check_key_length
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
00:17:52,470 graphrag.index.reporting.file_workflow_callbacks INFO Error running pipeline! details=None
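
The pandas error in this traceback can be reproduced in isolation. The sketch below is not the graphrag source: only the left-hand side `output_df[[level_to, to]] = pd.DataFrame(` is visible in the log, so the arguments, the column names `level` and `clustered`, and the helper `assign_two_columns` are assumptions made for illustration. It shows that unpacking a column of pairs into two columns works when rows exist, and raises the same `ValueError` when the column is empty, because the right-hand DataFrame then has zero columns for a two-column key.

```python
import pandas as pd


def assign_two_columns(values):
    """Unpack a column of (level, clusters) pairs into two columns.

    Hypothetical stand-in for the assignment seen in the traceback.
    """
    df = pd.DataFrame({"clustered": values})
    # With rows present, tolist() yields pairs, so the new frame has 2 columns.
    # With no rows, tolist() yields [], so the new frame has 0 columns.
    df[["level", "clustered"]] = pd.DataFrame(
        df["clustered"].tolist(), index=df.index
    )
    return df


# Non-empty input: the assignment succeeds.
ok = assign_two_columns([(0, ["A", "B"]), (1, ["C"])])

# Empty input (for example, no entities reached the clustering step).
try:
    assign_two_columns([])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

If this is indeed the mechanism, the error points at empty input to the clustering step rather than at a bug in pandas itself.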

This is my console log:

🚀 Reading settings from ragtest/settings.yaml
/opt/anaconda3/lib/python3.11/site-packages/numpy/core/fromnumeric.py:59:
FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a
future version. Please use 'DataFrame.transpose' instead.
  return bound(*args, **kwds)
🚀 create_base_text_units
                                 id  ... n_tokens
0  4d58d18fc8bedcf601e27bb07cdc3f8e  ...      300
1  288d3e4ebc58510cc7153d89f5946a5f  ...      300
2  a13a2f2347995e03c804450b08354b12  ...      208
3  d53faf2c8abaa7cd58e253d514fe6ad3  ...        8

[4 rows x 5 columns]
🚀 create_base_extracted_entities
                                        entity_graph
0  <graphml xmlns="http://graphml.graphdrawing.or...
🚀 create_summarized_entities
                                        entity_graph
0  <graphml xmlns="http://graphml.graphdrawing.or...
❌ create_base_entity_graph
None
⠴ GraphRAG Indexer
├── Loading Input (InputFileType.text) - 1 files loaded (1 filtered) ━ 100% … 0…
├── create_base_text_units
├── create_base_extracted_entities
├── create_summarized_entities
└── create_base_entity_graph
❌ Errors occurred during the pipeline run, see logs for more details.

AlonsoGuevara commented 3 weeks ago

Hi! Can you please check your cache files or output files to see whether the entity extraction was successful? Most errors in the clustering step come from faulty entity extractions: either 0 entities were extracted, or the responses from the LLM were wrong...
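
One way to run the check suggested above is to count the nodes and edges in the GraphML text that the extraction step produces (the `entity_graph` column in the console log). This is a minimal sketch using only the standard library. The function name `count_graph_elements` and the two sample graphs are illustrative, and it assumes the graph uses the standard GraphML namespace; how you load the GraphML string from your own output files is left out because it depends on your setup.

```python
import xml.etree.ElementTree as ET

# Standard GraphML namespace, in ElementTree's "{uri}tag" notation.
GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def count_graph_elements(graphml_text: str) -> tuple[int, int]:
    """Return (node_count, edge_count) for a GraphML document given as text."""
    root = ET.fromstring(graphml_text)
    nodes = root.findall(f".//{GRAPHML_NS}node")
    edges = root.findall(f".//{GRAPHML_NS}edge")
    return len(nodes), len(edges)


# Illustrative inputs: an empty graph and a graph with two entities.
EMPTY_GRAPH = (
    '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">'
    '<graph edgedefault="undirected" />'
    "</graphml>"
)
SMALL_GRAPH = (
    '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">'
    '<graph edgedefault="undirected">'
    '<node id="ALICE" /><node id="ACME" />'
    '<edge source="ALICE" target="ACME" />'
    "</graph></graphml>"
)

print(count_graph_elements(EMPTY_GRAPH))
print(count_graph_elements(SMALL_GRAPH))
```

A result of zero nodes would be consistent with the "0 extracted entities" case described above.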

451222664 commented 3 weeks ago

It means that there is something wrong with the result of LLM processing, right?

"<|COMPLETE|> 

Let me know if you'd like to try another example!  I'm ready when you are."

Nuclear6 commented 3 weeks ago

This error is probably caused by your embedding model or chat model not loading correctly. You can refer to my configuration changes.

[three images, presumably screenshots of the modified configuration; not preserved]

AnandMoorthy commented 3 weeks ago

Hi @451222664

I am also getting the same error!

I've pasted the logs below; it feels like an issue with Ollama. Please confirm whether you are getting the same logs.

Screenshot 2024-07-08 214205

AlonsoGuevara commented 3 weeks ago

Hi @451222664. Judging by the response you provided, yes: the LLM you're using is ignoring the output format we expect and is being more "chatty". I would suggest doing some prompt tuning to try to force the LLM into the format we need for parsing.
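
To see whether a cached response is usable before tuning prompts, a rough format check can help. The sketch below is not GraphRAG's parser. It assumes the delimiters of the default entity extraction prompt (`<|>` between fields, `##` between records, `<|COMPLETE|>` at the end); only `<|COMPLETE|>` appears in this thread, so adjust the constants if your prompt differs. The function name `parse_records` and the well-formed sample are made up for illustration; the chatty sample is the response quoted earlier in this thread.

```python
import re

# Assumed delimiters; change these to match your prompt.
TUPLE_DELIMITER = "<|>"
RECORD_DELIMITER = "##"
COMPLETION_DELIMITER = "<|COMPLETE|>"


def parse_records(response: str) -> list[list[str]]:
    """Return the entity/relationship records found before the completion marker."""
    body = response.split(COMPLETION_DELIMITER)[0]
    records = []
    for chunk in body.split(RECORD_DELIMITER):
        # Each record is expected to look like ("entity"<|>NAME<|>TYPE<|>DESC).
        match = re.search(r"\((.*)\)", chunk.strip(), re.DOTALL)
        if match is None:
            continue
        fields = [f.strip().strip('"') for f in match.group(1).split(TUPLE_DELIMITER)]
        if fields[0] in {"entity", "relationship"}:
            records.append(fields)
    return records


# Illustrative well-formed response (not taken from a real run).
WELL_FORMED = (
    '("entity"<|>ALICE<|>PERSON<|>Alice is an engineer)##'
    '("entity"<|>ACME<|>ORGANIZATION<|>Acme is a company)##'
    '("relationship"<|>ALICE<|>ACME<|>Alice works at Acme<|>8)'
    "<|COMPLETE|>"
)

# The response quoted earlier in this thread.
CHATTY = (
    "<|COMPLETE|> \n\nLet me know if you'd like to try another example!  "
    "I'm ready when you are."
)

print(len(parse_records(WELL_FORMED)))
print(parse_records(CHATTY))
```

A response that yields no records, like the chatty one here, gives the extraction step nothing to work with, which matches the failure mode described above.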

AnandMoorthy commented 3 weeks ago

It turns out Ollama was not started properly; restarting the service fixed the issue.

AlonsoGuevara commented 1 week ago

Hi! We are consolidating alternate model issues here: https://github.com/microsoft/graphrag/issues/657