severian42 / GraphRAG-Local-UI

GraphRAG using Local LLMs - Features robust API and multiple apps for Indexing/Prompt Tuning/Query/Chat/Visualizing/Etc. This is meant to be the ultimate GraphRAG/KG local LLM app.
MIT License
1.51k stars, 173 forks

Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key #51

Open sdx0112 opened 1 month ago

sdx0112 commented 1 month ago

Hi, I have pulled the latest version and encountered the following error:

09:21:37,332 datashaper.workflow.workflow ERROR Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key
Traceback (most recent call last):
  File "D:\miniforge3\envs\graphrag\Lib\site-packages\datashaper\workflow\workflow.py", line 410, in _execute_verb
    result = node.verb.func(**verb_args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Projects\GPT\GraphRAG-Local\GraphRAG-Local-UI\graphrag\index\verbs\graph\clustering\cluster_graph.py", line 102, in cluster_graph
    output_df[[level_to, to]] = pd.DataFrame(
  File "D:\miniforge3\envs\graphrag\Lib\site-packages\pandas\core\frame.py", line 4299, in __setitem__
    self._setitem_array(key, value)
  File "D:\miniforge3\envs\graphrag\Lib\site-packages\pandas\core\frame.py", line 4341, in _setitem_array
    check_key_length(self.columns, key, value)
  File "D:\miniforge3\envs\graphrag\Lib\site-packages\pandas\core\indexers\utils.py", line 390, in check_key_length
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
09:21:37,383 graphrag.index.reporting.file_workflow_callbacks INFO Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key details=None
09:21:37,383 root ERROR Error running workflow create_base_entity_graph: Columns must be same length as key
09:21:37,555 graphrag.index.run INFO Running workflow: create_final_entities...
09:21:37,556 graphrag.index.run INFO dependencies for create_final_entities: ['create_base_entity_graph']
09:21:37,558 graphrag.index.run ERROR error running workflow create_final_entities
Traceback (most recent call last):
  File "D:\Projects\GPT\GraphRAG-Local\GraphRAG-Local-UI\graphrag\index\run.py", line 320, in run_pipeline
    await inject_workflow_data_dependencies(workflow)
  File "D:\Projects\GPT\GraphRAG-Local\GraphRAG-Local-UI\graphrag\index\run.py", line 256, in inject_workflow_data_dependencies
    table = await load_table_from_storage(f"{id}.parquet")
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Projects\GPT\GraphRAG-Local\GraphRAG-Local-UI\graphrag\index\run.py", line 242, in load_table_from_storage
    raise ValueError(msg)
ValueError: Could not find create_base_entity_graph.parquet in storage!
09:21:37,570 graphrag.index.reporting.file_workflow_callbacks INFO Error running pipeline! details=None
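The second traceback is a knock-on failure, not a separate bug. According to the log, create_final_entities depends on create_base_entity_graph, then fails to load create_base_entity_graph.parquet. That file was never written because create_base_entity_graph had already failed. Below is a minimal sketch for checking which workflow outputs exist after a run. The helper name is hypothetical, and so is the assumption that each workflow writes <name>.parquet into a single output folder. Neither is part of GraphRAG's API.

```python
import tempfile
from pathlib import Path


def missing_workflow_outputs(output_dir: Path, workflows: list[str]) -> list[str]:
    """Return the workflows whose <name>.parquet file is not in output_dir.

    Hypothetical helper: assumes each workflow writes <workflow>.parquet
    directly into output_dir, matching the file name reported in the log.
    """
    return [w for w in workflows if not (output_dir / f"{w}.parquet").is_file()]


if __name__ == "__main__":
    # Demo in a throwaway (empty) directory; point this at your run's artifact folder instead.
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp)
        print(missing_workflow_outputs(out, ["create_base_entity_graph"]))
        # ['create_base_entity_graph']: the file the next workflow failed to load
```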
sdx0112 commented 1 month ago

I looked into the cluster_graph.py code. Here is the code causing the issue, starting at line 102:

    output_df[[level_to, to]] = pd.DataFrame(
        output_df[to].tolist(), index=output_df.index
    )

The left-hand side assigns two columns (level_to and to), while the DataFrame built on the right-hand side has only one column.
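The mismatch can be reproduced with plain pandas. This sketch reuses the assignment pattern quoted above. It assumes, as the two target columns suggest, that each cell of the to column should hold a (level, graph) pair. The column names clustered_graph and level are hypothetical, and so are the placeholder graph strings. When every cell holds a pair, pd.DataFrame(...tolist()) has two columns and the assignment succeeds. When a cell holds a single value such as NaN, the result has one column and pandas raises the same ValueError seen in the log.

```python
import pandas as pd


def split_level_and_graph(df: pd.DataFrame, to: str, level_to: str) -> pd.DataFrame:
    """Same assignment pattern as cluster_graph.py line 102, applied to a copy of df."""
    out = df.copy()
    # Each element of out[to] becomes one row; a 2-tuple yields two columns (0 and 1).
    out[[level_to, to]] = pd.DataFrame(out[to].tolist(), index=out.index)
    return out


# Healthy input: every cell is a (level, graph) pair, so the right-hand side has two columns.
ok = pd.DataFrame({"clustered_graph": [(0, "<graphml A>"), (1, "<graphml B>")]})
print(split_level_and_graph(ok, "clustered_graph", "level"))

# Broken input: the cell holds a single value, so the right-hand side has one column.
bad = pd.DataFrame({"clustered_graph": [float("nan")]})
try:
    split_level_and_graph(bad, "clustered_graph", "level")
except ValueError as err:
    print(err)  # Columns must be same length as key
```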

sdx0112 commented 1 month ago

I also printed out output_df. It seems to have only one row, and only the first column, 'entity_graph', has a value.
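One plausible explanation for that single row is that clustering produced no communities. This is an assumption, since the code above line 102 is not shown in this thread. If that code explodes a per-row list of (level, graph) pairs, an empty list becomes a single NaN row, as the pandas documentation describes for DataFrame.explode. pd.DataFrame(...tolist()) then has only one column. The sketch below shows that behavior, plus a hypothetical guard that fails with a more descriptive message. The function and column names are illustrative. An empty community list may in turn mean that the entity graph itself came out empty.

```python
import pandas as pd


def explode_and_split(df: pd.DataFrame, to: str, level_to: str) -> pd.DataFrame:
    """Hypothetical guarded version of an explode-then-split step.

    Assumes column `to` holds, per row, a list of (level, graph) pairs.
    """
    # One output row per pair; an empty list explodes into a single NaN row.
    out = df.explode(to, ignore_index=True)
    pairs = pd.DataFrame(out[to].tolist(), index=out.index)
    if pairs.shape[1] != 2:
        raise ValueError(
            f"expected (level, graph) pairs in {to!r}, got {pairs.shape[1]} column(s); "
            "the clustering step probably produced no communities"
        )
    out[[level_to, to]] = pairs
    return out


# One input row whose list of pairs is empty, e.g. because the graph had no nodes.
empty = pd.DataFrame({"entity_graph": ["<graphml>"], "clustered_graph": [[]]})
print(empty.explode("clustered_graph", ignore_index=True))  # one row, clustered_graph is NaN
```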

myyourgit commented 1 month ago

So how do we resolve this issue? Thanks.

gaostar123 commented 1 month ago

I have the same problem

JamesCanniffe commented 1 month ago

I'm also getting the same issue when I run this code locally, but not when I run it on google colab.

severian42 commented 1 month ago

Hey! I've been digging into this, and it seems to come down to the way the models are called through the cache functions. That part lives in the library itself, so I'm trying to figure out a better way to handle it locally. For now, clearing the cache in the indexing directory you initialized seems to help. I'll keep working on making this more stable so it stops being an issue. Thanks for your patience!
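Clearing the cache can be scripted as below. This is a minimal sketch that assumes the default file cache: a folder named cache under the indexing root, for example ./indexing/cache. If your settings.yaml points the cache somewhere else, pass that folder name instead. The helper name is hypothetical. It removes only that one folder, so cached LLM responses are recomputed on the next run while settings and inputs stay in place.

```python
import shutil
import tempfile
from pathlib import Path


def clear_index_cache(root: Path, cache_dir_name: str = "cache") -> bool:
    """Delete <root>/<cache_dir_name> if it exists; return True if it was removed.

    Assumption: the file cache lives in a "cache" folder under the indexing root.
    Nothing else under root is touched.
    """
    cache_dir = root / cache_dir_name
    if cache_dir.is_dir():
        shutil.rmtree(cache_dir)
        return True
    return False


if __name__ == "__main__":
    # Demo in a throwaway directory; for a real project use Path("./indexing").
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "cache").mkdir()
        (root / "cache" / "entry.json").write_text("{}")
        (root / "settings.yaml").write_text("")
        print(clear_index_cache(root))            # True
        print((root / "settings.yaml").exists())  # True: settings are left alone
```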

KDD2018 commented 1 month ago

I am getting the same issue! I tried clearing the cache in the indexing dir, but it did not work. Then I tried to execute 'python -m graphrag.index --init --root ./indexing/', and it raised another error: ValueError: Project already initialized at indexing. So how do I resolve it?
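The "Project already initialized" error suggests that --init is meant as a one-time setup step for the root. Once the root exists, rerunning indexing without it (python -m graphrag.index --root ./indexing) is presumably what is needed. The helper below is a hypothetical pre-flight check. It assumes that a root counts as initialized once it contains settings.yaml, which is an approximation of GraphRAG's check and not confirmed in this thread. It only builds the argument lists; they could be passed to subprocess.run.

```python
import sys
import tempfile
from pathlib import Path


def is_initialized(root: Path) -> bool:
    """Hypothetical check: treat root as initialized once settings.yaml exists.

    Assumption: this approximates the condition behind
    "ValueError: Project already initialized at ...".
    """
    return (root / "settings.yaml").is_file()


def planned_commands(root: Path) -> list[list[str]]:
    """Include --init only for a fresh root; always finish with a plain indexing run."""
    base = [sys.executable, "-m", "graphrag.index"]
    index_cmd = base + ["--root", str(root)]
    if is_initialized(root):
        return [index_cmd]
    return [base + ["--init", "--root", str(root)], index_cmd]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        print(len(planned_commands(root)))  # 2: init, then index
        (root / "settings.yaml").write_text("")  # stands in for what --init creates (assumed)
        print(len(planned_commands(root)))  # 1: index only
```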