severian42 / GraphRAG-Local-UI

GraphRAG using Local LLMs - Features robust API and multiple apps for Indexing/Prompt Tuning/Query/Chat/Visualizing/Etc. This is meant to be the ultimate GraphRAG/KG local LLM app.
MIT License
1.75k stars 207 forks

BUG: Duplicate column names found: ['level_final', 'level_final', 'entity_graph_final'] #36

Closed thusinh1969 closed 4 months ago

thusinh1969 commented 4 months ago
11:46:46,791 graphrag.index.run INFO Workflow create_base_entity_graph completed with 1 rows in 0.01 seconds
11:46:46,791 graphrag.index.emit.parquet_table_emitter INFO emitting parquet table create_base_entity_graph.parquet
11:46:46,791 graphrag.index.run ERROR error running workflow create_base_entity_graph
Traceback (most recent call last):
  File "/Users/steve/AI/GraphRAG-Local-UI/graphrag/index/run.py", line 361, in run_pipeline
    output = await emit_workflow_output(workflow)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/steve/AI/GraphRAG-Local-UI/graphrag/index/run.py", line 320, in emit_workflow_output
    await emitter.emit(workflow.name, output)
  File "/Users/steve/AI/GraphRAG-Local-UI/graphrag/index/emit/parquet_table_emitter.py", line 40, in emit
    await self._storage.set(filename, data.to_parquet())
                                      ^^^^^^^^^^^^^^^^^
  File "/Users/steve/steve_python3.11_env/lib/python3.11/site-packages/pandas/util/_decorators.py", line 333, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/steve/steve_python3.11_env/lib/python3.11/site-packages/pandas/core/frame.py", line 3113, in to_parquet
    return to_parquet(
           ^^^^^^^^^^^
  File "/Users/steve/steve_python3.11_env/lib/python3.11/site-packages/pandas/io/parquet.py", line 480, in to_parquet
    impl.write(
  File "/Users/steve/steve_python3.11_env/lib/python3.11/site-packages/pandas/io/parquet.py", line 190, in write
    table = self.api.Table.from_pandas(df, **from_pandas_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/table.pxi", line 3874, in pyarrow.lib.Table.from_pandas
  File "/Users/steve/steve_python3.11_env/lib/python3.11/site-packages/pyarrow/pandas_compat.py", line 570, in dataframe_to_arrays
    convert_fields) = _get_columns_to_convert(df, schema, preserve_index,
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/steve/steve_python3.11_env/lib/python3.11/site-packages/pyarrow/pandas_compat.py", line 352, in _get_columns_to_convert
    raise ValueError(
ValueError: Duplicate column names found: ['level_final', 'level_final', 'entity_graph_final']
11:46:46,793 graphrag.index.reporting.file_workflow_callbacks INFO Error running pipeline! details=None

Please check. I always get this error when I start indexing.

Thanks, Steve

generalmilk commented 4 months ago

I have a very similar issue:

22:57:51,802 graphrag.index.run ERROR error running workflow create_base_entity_graph
Traceback (most recent call last):
  File "/home/ec2-user/GraphRAG-Local-UI/graphrag/index/run.py", line 328, in run_pipeline
    output = await emit_workflow_output(workflow)
  File "/home/ec2-user/GraphRAG-Local-UI/graphrag/index/run.py", line 287, in emit_workflow_output
    await emitter.emit(workflow.name, output)
  File "/home/ec2-user/GraphRAG-Local-UI/graphrag/index/emit/parquet_table_emitter.py", line 40, in emit
    await self._storage.set(filename, data.to_parquet())
  File "/usr/local/lib/python3.10/dist-packages/pandas/util/_decorators.py", line 333, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pandas/core/frame.py", line 3113, in to_parquet
    return to_parquet(
  File "/usr/local/lib/python3.10/dist-packages/pandas/io/parquet.py", line 480, in to_parquet
    impl.write(
  File "/usr/local/lib/python3.10/dist-packages/pandas/io/parquet.py", line 190, in write
    table = self.api.Table.from_pandas(df, **from_pandas_kwargs)
  File "pyarrow/table.pxi", line 3874, in pyarrow.lib.Table.from_pandas
  File "/usr/local/lib/python3.10/dist-packages/pyarrow/pandas_compat.py", line 570, in dataframe_to_arrays
    convert_fields) = _get_columns_to_convert(df, schema, preserve_index,
  File "/usr/local/lib/python3.10/dist-packages/pyarrow/pandas_compat.py", line 352, in _get_columns_to_convert
    raise ValueError(
ValueError: Duplicate column names found: ['entity_graph', 'level', 'level', 'clustered_graph_0']
22:57:51,805 graphrag.index.reporting.file_workflow_callbacks INFO Error running pipeline! details=None

severian42 commented 4 months ago

Hey! I am currently troubleshooting this. The create_base_entity.py workflow has problems executing with the local LLM flow. Hopefully a fix will be here soon.

severian42 commented 4 months ago

Hey! Thanks for your patience on this. The latest update should take care of the issue. Let me know if you're still having problems after updating. For the best chance of it working right away, clear everything out and start fresh if possible. Thanks again for checking out the project!
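
For anyone reading the tracebacks above: the list in the ValueError appears to be the full set of column names in the table being written, so the repeated entry (`level_final` in the first report, `level` in the second) is the one that trips the parquet writer. Below is a minimal sketch for spotting and renaming repeated names in such a list. The helpers `find_duplicates` and `make_unique` are hypothetical names made up for illustration and are not part of graphrag, pandas, or pyarrow.

```python
from collections import Counter


def find_duplicates(columns):
    """Return the names that occur more than once, in first-seen order."""
    counts = Counter(columns)
    return [name for name in counts if counts[name] > 1]


def make_unique(columns):
    """Rename repeated names by appending _1, _2, ... until each is unique."""
    used = set()
    result = []
    for name in columns:
        candidate, n = name, 0
        # Keep bumping the suffix until the candidate has not been used yet.
        while candidate in used:
            n += 1
            candidate = f"{name}_{n}"
        used.add(candidate)
        result.append(candidate)
    return result


# Column lists copied from the two error messages in this thread.
first_report = ["level_final", "level_final", "entity_graph_final"]
second_report = ["entity_graph", "level", "level", "clustered_graph_0"]

print(find_duplicates(first_report))
print(find_duplicates(second_report))
print(make_unique(first_report))
```

With a pandas DataFrame, the same check could be run on `list(df.columns)` before calling `df.to_parquet()`. Renaming only hides the symptom, though. The real cause is whatever upstream step produced the same column twice, so updating to the fixed version and re-indexing from a clean state is still the recommended route.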