Closed tonypius closed 1 week ago
I can't even get a single CSV to work
08:55:25,872 datashaper.workflow.workflow ERROR Error executing verb "zip" in create_base_text_units: 'text'
Traceback (most recent call last):
File "/home/fragb0x/GRAPH/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3805, in get_loc
return self._engine.get_loc(casted_key)
File "index.pyx", line 167, in pandas._libs.index.IndexEngine.get_loc
File "index.pyx", line 196, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 7081, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 7089, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'text'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/fragb0x/GRAPH/lib/python3.10/site-packages/datashaper/workflow/workflow.py", line 410, in _execute_verb
result = node.verb.func(*verb_args)
File "/home/fragb0x/GRAPH/lib/python3.10/site-packages/graphrag/index/verbs/zip.py", line 29, in zip_verb
table[to] = list(zip([table[col] for col in columns], strict=True))
File "/home/fragb0x/GRAPH/lib/python3.10/site-packages/graphrag/index/verbs/zip.py", line 29, in
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/fragb0x/GRAPH/lib/python3.10/site-packages/graphrag/index/run.py", line 323, in run_pipeline
result = await workflow.run(context, callbacks)
File "/home/fragb0x/GRAPH/lib/python3.10/site-packages/datashaper/workflow/workflow.py", line 369, in run
timing = await self._execute_verb(node, context, callbacks)
File "/home/fragb0x/GRAPH/lib/python3.10/site-packages/datashaper/workflow/workflow.py", line 410, in _execute_verb
result = node.verb.func(*verb_args)
File "/home/fragb0x/GRAPH/lib/python3.10/site-packages/graphrag/index/verbs/zip.py", line 29, in zip_verb
table[to] = list(zip([table[col] for col in columns], strict=True))
File "/home/fragb0x/GRAPH/lib/python3.10/site-packages/graphrag/index/verbs/zip.py", line 29, in
I have solved the problem. Even if you have modified settings.yaml correctly, your CSV file must have a column named "text".
self._reader = parsers.TextReader(src, **kwds)
File "parsers.pyx", line 574, in pandas._libs.parsers.TextReader.cinit
File "parsers.pyx", line 663, in pandas._libs.parsers.TextReader._get_header
File "parsers.pyx", line 874, in pandas._libs.parsers.TextReader._tokenize_rows
File "parsers.pyx", line 891, in pandas._libs.parsers.TextReader._check_tokenize_status
File "parsers.pyx", line 2053, in pandas._libs.parsers.raise_parser_error
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 3355: invalid start byte
Sentry is attempting to send 2 pending events
Waiting up to 2 seconds
Please help: CSV files as input are not working. I changed settings.yaml and it is still throwing an error.
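The UnicodeDecodeError above means the file is not valid UTF-8. Byte 0x92 is the right single quotation mark in Windows-1252, so one plausible cause (an assumption, not something the log confirms) is a CSV exported by a Windows tool. A minimal sketch of re-encoding such a file to UTF-8 follows. The function names are hypothetical, and the list of fallback encodings is a guess that should be adjusted to the real source of the data.

```python
from pathlib import Path


def decode_with_fallback(raw: bytes, encodings=("utf-8", "cp1252")) -> str:
    """Try each encoding in order and return the first successful decode."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError(f"could not decode data with any of {encodings}")


def convert_to_utf8(path: Path) -> None:
    """Rewrite the file at `path` in place as UTF-8 (hypothetical helper)."""
    text = decode_with_fallback(path.read_bytes())
    # Write bytes directly so line endings are left untouched.
    path.write_bytes(text.encode("utf-8"))


# 0x92 is not a valid UTF-8 start byte, but decodes as a curly apostrophe in cp1252.
sample = b"it\x92s"
print(decode_with_fallback(sample))
```

It is worth keeping a backup of the original file before calling `convert_to_utf8`, since a wrong encoding guess would silently produce incorrect characters.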
This issue has been marked stale due to inactivity after repo maintainer or community member responses that request more information or suggest a solution. It will be closed after five additional days.
This issue has been closed after being marked as stale for five days. Please reopen if needed.
Describe the bug
While loading CSV files from the input folder, the indexing step fails with the error: Error executing verb "zip" in create_base_text_units: 'text'
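The bare 'text' at the end of that message is how pandas reports a missing column: selecting a column that does not exist raises a KeyError whose message is just the quoted column name. A minimal reproduction, assuming only pandas and using a made-up column name `content`:

```python
import pandas as pd

# A table that has no "text" column, similar to a CSV with differently named columns.
table = pd.DataFrame({"content": ["first row", "second row"]})

try:
    table["text"]  # same kind of lookup that the zip verb performs per column
    message = None
except KeyError as exc:
    # str() of a KeyError is the repr of the missing key.
    message = str(exc)

print(message)
```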
Steps to reproduce
I have GraphRAG set up with Azure OpenAI, and I successfully ran it on a txt file. But when I tried to load 11 CSV files, the logs below show that it loads the files properly and then fails when the pipeline starts.
Logs and screenshots
11:32:54,901 graphrag.index.input.csv INFO loading 11 csv files
11:32:54,903 graphrag.index.input.csv INFO Total number of unfiltered csv rows: 13469
11:32:54,905 graphrag.index.workflows.load INFO Workflow Run Order: ['create_base_text_units', 'create_base_extracted_entities', 'create_summarized_entities', 'create_base_entity_graph', 'create_final_entities', 'create_final_nodes', 'create_final_communities', 'join_text_units_to_entity_ids', 'create_final_relationships', 'join_text_units_to_relationship_ids', 'create_final_community_reports', 'create_final_text_units', 'create_base_documents', 'create_final_documents']
11:32:54,905 graphrag.index.run INFO Final # of rows loaded: 13469
11:32:55,105 graphrag.index.run INFO Running workflow: create_base_text_units...
11:32:55,105 graphrag.index.run INFO dependencies for create_base_text_units: []
11:32:55,111 datashaper.workflow.workflow INFO executing verb orderby
11:32:55,128 datashaper.workflow.workflow INFO executing verb zip
11:32:55,128 datashaper.workflow.workflow ERROR Error executing verb "zip" in create_base_text_units: 'text'