microsoft / graphrag

A modular graph-based Retrieval-Augmented Generation (RAG) system
https://microsoft.github.io/graphrag/
MIT License
20.07k stars 1.96k forks

[Issue]: Examples Data File Missing #989

Closed xinzheng99 closed 3 months ago

xinzheng99 commented 3 months ago

Do you need to file an issue?

Describe the issue

When I tried to work through the examples under the examples directory, I found that the data they need is missing.

Path: examples/entity_extraction/with_graph_intelligence/run.py

Need:

    sample_data_dir = os.path.join(
        os.path.dirname(os.path.abspath(__file__)), "../../_sample_data/"
    )

    shared_dataset = asyncio.run(
        load_input(
            PipelineCSVInputConfig(
                file_pattern=".*\.csv$",
                base_dir=sample_data_dir,
                source_column="author",
                text_column="message",
                timestamp_column="date(yyyyMMddHHmmss)",
                timestamp_format="%Y%m%d%H%M%S",
                title_column="message",
            ),
        )
    )

"../../_sample_data/" does not exist, and I

Steps to reproduce

Just that the file doesn't exist.
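As a stopgap while the sample data is missing, a stand-in CSV can be generated with the three columns the config above names (`author`, `message`, `date(yyyyMMddHHmmss)`). This is only a minimal sketch using the standard library: `write_sample_csv`, the `sample.csv` filename, and the row contents are made up for illustration and are not the repository's real sample data. It only addresses the missing input directory.

```python
import csv
import os
from datetime import datetime

# Column names and timestamp format copied from the PipelineCSVInputConfig
# quoted in this issue.
COLUMNS = ["author", "message", "date(yyyyMMddHHmmss)"]
TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"


def write_sample_csv(target_dir, rows, filename="sample.csv"):
    """Write rows to target_dir/filename and return the file path.

    Hypothetical helper: the filename and layout are placeholders,
    not what the graphrag repository ships.
    """
    os.makedirs(target_dir, exist_ok=True)
    path = os.path.join(target_dir, filename)
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        for row in rows:
            # Raise ValueError early if a timestamp does not match the format.
            datetime.strptime(row["date(yyyyMMddHHmmss)"], TIMESTAMP_FORMAT)
            writer.writerow(row)
    return path


# Made-up rows for illustration only.
SAMPLE_ROWS = [
    {"author": "Eva", "message": "I'm fine, thanks!",
     "date(yyyyMMddHHmmss)": "20240821204422"},
    {"author": "Max", "message": "Good to hear.",
     "date(yyyyMMddHHmmss)": "20240821204501"},
]

if __name__ == "__main__":
    print(write_sample_csv("_sample_data", SAMPLE_ROWS))
```

The directory passed to `write_sample_csv` would need to match whatever `sample_data_dir` resolves to for the example being run.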

GraphRAG Config Used

# Paste your config here

Logs and screenshots

No response

Additional Information

xinzheng99 commented 3 months ago

It came up when I randomly constructed the data:

    dataset = pd.DataFrame([
        {"author": "Eva", "message": "I'm fine, thanks!", "date(yyyyMMddHHmmss)": 20240821204422},
        {"author": "Eva", "message": "I'm fine, thanks!", "date(yyyyMMddHHmmss)": 20240821204422},
        {"author": "Eva", "message": "I'm fine, thanks!", "date(yyyyMMddHHmmss)": 20240821204422},
        {"author": "Eva", "message": "I'm fine, thanks!", "date(yyyyMMddHHmmss)": 20240821204422},
    ])

    Traceback (most recent call last):
      File "/Users/maxz/graphrag-source/examples/entity_extraction/with_nltk/run.py", line 93, in <module>
        asyncio.run(run_python())
      File "/opt/anaconda3/envs/graphrag-raw/lib/python3.11/asyncio/runners.py", line 190, in run
        return runner.run(main)
               ^^^^^^^^^^^^^^^^
      File "/opt/anaconda3/envs/graphrag-raw/lib/python3.11/asyncio/runners.py", line 118, in run
        return self._loop.run_until_complete(task)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/anaconda3/envs/graphrag-raw/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
        return future.result()
               ^^^^^^^^^^^^^^^
      File "/Users/maxz/graphrag-source/examples/entity_extraction/with_nltk/run.py", line 81, in run_python
        async for table in run_pipeline(dataset=dataset, workflows=workflows):
      File "/opt/anaconda3/envs/graphrag-raw/lib/python3.11/site-packages/graphrag/index/run.py", line 220, in run_pipeline
        loaded_workflows = load_workflows(
                           ^^^^^^^^^^^^^^^
      File "/opt/anaconda3/envs/graphrag-raw/lib/python3.11/site-packages/graphrag/index/workflows/load.py", line 75, in load_workflows
        workflow = create_workflow(
                   ^^^^^^^^^^^^^^^^
      File "/opt/anaconda3/envs/graphrag-raw/lib/python3.11/site-packages/graphrag/index/workflows/load.py", line 134, in create_workflow
        steps = steps or _get_steps_for_workflow(name, config, additional_workflows)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/anaconda3/envs/graphrag-raw/lib/python3.11/site-packages/graphrag/index/workflows/load.py", line 163, in _get_steps_for_workflow
        raise UnknownWorkflowError(name)
    graphrag.index.errors.UnknownWorkflowError: Unknown workflow: entity_extraction

natoverse commented 3 months ago

Marking as duplicate of https://github.com/microsoft/graphrag/issues/353