microsoft / graphrag

A modular graph-based Retrieval-Augmented Generation (RAG) system
https://microsoft.github.io/graphrag/
MIT License

[Issue]: ValueError when specifying custom timestamp_column in CSV to YAML conversion #578

Closed silenceliang closed 4 months ago

silenceliang commented 4 months ago

Describe the issue

When attempting to convert CSV data into a YAML format, specifying a custom column for the timestamp results in a ValueError. The exception is raised within the pandas library, specifically at the following location:

.pyenv/versions/graphrag/lib/python3.10/site-packages/pandas/core/reshape/concat.py, line 507, with the error message "No objects to concatenate".

This issue occurs during the data input process where the CSV data is expected to be formatted according to the settings in a YAML file.

Steps to reproduce

  1. Set the following in settings.yaml:

       input:
         type: file # or blob
         file_type: csv # or csv
         base_dir: "input"
         file_encoding: utf-8
         file_pattern: ".*\.csv"
         timestamp_column: "event_time"

  2. Run python -m graphrag.index --root ./myFolder
  3. The exception is raised.

GraphRAG Config Used

input:
  type: file # or blob
  file_type: csv # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\.csv"
  text_column: "description"
  timestamp_column: "event_time"

Logs and screenshots

python -m graphrag.index --root ./ragtest_event_csv
🚀 Reading settings from ragtest_event_csv/settings.yaml
Traceback (most recent call last):
  File "/Users/brian_liang/.pyenv/versions/3.10.4/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/brian_liang/.pyenv/versions/3.10.4/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/brian_liang/graphrag/graphrag/index/__main__.py", line 76, in <module>
    index_cli(
  File "/Users/brian_liang/graphrag/graphrag/index/cli.py", line 161, in index_cli
    _run_workflow_async()
  File "/Users/brian_liang/graphrag/graphrag/index/cli.py", line 159, in _run_workflow_async
    asyncio.run(execute())
  File "/Users/brian_liang/.pyenv/versions/3.10.4/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
  File "/Users/brian_liang/graphrag/graphrag/index/cli.py", line 123, in execute
    async for output in run_pipeline_with_config(
  File "/Users/brian_liang/graphrag/graphrag/index/run.py", line 144, in run_pipeline_with_config
    dataset = dataset if dataset is not None else await _create_input(config.input)
  File "/Users/brian_liang/graphrag/graphrag/index/run.py", line 133, in _create_input
    return await load_input(config, progress_reporter, root_dir)
  File "/Users/brian_liang/graphrag/graphrag/index/input/load_input.py", line 81, in load_input
    results = await loader(config, progress, storage)
  File "/Users/brian_liang/graphrag/graphrag/index/input/csv.py", line 135, in load
    result = pd.concat(files_loaded)
  File "/Users/brian_liang/.pyenv/versions/graphrag/lib/python3.10/site-packages/pandas/core/reshape/concat.py", line 382, in concat
    op = _Concatenator(
  File "/Users/brian_liang/.pyenv/versions/graphrag/lib/python3.10/site-packages/pandas/core/reshape/concat.py", line 445, in __init__
    objs, keys = self._clean_keys_and_objs(objs, keys)
  File "/Users/brian_liang/.pyenv/versions/graphrag/lib/python3.10/site-packages/pandas/core/reshape/concat.py", line 507, in _clean_keys_and_objs
    raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate
⠋ GraphRAG Indexer
└──Loading Input (csv) - 1 files loaded (0 filtered) ━━━━ 100% 0:0… 0:0…
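
For anyone reading the traceback: the final frames show pd.concat being called with an empty list, which is what produces "No objects to concatenate". The snippet below is a minimal sketch that reproduces the pandas error on its own. The concat_or_fail helper is hypothetical, not part of graphrag; it only illustrates how a guard could surface a clearer message when no files make it into the list.

```python
import pandas as pd


def concat_or_fail(frames: list[pd.DataFrame]) -> pd.DataFrame:
    """Concatenate loaded frames, with a clearer error when none were loaded."""
    if not frames:
        # Hypothetical guard for illustration only; graphrag does not ship this.
        raise ValueError("No input files were loaded; check the input settings")
    return pd.concat(frames)


# Passing an empty list to pd.concat reproduces the error from the traceback.
try:
    pd.concat([])
except ValueError as err:
    pandas_message = str(err)

print(pandas_message)
```

If the list handed to pd.concat is empty, the real problem is upstream: every input file was dropped before concatenation, so the input settings are the place to look.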

Additional Information

Jialunula commented 4 months ago

Is this problem solved? I have the same problem.

chongchongaikubao commented 4 months ago

Have you solved this problem?

BlackBerryHub commented 4 months ago

Hi, try to set the timestamp_format parameter

silenceliang commented 4 months ago

Yes, it got solved. Thanks Berry.

@Jialunula @chongchongaikubao It is necessary to define timestamp_format in settings.yaml whenever timestamp_column is set. You can refer to graphrag/index/input/csv.py in the source code!
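
To illustrate why the format matters, here is a rough sketch of a loader that parses a timestamp column with an explicit format. It is an assumption about the general shape of the logic, not a copy of graphrag/index/input/csv.py: load_csv_with_timestamp, its error message, the sample row, and the format string "%Y-%m-%d %H:%M:%S" are all made up for the example. Only pd.read_csv and pd.to_datetime are real pandas calls.

```python
import io

import pandas as pd


def load_csv_with_timestamp(buffer, timestamp_column=None, timestamp_format=None):
    """Hypothetical loader: read a CSV and parse an optional timestamp column."""
    data = pd.read_csv(buffer)
    if timestamp_column is not None:
        if timestamp_format is None:
            # Assumed behavior: a timestamp column without a format is rejected.
            raise ValueError("timestamp_format is required when timestamp_column is set")
        # Parse the raw strings using the explicit strftime-style format.
        data["timestamp"] = pd.to_datetime(data[timestamp_column], format=timestamp_format)
    return data


# Made-up sample row using the column names from the config in this issue.
csv_text = "description,event_time\nserver restarted,2024-07-01 12:30:00\n"
frame = load_csv_with_timestamp(
    io.StringIO(csv_text),
    timestamp_column="event_time",
    timestamp_format="%Y-%m-%d %H:%M:%S",
)
print(frame["timestamp"].iloc[0])
```

The format string has to match how the values in your own CSV are written, so adjust it to your data before setting timestamp_format.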