microsoft / graphrag

A modular graph-based Retrieval-Augmented Generation (RAG) system
https://microsoft.github.io/graphrag/
MIT License

[Issue]: There are some errors in create_final_community_reports #452

Closed xxWeiDG closed 3 months ago

xxWeiDG commented 3 months ago

Describe the issue

image

Steps to reproduce

No response

GraphRAG Config Used

No response

Logs and screenshots

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/graphrag/llm/openai/utils.py", line 93, in try_parse_json_object
    result = json.loads(input)
             ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 1 (char 1)
03:00:02,602 graphrag.index.graph.extractors.community_reports.community_reports_extractor ERROR error generating community report
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/graphrag/index/graph/extractors/community_reports/community_reports_extractor.py", line 58, in __call__
    await self._llm(
  File "/usr/local/lib/python3.11/site-packages/graphrag/llm/openai/json_parsing_llm.py", line 34, in __call__
    result = await self._delegate(input, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/graphrag/llm/openai/openai_token_replacing_llm.py", line 37, in __call__
    return await self._delegate(input, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/graphrag/llm/openai/openai_history_tracking_llm.py", line 33, in __call__
    output = await self._delegate(input, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/graphrag/llm/base/caching_llm.py", line 104, in __call__
    result = await self._delegate(input, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/graphrag/llm/base/rate_limiting_llm.py", line 177, in __call__
    result, start = await execute_with_retry()
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/graphrag/llm/base/rate_limiting_llm.py", line 159, in execute_with_retry
    async for attempt in retryer:
  File "/usr/local/lib/python3.11/site-packages/tenacity/asyncio/__init__.py", line 166, in __anext__
    do = await self.iter(retry_state=self._retry_state)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/tenacity/asyncio/__init__.py", line 153, in iter
    result = await action(retry_state)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/tenacity/_utils.py", line 99, in inner
    return call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 398, in <lambda>
    self._add_action_func(lambda rs: rs.outcome.result())
                                     ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.11/site-packages/graphrag/llm/base/rate_limiting_llm.py", line 165, in execute_with_retry
    return await do_attempt(), start
           ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/graphrag/llm/base/rate_limiting_llm.py", line 147, in do_attempt
    return await self._delegate(input, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/graphrag/llm/base/base_llm.py", line 48, in __call__
    return await self._invoke_json(input, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/graphrag/llm/openai/openai_chat_llm.py", line 82, in _invoke_json
    result = await generate()
             ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/graphrag/llm/openai/openai_chat_llm.py", line 74, in generate
    await self._native_json(input, **{**kwargs, "name": call_name})
  File "/usr/local/lib/python3.11/site-packages/graphrag/llm/openai/openai_chat_llm.py", line 108, in _native_json
    json_output = try_parse_json_object(raw_output)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/graphrag/llm/openai/utils.py", line 93, in try_parse_json_object
    result = json.loads(input)
             ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 1 (char 1)
03:00:02,606 graphrag.index.reporting.file_workflow_callbacks INFO Community Report Extraction Error details=None
03:00:02,606 graphrag.index.verbs.graph.report.strategies.graph_intelligence.run_graph_intelligence WARNING No report found for community: 1
03:00:03,494 httpx INFO HTTP Request: POST http://127.0.0.1:9998/v1/chat/completions "HTTP/1.1 200 OK"
03:00:03,495 graphrag.llm.openai.utils ERROR error loading json, json=
```json
{
    "title": "Jay Chou's Community Impact",
    "summary": "This community is centered around the Taiwanese singer and musician Jay Chou, who has a significant influence in the entertainment and technology sectors. The community includes various entities related to Chou's career, philanthropy, and business ventures, highlighting his multifaceted impact on society.",
    "rating": 7.5,
    "rating_explanation": "The impact severity rating is high due to Jay Chou's widespread influence in entertainment, technology, and philanthropy, which can affect public opinion, industry trends, and social welfare.",
    "findings": [
        {
            "summary": "Jay Chou's musical achievements and recognition",
            "explanation": "Jay Chou has achieved significant success in the music industry, winning numerous awards, including multiple Golden Melody Awards. His influence extends beyond Taiwan, as he has been recognized by international publications like Fast Company. This highlights his status as a leading figure in the entertainment industry [Data: Entities (25, 26, 27, 28, 29); Relationships (0, 1, 2, 3, 4, 5, 6, +more)]."
        },
        {
            "summary": "Jay Chou's role as a designer and entrepreneur",
            "explanation": "Jay Chou has ventured into the technology sector by designing the 'Zhonghua Shuibi Dian' (Chinese Brush Electricity) notebook, showcasing his involvement in innovative technology. Additionally, he founded JVR Limited, a company that has expanded into various industries, including fashion and entertainment. This demonstrates his entrepreneurial spirit and ability to influence different sectors [Data: Entities (26, 28); Relationships (2, 4)]."
        },
        {
            "summary": "Jay Chou's philanthropic efforts",
            "explanation": "Jay Chou has been actively involved in philanthropy, notably through the construction of Hope Primary Schools. These efforts highlight his commitment to social welfare and education, contributing to the betterment of communities in need [Data: Entities (29); Relationships (5)]."
        },
        {
            "summary": "Jay Chou's influence on the media and cultural landscape",
            "explanation": "Jay Chou's involvement with prominent media conglomerates like the Xinwen Zhongxin Cultural Communication Group suggests his influence on the media and cultural landscape. His relationships with these entities could potentially shape public perception and cultural trends [Data: Entities (27); Relationships (3)]."
        },
        {
            "summary": "Jay Chou's role as a public figure and ambassador",
            "explanation": "Jay Chou has held the title of 'China Anti-Drug Publicity Ambassador,' reflecting his role as a public figure and his commitment to social causes. This position highlights his influence on public policy and social issues [Data: Entities (30); Relationships (6)]."
        }
    ]
}
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/graphrag/llm/openai/utils.py", line 93, in try_parse_json_object
    result = json.loads(input)
             ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 1 (char 1)
03:00:03,496 graphrag.index.graph.extractors.community_reports.community_reports_extractor ERROR error generating community report
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/graphrag/index/graph/extractors/community_reports/community_reports_extractor.py", line 58, in __call__
    await self._llm(
  File "/usr/local/lib/python3.11/site-packages/graphrag/llm/openai/json_parsing_llm.py", line 34, in __call__
    result = await self._delegate(input, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/graphrag/llm/openai/openai_token_replacing_llm.py", line 37, in __call__
    return await self._delegate(input, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/graphrag/llm/openai/openai_history_tracking_llm.py", line 33, in __call__
    output = await self._delegate(input, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/graphrag/llm/base/caching_llm.py", line 104, in __call__
    result = await self._delegate(input, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/graphrag/llm/base/rate_limiting_llm.py", line 177, in __call__
    result, start = await execute_with_retry()
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/graphrag/llm/base/rate_limiting_llm.py", line 159, in execute_with_retry
    async for attempt in retryer:
  File "/usr/local/lib/python3.11/site-packages/tenacity/asyncio/__init__.py", line 166, in __anext__
    do = await self.iter(retry_state=self._retry_state)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/tenacity/asyncio/__init__.py", line 153, in iter
    result = await action(retry_state)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/tenacity/_utils.py", line 99, in inner
    return call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 398, in <lambda>
    self._add_action_func(lambda rs: rs.outcome.result())
                                     ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.11/site-packages/graphrag/llm/base/rate_limiting_llm.py", line 165, in execute_with_retry
    return await do_attempt(), start
           ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/graphrag/llm/base/rate_limiting_llm.py", line 147, in do_attempt
    return await self._delegate(input, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/graphrag/llm/base/base_llm.py", line 48, in __call__
    return await self._invoke_json(input, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/graphrag/llm/openai/openai_chat_llm.py", line 82, in _invoke_json
    result = await generate()
             ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/graphrag/llm/openai/openai_chat_llm.py", line 74, in generate
    await self._native_json(input, **{**kwargs, "name": call_name})
  File "/usr/local/lib/python3.11/site-packages/graphrag/llm/openai/openai_chat_llm.py", line 108, in _native_json
    json_output = try_parse_json_object(raw_output)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/graphrag/llm/openai/utils.py", line 93, in try_parse_json_object
    result = json.loads(input)
             ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 1 (char 1)
03:00:03,498 graphrag.index.reporting.file_workflow_callbacks INFO Community Report Extraction Error details=None
03:00:03,498 graphrag.index.verbs.graph.report.strategies.graph_intelligence.run_graph_intelligence WARNING No report found for community: 0
03:00:03,559 datashaper.workflow.workflow INFO executing verb window
03:00:03,559 datashaper.workflow.workflow ERROR Error executing verb "window" in create_final_community_reports: 'community'
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/datashaper/workflow/workflow.py", line 410, in _execute_verb
    result = node.verb.func(**verb_args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/datashaper/engine/verbs/window.py", line 73, in window
    window = __window_function_map[window_operation](input_table[column])
                                                     ~~~~~~~~~~~^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pandas/core/frame.py", line 4102, in __getitem__
    indexer = self.columns.get_loc(key)
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pandas/core/indexes/range.py", line 417, in get_loc
    raise KeyError(key)
KeyError: 'community'
03:00:03,564 graphrag.index.reporting.file_workflow_callbacks INFO Error executing verb "window" in create_final_community_reports: 'community' details=None
03:00:03,564 graphrag.index.run ERROR error running workflow create_final_community_reports
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/graphrag/index/run.py", line 323, in run_pipeline
    result = await workflow.run(context, callbacks)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/datashaper/workflow/workflow.py", line 369, in run
    timing = await self._execute_verb(node, context, callbacks)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/datashaper/workflow/workflow.py", line 410, in _execute_verb
    result = node.verb.func(**verb_args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/datashaper/engine/verbs/window.py", line 73, in window
    window = __window_function_map[window_operation](input_table[column])
                                                     ~~~~~~~~~~~^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pandas/core/frame.py", line 4102, in __getitem__
    indexer = self.columns.get_loc(key)
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pandas/core/indexes/range.py", line 417, in get_loc
    raise KeyError(key)
KeyError: 'community'
03:00:03,565 graphrag.index.reporting.file_workflow_callbacks INFO Error running pipeline! details=None

Additional Information

- GraphRAG Version:
- Operating System:
- Python Version:
- Related Issues:

raihan0824 commented 3 months ago

got the same issue

fryfry33 commented 3 months ago

same issue

fryfry33 commented 3 months ago

Found out that you should put this line in settings.yaml:

model_supports_json: false # recommended if this is available for your model.

raihan0824 commented 3 months ago

Found out that you should put this line in settings.yaml:

model_supports_json: false # recommended if this is available for your model.

I still get the same issue, even after deleting the cache folder.

goodmaney commented 3 months ago

Found out that you should put this line in settings.yaml:

model_supports_json: false # recommended if this is available for your model.

It works for me. The LLM is GLM4 and the embedding model is bce-embedding-base_v1.

ladycui commented 3 months ago

I got the same issue. After switching to a stronger LLM (Qwen2-72B-Instruct in my case), the issue disappeared. Unfortunately glm-4 didn't work, though I guess it might succeed if run multiple times.

Checking the log, the cause is that the LLM returns Markdown-formatted JSON (starting with ```json) instead of a plain JSON string. I added a log statement; below is what a successful input looked like.

22:14:22,462 graphrag.llm.openai.utils INFO ####input: {
    "title": "Scrooge and Marley's Business Community",
    "summary": "The community is centered around Scrooge, a prominent figure in Charles Dickens' 'A Christmas Carol,' who is characterized by his cold, unsympathetic nature and tight-fisted business practices. Scrooge is associated with the business Scrooge and Marley, where he was a partner with Marley, and is linked to various entities such as beggars, blind men's dogs, and children, all of whom avoid him, indicating his negative reputation. The community also includes Marley's funeral, Christmas Eve, and the city where Scrooge's counting-house is located, all of which play significant roles in the narrative.",
    "rating": 7.0,
...
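
The failure mode described above can be reproduced with the standard library alone. The sketch below is a hypothetical workaround, not GraphRAG's own fix: `strip_markdown_fence` and `parse_llm_json` are illustrative names, and the sample reply is a shortened stand-in that assumes the model wraps its whole answer in a single Markdown fence preceded by a newline, as in the failing log from the original report.

```python
import json
import re

# Matches an optional Markdown code fence (with or without a language tag)
# wrapped around the whole reply, capturing only the fenced body.
_FENCE_RE = re.compile(r"^\s*```[A-Za-z0-9_-]*\s*\n(.*?)\n?\s*```\s*$", re.DOTALL)


def strip_markdown_fence(text: str) -> str:
    """Return the body of a fenced reply, or the stripped text if unfenced."""
    match = _FENCE_RE.match(text)
    return match.group(1) if match else text.strip()


def parse_llm_json(text: str) -> dict:
    """Parse a JSON object from an LLM reply that may be Markdown-fenced."""
    return json.loads(strip_markdown_fence(text))


# A reply shaped like the failing one in the log: newline, fence, JSON, fence.
raw_reply = '\n```json\n{\n    "title": "Example report",\n    "rating": 7.5\n}\n```'

try:
    json.loads(raw_reply)
except json.JSONDecodeError as err:
    # The decoder skips the leading newline and stops at the first backtick.
    error_message = str(err)

report = parse_llm_json(raw_reply)
print(error_message)  # prints: Expecting value: line 2 column 1 (char 1)
print(report["title"], report["rating"])
```

The error message matches the one in the tracebacks above, which is consistent with the parser receiving a fenced reply. Switching to a model that emits plain JSON, as in this comment, makes this kind of post-processing unnecessary.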

patrickhwood commented 3 months ago

Just pulled the latest version (a0caadb320c5db4e7b8e83625f00c19be893170b) and am still seeing the KeyError: 'community' error during the create_final_community_reports phase. Using ollama with gemma2:27b-instruct-q4_0. This error occurs prior to making any embeddings requests. model_supports_json is set to false.

xpdd123 commented 3 months ago

Found out that you should put this line in settings.yaml:

model_supports_json: false # recommended if this is available for your model.

It works for me. The LLM is GLM4 and the embedding model is bce-embedding-base_v1.

Can you share how you got it working? I use GLM-4, but it failed.

Misaka-sister commented 3 months ago

Same issue with qwen2-1.5b-instruct-q6_K (ollama) + model_supports_json set to false.