gusye1234 / nano-graphrag

A simple, easy-to-hack GraphRAG implementation
MIT License

JSON parse failed #39

Open Dorbmon opened 3 days ago

Dorbmon commented 3 days ago

Hi, I tried to insert a large document into GraphRAG, so I set gpt-4o-mini as the best_model_func:

graph_func = GraphRAG(working_dir="./test", best_model_func=gpt_4o_mini_complete)

Then I encountered an error:

INFO:nano-graphrag:Writing graph with 15585 nodes, 6276 edges
Traceback (most recent call last):
  File "/root/test_build.py", line 14, in <module>
    graph_func.insert(content)
  File "/root/.env/lib/python3.11/site-packages/nano_graphrag/graphrag.py", line 181, in insert
    return loop.run_until_complete(self.ainsert(string_or_strings))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/root/.env/lib/python3.11/site-packages/nano_graphrag/graphrag.py", line 291, in ainsert
    await generate_community_report(
  File "/root/.env/lib/python3.11/site-packages/nano_graphrag/_op.py", line 578, in generate_community_report
    this_level_communities_reports = await asyncio.gather(
                                     ^^^^^^^^^^^^^^^^^^^^^
  File "/root/.env/lib/python3.11/site-packages/nano_graphrag/_op.py", line 555, in _form_single_community_report
    data = use_string_json_convert_func(response)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.env/lib/python3.11/site-packages/nano_graphrag/_utils.py", line 31, in convert_response_to_json
    assert json_str is not None, f"Unable to parse JSON from response: {response}"
           ^^^^^^^^^^^^^^^^^^^^

Is there a retry option, or can only gpt-4o handle this kind of task?

Thanks.

gusye1234 commented 3 days ago

Hi, could you share the full error log? It looks like gpt-4o-mini returned an empty string.

Dorbmon commented 3 days ago

This is the error information:

AssertionError: Unable to parse JSON from response: {
    "title": "Binomial Distribution and Related Concepts",
    "summary": "This community centers around the Binomial Distribution, which encapsulates the probabilities of successes in a fixed number of Bernoulli Trials. Key entities include various mathematical concepts and corollaries that are intricately linked to the Binomial Distribution, providing a comprehensive framework for understanding probability theory.",
    "rating": 7.5,
    "rating_explanation": "The impact severity rating is significant due to the foundational nature of the Binomial Distribution in probability and statistics, influencing numerous applications in diverse fields.",
    "findings": [
        {
            "summary": "Centrality of Binomial Distribution",
            "explanation": "The Binomial Distribution serves as a core entity in this community, defining the mathematical framework for studying the number of successes in a fixed number of Bernoulli trials. It is characterized by parameters such as the number of trials (n) and the success probability (p), making it vital for foundational statistics. This distribution aids in predicting outcomes in scenarios where the results are binary, such as success/failure or win/lose situations.",
        "In practical terms, the significance of the Binomial Distribution extends across various fields, including finance, science, and social research. It lays the groundwork for advanced statistical theories, making its understanding crucial for anyone delving into data analysis or inferential statistics."

gusye1234 commented 3 days ago

It seems the response reached the max new tokens limit.

Dorbmon commented 3 days ago

Is there any way to solve this? If we use gpt-4o, the cost will be too much for us.
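
One possible workaround is a retry wrapper around the model function. This is only a sketch, not something nano-graphrag is confirmed to provide. The names `with_json_retry`, `is_complete_json` and `expects_json` are hypothetical. It assumes the model function is async (the traceback awaits it) and that JSON-mode calls can be recognized by a `response_format` keyword argument, which is not verified here. Calls without that argument pass through untouched, so plain-text prompts are not retried.

```python
import asyncio
import json


def is_complete_json(text):
    """Return True if `text` parses as a JSON object."""
    try:
        return isinstance(json.loads(text), dict)
    except (json.JSONDecodeError, TypeError):
        return False


def with_json_retry(model_func, max_attempts=3, expects_json=None):
    """Wrap an async completion function so that calls expected to return
    JSON are repeated until the output parses or attempts run out.

    `expects_json(args, kwargs)` decides whether a call should be validated.
    The default (presence of a `response_format` kwarg) is an assumption.
    """
    if expects_json is None:
        expects_json = lambda args, kwargs: "response_format" in kwargs

    async def wrapped(*args, **kwargs):
        response = await model_func(*args, **kwargs)
        if not expects_json(args, kwargs):
            return response  # not a JSON call, never retry
        attempts = 1
        while not is_complete_json(response) and attempts < max_attempts:
            response = await model_func(*args, **kwargs)
            attempts += 1
        # If every attempt failed, return the last response unchanged so the
        # caller's own error handling still reports the problem.
        return response

    return wrapped
```

It could then be passed in place of the original function, for example `GraphRAG(working_dir="./test", best_model_func=with_json_retry(gpt_4o_mini_complete))`. Two caveats: if the response is always cut off at the token limit, retrying the same prompt will likely fail the same way, and raising the output token limit for that call is the other lever. Also, if the wrapped function caches responses per prompt, a retry may just return the same cached text.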