microsoft / graphrag

A modular graph-based Retrieval-Augmented Generation (RAG) system
https://microsoft.github.io/graphrag/
MIT License

[Bug]: wrong key when reading community report configuration from settings.yaml #546

Closed: ronchengang closed this issue 1 month ago

ronchengang commented 1 month ago

Describe the bug

The community_report configuration section is never read, because create_graphrag_config.py looks it up under the wrong key, community_reports:

community_report_config = values.get("community_reports") or {}  # should be "community_report"
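The failure is silent: dict.get returns None for a missing key, the "or {}" turns that into an empty dict, and every community-report setting falls back to its default. A minimal sketch of that failure mode, assuming the parsed settings.yaml is a plain dict; DEFAULT_MAX_LENGTH and read_community_report_config are hypothetical names for illustration, not graphrag's API:

```python
# Hypothetical illustrative default, not necessarily graphrag's real value.
DEFAULT_MAX_LENGTH = 2000


def read_community_report_config(values: dict, key: str) -> dict:
    """Look up a config section the same way the buggy line does."""
    # values.get(key) is None when the key is absent; "or {}" hides that.
    section = values.get(key) or {}
    return {"max_length": section.get("max_length", DEFAULT_MAX_LENGTH)}


# Parsed settings.yaml, using the section name written in the file.
values = {"community_report": {"max_length": 1500}}

# Reading with the mismatched key silently ignores the user's value.
print(read_community_report_config(values, "community_reports"))  # {'max_length': 2000}
# Reading with the key that is actually in the file picks it up.
print(read_community_report_config(values, "community_report"))   # {'max_length': 1500}
```

No error or warning is raised in either case, which is why the edited values in settings.yaml appear to have no effect.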

Steps to reproduce

Change some config values in the community_report section of settings.yaml; the changes never take effect.

Expected Behavior

No response

GraphRAG Config Used

No response

Logs and screenshots

No response

Additional Information

l544301590 commented 1 month ago

Also, the default community_report prompt is strange: it repeats two long blocks of instructions, which needlessly inflates the token bill for everyone who runs it.

Also, the JSON example in the prompt should use single '{' and '}'. The prompt is filled in with str.replace rather than str.format, so the doubled '{{' and '}}' are never collapsed and reach the model as-is, which can lead weaker models to generate malformed JSON.

So many bugs!
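The brace point can be checked directly in Python: str.format collapses each '{{' and '}}' escape to a single brace, while str.replace only substitutes the placeholder and leaves the doubled braces for the model to see. A minimal sketch with a hypothetical prompt fragment; the {input_text} placeholder is assumed for illustration and is not quoted from graphrag's prompt:

```python
# Hypothetical prompt fragment: a JSON example with escaped braces plus a placeholder.
template = 'Return JSON like {{"title": "..."}}\nText: {input_text}'

# str.format treats {{ and }} as escapes and fills {input_text}.
via_format = template.format(input_text="hello")

# str.replace only swaps the placeholder; the doubled braces survive.
via_replace = template.replace("{input_text}", "hello")

print(via_format)   # JSON example shows single braces
print(via_replace)  # JSON example still shows {{ ... }}
```

So with replace-based filling the template should contain single braces, as the comment argues; only format-based filling needs the doubled form.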

ngoanpv commented 1 month ago

The prompt works with GPT-4, as reported in the paper. Some open-source models really do need prompt tuning; I am trying to edit the prompt to get the best results with a local LLM.

ngoanpv commented 1 month ago

Regarding community_report: this was fixed in https://github.com/microsoft/graphrag/pull/405; you should update to the latest version of graphrag.

ronchengang commented 1 month ago

I can see they changed the key name in init_content.py from 'community_report' to 'community_reports', so the generated settings.yaml now uses the same key that create_graphrag_config.py reads. That also works; this bug can be closed.
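Because a mismatched section name fails silently, a check that flags unrecognized top-level keys would surface this class of bug early. A minimal sketch, assuming the parsed settings are a plain dict; KNOWN_SECTIONS, find_unknown_sections and check_settings are hypothetical names, and the section list is illustrative rather than graphrag's actual schema:

```python
import difflib
import warnings

# Hypothetical set of top-level sections the config reader understands.
KNOWN_SECTIONS = {"llm", "embeddings", "community_reports", "claim_extraction"}


def find_unknown_sections(values: dict) -> dict:
    """Map each unrecognized top-level key to its closest known section, or None."""
    unknown = {}
    for key in values:
        if key not in KNOWN_SECTIONS:
            # Suggest the single closest known name, if any is similar enough.
            matches = difflib.get_close_matches(key, sorted(KNOWN_SECTIONS), n=1)
            unknown[key] = matches[0] if matches else None
    return unknown


def check_settings(values: dict) -> None:
    # Warn instead of silently ignoring sections the reader never looks at.
    for key, suggestion in find_unknown_sections(values).items():
        hint = f" (did you mean {suggestion!r}?)" if suggestion else ""
        warnings.warn(f"unknown settings section {key!r}{hint}")


# A settings file still using the old section name triggers a warning.
check_settings({"llm": {}, "community_report": {"max_length": 1500}})
```

Warning rather than raising is a design choice here; a stricter reader could reject unknown sections outright.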