Open ksachdeva opened 3 months ago
Nice find and thorough reporting @ksachdeva. @AlonsoGuevara can you confirm and update the df/prompt? I'm not aware of any direct bugs in search results due to this, so I suspect the LLM is smart enough to recognize the intent of the column name, but it would be sensible to be as precise as possible.
Do you need to file an issue?
Describe the bug
Above is an image showing a portion of the `community_report` extraction prompt. The context data that is generated ends up having headers that are a bit different.
The differences are:
- For Entities, the generated context has `human_readable_id` instead of `id`, and `title` instead of `entities`.
- For Relationships, the generated context has `human_readable_id` instead of `id`.
The bug can be fixed by modifying the code or making the headers in the prompt compliant with the ones generated by the code.
If you fix it in the code, the changes would be needed in these verbs, which is where the mismatch originates:
https://github.com/microsoft/graphrag/blob/c749fe2a151b9e8259bf4fef2f6c45cf82f1181e/graphrag/index/verbs/graph/report/prepare_community_reports_nodes.py#L37
and
https://github.com/microsoft/graphrag/blob/c749fe2a151b9e8259bf4fef2f6c45cf82f1181e/graphrag/index/verbs/graph/report/prepare_community_reports_edges.py#L38
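To make the code-side option concrete, here is a minimal sketch of renaming the header row of a generated context table so it matches what the prompt expects. This is not GraphRAG code: the linked verbs build the context differently, and `rename_headers`, the two mapping constants, and the sample row (including its `description` column) are hypothetical names and data used only for illustration. The rename pairs themselves are the ones listed in the differences above.

```python
import csv
import io

# Hypothetical mappings: header emitted by the code -> header named in the prompt.
# The pairs come from the differences described in this issue.
ENTITY_RENAMES = {"human_readable_id": "id", "title": "entities"}
RELATIONSHIP_RENAMES = {"human_readable_id": "id"}


def rename_headers(context_csv, renames):
    """Return the CSV text with its header row renamed according to `renames`."""
    rows = list(csv.reader(io.StringIO(context_csv)))
    if not rows:
        # Nothing to rename in an empty table.
        return context_csv
    # Only the first row (the header) is touched; data rows are kept as-is.
    rows[0] = [renames.get(name, name) for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Made-up sample row, not taken from the attached cache file.
sample = "human_readable_id,title,description\n1,ACME,A company\n"
print(rename_headers(sample, ENTITY_RENAMES))
```

The alternative fix, editing the prompt so it names `human_readable_id` and `title`, needs no code change at all; which side to change is a maintainer decision.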
Steps to reproduce
You can look into the cache files. I am attaching one here.
example_community_report_cache.txt
Expected Behavior
No response
GraphRAG Config Used
Logs and screenshots
No response
Additional Information