microsoft / graphrag

A modular graph-based Retrieval-Augmented Generation (RAG) system
https://microsoft.github.io/graphrag/
MIT License

[Bug]: Mismatch between header in community report generation prompt examples and input data (id vs human_readable_id) #860

Open ksachdeva opened 3 months ago

ksachdeva commented 3 months ago

Do you need to file an issue?

Describe the bug

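[Screenshot: portion of the community_report extraction prompt]

The screenshot above shows a portion of the community_report extraction prompt. Its examples use the following headers: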

Entities

id,entity,description

Relationships

id,source,target,description

The context data that is actually generated, however, ends up with slightly different headers:

Entities
human_readable_id,title,description

Relationships

human_readable_id,source,target,description

The differences are:

For Entities, the generated context has human_readable_id instead of id, and title instead of entity.

For Relationships, the generated context has human_readable_id instead of id.
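
To make the mismatch concrete, here is a minimal sketch that compares the two sets of headers quoted above, column by column. The header lists are copied from this issue; the function and variable names are made up for illustration and are not part of graphrag.

```python
# Headers as they appear in the prompt examples (copied from this issue).
PROMPT_HEADERS = {
    "Entities": ["id", "entity", "description"],
    "Relationships": ["id", "source", "target", "description"],
}

# Headers as they appear in the generated context data (copied from this issue).
GENERATED_HEADERS = {
    "Entities": ["human_readable_id", "title", "description"],
    "Relationships": ["human_readable_id", "source", "target", "description"],
}


def header_mismatches(prompt, generated):
    """Return {section: [(prompt_col, generated_col), ...]} for columns that differ by position.

    Hypothetical helper, not a graphrag function.
    """
    mismatches = {}
    for section, prompt_cols in prompt.items():
        pairs = [
            (p, g)
            for p, g in zip(prompt_cols, generated[section])
            if p != g
        ]
        if pairs:
            mismatches[section] = pairs
    return mismatches


MISMATCHES = header_mismatches(PROMPT_HEADERS, GENERATED_HEADERS)
for section, pairs in MISMATCHES.items():
    for prompt_col, generated_col in pairs:
        print(f"{section}: prompt says {prompt_col!r}, context has {generated_col!r}")
```

A check along these lines could in principle run as a test, so the prompt examples and the generated context do not drift apart again.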

The bug can be fixed either by modifying the code or by making the headers in the prompt match the ones generated by the code.

If the code is changed, the modifications would be needed in the verbs where the mismatch originates:

https://github.com/microsoft/graphrag/blob/c749fe2a151b9e8259bf4fef2f6c45cf82f1181e/graphrag/index/verbs/graph/report/prepare_community_reports_nodes.py#L37

and

https://github.com/microsoft/graphrag/blob/c749fe2a151b9e8259bf4fef2f6c45cf82f1181e/graphrag/index/verbs/graph/report/prepare_community_reports_edges.py#L38
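
As a rough illustration of the code-side option, the sketch below renames the header row of a CSV context string so that it matches the prompt examples. It is not the code from the linked verbs and makes no claim about how they build the context. The RENAMES mapping, the align_header function, and the sample row are assumptions made up for this example.

```python
import csv
import io

# Hypothetical mapping from generated column names to the names used in the prompt examples.
RENAMES = {"human_readable_id": "id", "title": "entity"}


def align_header(context_csv: str, renames: dict[str, str]) -> str:
    """Rename columns in the first (header) row of a CSV string; data rows are left untouched."""
    rows = list(csv.reader(io.StringIO(context_csv)))
    if not rows:
        return context_csv
    rows[0] = [renames.get(col, col) for col in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Made-up sample data; only the header names come from this issue.
sample = "human_readable_id,title,description\n0,ALICE,A person\n"
print(align_header(sample, RENAMES))
```

The alternative fix goes the other way: leave the code alone and change the example headers in the prompt to human_readable_id and title.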

Steps to reproduce

You can look into the cache files. I am attaching one here.

example_community_report_cache.txt

Expected Behavior

No response

GraphRAG Config Used

# Paste your config here

Logs and screenshots

No response

Additional Information

natoverse commented 3 months ago

Nice find and thorough reporting @ksachdeva. @AlonsoGuevara can you confirm and update the df/prompt? I'm not aware of any direct bugs in search results due to this, so I suspect the LLM is smart enough to recognize the intent of the column name, but it would be sensible to be as precise as possible.