microsoft / graphrag

A modular graph-based Retrieval-Augmented Generation (RAG) system
https://microsoft.github.io/graphrag/
MIT License

[Bug]: Embedding pipeline bad column name #1355

Closed nievespg1 closed 2 weeks ago

nievespg1 commented 2 weeks ago

Do you need to file an issue?

Describe the bug

There is a title variable that is used to access the entity titles within a dataframe before embedding the descriptions for those entities. Later in the code, this variable gets overwritten with a description, causing a KeyError the next time it is used as a column name.

Here are the two lines in question: https://github.com/microsoft/graphrag/blob/634e3ed62a6c5de7084f20e034edbb7185ad5e84/graphrag/index/operations/embed_text/embed_text.py#L180

https://github.com/microsoft/graphrag/blob/634e3ed62a6c5de7084f20e034edbb7185ad5e84/graphrag/index/operations/embed_text/embed_text.py#L196
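For readers who do not want to open the links, here is a minimal sketch of the kind of name reuse described above. It is not the GraphRAG code: the functions `collect_buggy` and `collect_fixed`, the `table` dict (a plain-Python stand-in for a dataframe), and the sample values are all hypothetical. The issue reports that the variable ends up holding a description; in this sketch it ends up holding a row-level title value, which fails the same way.

```python
# Hypothetical stand-in for a dataframe: column name -> list of values.
table = {
    "title": ["ENTITY_A", "ENTITY_B"],
    "description": ["First entity.", "Second entity."],
}


def collect_buggy(table, batches=2):
    """Reuses `title` both as a column name and as a loop variable."""
    title = "title"  # intended to stay a column name
    out = []
    for _ in range(batches):
        titles = table[title]  # second pass: `title` is now a row value
        texts = table["description"]
        for title, text in zip(titles, texts):  # rebinds `title`
            out.append((title, text))
    return out


def collect_fixed(table, batches=2):
    """Keeps the column name and the per-row value under separate names."""
    title_column = "title"
    out = []
    for _ in range(batches):
        titles = table[title_column]
        texts = table["description"]
        for row_title, text in zip(titles, texts):
            out.append((row_title, text))
    return out


if __name__ == "__main__":
    try:
        collect_buggy(table)
    except KeyError as err:
        print("buggy version raised KeyError:", err)
    print(collect_fixed(table))
```

In this sketch the first batch succeeds and the failure only shows up on the second pass, which is one way a bug like this could go unnoticed on small inputs.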

Steps to reproduce

Build a brand-new index

Expected Behavior

No response

GraphRAG Config Used

# Paste your config here

Logs and screenshots

No response

Additional Information

natoverse commented 2 weeks ago

Resolved with #1356, thanks @nievespg1 !