microsoft / graphrag

A modular graph-based Retrieval-Augmented Generation (RAG) system
https://microsoft.github.io/graphrag/
MIT License

Fix non-default emitters #1403

Closed AlonsoGuevara closed 1 week ago

AlonsoGuevara commented 1 week ago

Description

Fix non-default emitters (the pipeline always assumed parquet)

jgbradley1 commented 1 week ago

I tested and confirmed this addresses the bug on the indexing side. It breaks on the query side because the code here assumes a parquet format is being loaded. I'm not sure whether the query issue needs to be addressed to support cosmosdb, though.

AlonsoGuevara commented 1 week ago

> I tested and confirmed this addresses the bug on the indexing side. It breaks on the query side because the code here assumes a parquet format is being loaded. I'm not sure whether the query issue needs to be addressed to support cosmosdb, though.

Ah! Thanks for testing query. I'll address that on this same PR, great catch!!

AlonsoGuevara commented 1 week ago

Closing this PR. As we discussed, this change opened up the possibility of more issues than fixes, so .parquet will now always be included as output, and emitters like json will be additional.
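
For illustration, the "parquet always, other emitters additional" behavior described above could be sketched roughly as follows. This is not the actual graphrag code: `DEFAULT_EMITTER` and `resolve_emitters` are hypothetical names, and the normalization rules (case folding, de-duplication) are assumptions.

```python
# Hypothetical sketch: parquet is always emitted; configured emitters are extras.
DEFAULT_EMITTER = "parquet"  # assumed name, not taken from the graphrag codebase


def resolve_emitters(configured=None):
    """Return the emitter list with parquet first and any extras after it."""
    resolved = [DEFAULT_EMITTER]
    for name in configured or []:
        normalized = name.strip().lower()
        # Skip blanks and duplicates (including an explicit "parquet" entry).
        if normalized and normalized not in resolved:
            resolved.append(normalized)
    return resolved


if __name__ == "__main__":
    print(resolve_emitters(["json"]))
```

With this shape, the query side can keep assuming parquet is present regardless of which extra emitters a user configures.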