microsoft / graphrag

A modular graph-based Retrieval-Augmented Generation (RAG) system
https://microsoft.github.io/graphrag/
MIT License

Support content in languages other than English #696

Open natoverse opened 1 month ago

natoverse commented 1 month ago

GraphRAG does not explicitly support any particular language. However, the prompts are written in English, and most of our evaluation has been done using English-language datasets. Many users would like to use GraphRAG with non-English datasets and have reported varying levels of success. Performance may vary across languages depending on prompting, encoding/tokenizing, and the training and biases of the chosen model.
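As a rough illustration of the encoding point, the sketch below compares how many UTF-8 bytes different scripts need per character. It is not GraphRAG code and does not use any model's tokenizer: the helper name `utf8_bytes_per_char` and the sample strings are made up for this example, and byte count is only a loose proxy for token count. Tokenizers that operate on UTF-8 bytes often, though not always, spend more tokens on scripts that need more bytes per character.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per character in `text` (hypothetical helper)."""
    if not text:
        return 0.0
    return len(text.encode("utf-8")) / len(text)


# Illustrative sample strings only; these are not GraphRAG data.
samples = {
    "English": "graph",
    "Russian": "граф",
    "Chinese": "知识图谱",
}

for language, text in samples.items():
    print(f"{language}: {utf8_bytes_per_char(text):.1f} bytes per character")
```

To see how a specific model actually splits text, run the same samples through that model's own tokenizer and compare token counts directly.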

While we don't plan to implement explicit features or support for any particular language at this time, there are a number of things users can do to try to improve results on non-English content. A few examples:

natoverse commented 1 month ago

Users working with Chinese content may find helpful material here: https://github.com/microsoft/graphrag/issues/596