microsoft / graphrag

A modular graph-based Retrieval-Augmented Generation (RAG) system
https://microsoft.github.io/graphrag/
MIT License
13.49k stars, 1.16k forks

Documentation for Local Search #419

Open zanderjiang opened 3 weeks ago

zanderjiang commented 3 weeks ago

This concerns the local search notebook in the documentation: https://microsoft.github.io/graphrag/posts/query/notebooks/local_search_nb/

The standard indexing pipeline no longer creates a create_final_covariates.parquet file.

The notebook should set the covariates section of the local search context builder to None.
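
A minimal sketch of that workaround, assuming the parquet file sits in the indexer's output directory. The helper name `load_optional_covariates`, the `loader` callable, and the `"claims"` dictionary key are illustrative assumptions, not confirmed GraphRAG API; `loader` stands in for whatever reads the parquet file and converts it into covariate objects.

```python
from pathlib import Path
from typing import Any, Callable, Optional

# File name taken from this thread; the directory layout is an assumption.
COVARIATE_TABLE = "create_final_covariates"


def load_optional_covariates(
    input_dir: str,
    loader: Callable[[Path], Any],
) -> Optional[dict]:
    """Return a covariates mapping, or None when the parquet file is absent.

    `loader` is a hypothetical callable that turns the parquet file into
    covariate objects. It is only called if the file actually exists.
    """
    path = Path(input_dir) / f"{COVARIATE_TABLE}.parquet"
    if not path.exists():
        # Claim extraction was not enabled, so there is nothing to load.
        return None
    # The "claims" key is an assumption made for this sketch.
    return {"claims": loader(path)}
```

The result can then be passed as the covariates value of the local search context builder, so the same notebook code works whether or not the file was generated.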

mystvearn commented 3 weeks ago

Totally agree. create_final_covariates.parquet is no longer available. The documentation should have been updated to reflect the change.

noworneverev commented 3 weeks ago

In the documentation at https://microsoft.github.io/graphrag/posts/config/env_vars/, you can see that GRAPHRAG_CLAIM_EXTRACTION_ENABLED defaults to False. I believe this is why the create_final_covariates.parquet file is not generated with the default settings.

I'm curious why this setting defaults to False. Is it to avoid too many LLM calls, or is there another reason?
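
For anyone who wants to check this setting from a script before running local search, here is a rough sketch. The variable name and its False default come from the documentation page above; the helper name `claim_extraction_enabled` and the set of strings treated as true are assumptions, since how GraphRAG itself parses the value is not shown in this thread.

```python
import os
from typing import Mapping, Optional


def claim_extraction_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Report whether claim extraction looks enabled in the environment.

    Falls back to False when the variable is unset, matching the documented
    default. The accepted truthy spellings are an assumption of this sketch.
    """
    env = os.environ if env is None else env
    raw = env.get("GRAPHRAG_CLAIM_EXTRACTION_ENABLED", "false")
    return raw.strip().lower() in {"1", "true", "yes"}
```

If this returns False, the indexing run should not be expected to produce create_final_covariates.parquet.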

mystvearn commented 3 weeks ago

I think there are some mistakes in the documentation. I ran local search through the CLI command and it works perfectly. Inspecting further, I found that you can simply ignore covariates by setting them to an empty string, as their code does at https://github.com/microsoft/graphrag/blob/main/graphrag/query/cli.py

natoverse commented 1 week ago

Covariates are optional because they typically take a lot of domain-specific prompt tuning. (We also call these "claims", since they are claimed statements of fact.) If they are not enabled in the config, the output parquet is not created, and local search should ignore the fact that it is missing.