Do you need to file an issue?
[x] I have searched the existing issues and this bug is not already filed.
[x] My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
[ ] I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.
Describe the bug
When following the Global Search notebook (link), I get a KeyError raised from the _read_indexer_communities method, at line 227:
KeyError: "Column(s) ['sub_community'] do not exist"
I also noticed that the _read_indexer_communities method was not present in the graphrag version I installed via pip. While following the notebook, I copied the method into the file manually, then deleted the cache and output dirs and re-ran indexing.
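For reference, this is roughly the notebook step that fails. The import path, helper name (the public read_indexer_communities in recent releases), its three-argument signature, and the parquet file names are taken from the notebook and recent graphrag versions; treat them as assumptions if your version differs, since my pip install did not even ship the helper.

import pandas as pd
# Public helper the notebook uses; this import itself may fail on older
# pip releases of graphrag that predate it.
from graphrag.query.indexer_adapters import read_indexer_communities

INPUT_DIR = "output"  # storage.base_dir from settings.yaml; adjust if your
                      # outputs land in a timestamped subfolder

community_df = pd.read_parquet(f"{INPUT_DIR}/create_final_communities.parquet")
entity_df = pd.read_parquet(f"{INPUT_DIR}/create_final_nodes.parquet")
report_df = pd.read_parquet(f"{INPUT_DIR}/create_final_community_reports.parquet")

# This call is where the error surfaces:
# KeyError: "Column(s) ['sub_community'] do not exist"
communities = read_indexer_communities(community_df, entity_df, report_df)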
Steps to reproduce
No response
Expected Behavior
No response
GraphRAG Config Used
encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: azure_openai_chat
  model_supports_json: true # recommended if this is available for your model.
  max_tokens: 4096
  # request_timeout: 180.0
  api_base: <REDACTED>
  api_version: 2024-02-15-preview
  deployment_name: <REDACTED>
  temperature: 0 # temperature for sampling
  top_p: 0.999 # top-p sampling
  n: 1 # Number of completions to generate

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  # target: required # or all
  # batch_size: 16 # the number of documents to send in a single request
  # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
  vector_store:
    type: lancedb
    db_uri: 'output/lancedb'
    container_name: default # A prefix for the vector store to create embedding containers. Default: 'default'.
    overwrite: true
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: azure_openai_embedding
    api_base: <REDACTED>
    api_version: 2024-02-15-preview
    deployment_name: <REDACTED>

chunks:
  size: 1200
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

cache:
  type: file # or blob
  base_dir: "cache"

storage:
  type: file # or blob
  base_dir: "output"

reporting:
  type: file # or console, blob
  base_dir: "logs"

entity_extraction:
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 1

summarize_descriptions:
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: false
  raw_entities: false
  top_level_nodes: false
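Given storage.base_dir: "output" above, the communities table the helper consumes can be inspected directly to confirm the schema. The file name below is graphrag's default artifact name, and the path is an assumption if your version writes outputs to a timestamped subfolder.

import pandas as pd

communities = pd.read_parquet("output/create_final_communities.parquet")
print(communities.columns.tolist())
# The KeyError suggests this prints False:
print("sub_community" in communities.columns)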
Logs and screenshots
No response
Additional Information