microsoft / graphrag

A modular graph-based Retrieval-Augmented Generation (RAG) system
https://microsoft.github.io/graphrag/
MIT License

[Bug]: <Error executing verb "cluster_graph" in create_base_entity_graph: EmptyNetworkError> #618

Closed Bai1026 closed 1 week ago

Bai1026 commented 1 month ago

Describe the bug

Error executing verb "cluster_graph" in create_base_entity_graph: EmptyNetworkError. This happens with my own dataset and an OpenAI API key. However, I do have extracted entities in the entity_extraction and summarize_descriptions folders.

Steps to reproduce

No response

Expected Behavior

No response

GraphRAG Config Used

encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: gpt-4-turbo-preview
  model: gpt-3.5-turbo-1106
  model: gpt-4o-2024-05-13
  model_supports_json: true # recommended if this is available for your model.
  max_tokens: 4000
  request_timeout: 180.0
  api_base: https://.openai.azure.com
  api_version: 2024-02-15-preview
  organization:
  deployment_name:
  tokens_per_minute: 150_000 # set a leaky bucket throttle
  requests_per_minute: 10_000 # set a leaky bucket throttle
  max_retries: 10
  max_retry_wait: 10.0
  sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  concurrent_requests: 25 # the number of parallel inflight requests that may be made

parallelization:
  stagger: 0.3
  num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  # parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: text-embedding-3-small
    api_base: https://.openai.azure.com
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional

chunks:
  size: 300
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\.txt$"

cache:
  type: file # or blob
  base_dir: "cache"
  connection_string:
  container_name:

storage:
  type: file # or blob
  base_dir: "output/${timestamp}/artifacts"
  connection_string:
  container_name:

reporting:
  type: file # or console, blob
  base_dir: "output/${timestamp}/reports"
  connection_string:
  container_name:

entity_extraction:
  # llm: override the global llm settings for this task
  # parallelization: override the global parallelization settings for this task
  # async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 0

summarize_descriptions:
  # llm: override the global llm settings for this task
  # parallelization: override the global parallelization settings for this task
  # async_mode: override the global async_mode settings for this task
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  # llm: override the global llm settings for this task
  # parallelization: override the global parallelization settings for this task
  # async_mode: override the global async_mode settings for this task
  enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 0

community_report:
  # llm: override the global llm settings for this task
  # parallelization: override the global parallelization settings for this task
  # async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes
  num_walks: 10
  walk_length: 40
  window_size: 2
  iterations: 3
  random_seed: 597832

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

# if we wanna graphml files as output -> turn graphml to true
snapshots:
  graphml: false
  raw_entities: false
  top_level_nodes: false

local_search:
  text_unit_prop: 0.5
  community_prop: 0.1
  conversation_history_max_turns: 5
  top_k_mapped_entities: 10
  top_k_relationships: 10
  max_tokens: 12000

global_search:
  max_tokens: 12000
  data_max_tokens: 12000
  map_max_tokens: 1000
  reduce_max_tokens: 2000
  concurrency: 32

Logs and screenshots

{"type": "error", "data": "Error executing verb \"cluster_graph\" in create_base_entity_graph: EmptyNetworkError", "stack": "Traceback (most recent call last):\n File \"/opt/anaconda3/envs/graphrag/lib/python3.11/site-packages/datashaper/workflow/workflow.py\", line 410, in _execute_verb\n result = node.verb.func(verb_args)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/opt/anaconda3/envs/graphrag/lib/python3.11/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py\", line 61, in cluster_graph\n results = output_df[column].apply(lambda graph: run_layout(strategy, graph))\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/opt/anaconda3/envs/graphrag/lib/python3.11/site-packages/pandas/core/series.py\", line 4924, in apply\n ).apply()\n ^^^^^^^\n File \"/opt/anaconda3/envs/graphrag/lib/python3.11/site-packages/pandas/core/apply.py\", line 1427, in apply\n return self.apply_standard()\n ^^^^^^^^^^^^^^^^^^^^^\n File \"/opt/anaconda3/envs/graphrag/lib/python3.11/site-packages/pandas/core/apply.py\", line 1507, in apply_standard\n mapped = obj._map_values(\n ^^^^^^^^^^^^^^^^\n File \"/opt/anaconda3/envs/graphrag/lib/python3.11/site-packages/pandas/core/base.py\", line 921, in _map_values\n return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/opt/anaconda3/envs/graphrag/lib/python3.11/site-packages/pandas/core/algorithms.py\", line 1743, in map_array\n return lib.map_infer(values, mapper, convert=convert)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"lib.pyx\", line 2972, in pandas._libs.lib.map_infer\n File \"/opt/anaconda3/envs/graphrag/lib/python3.11/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py\", line 61, in \n results = output_df[column].apply(lambda graph: run_layout(strategy, graph))\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File 
\"/opt/anaconda3/envs/graphrag/lib/python3.11/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py\", line 171, in run_layout\n clusters = run_leiden(graph, strategy)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/opt/anaconda3/envs/graphrag/lib/python3.11/site-packages/graphrag/index/verbs/graph/clustering/strategies/leiden.py\", line 26, in run\n node_id_to_community_map = _compute_leiden_communities(\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/opt/anaconda3/envs/graphrag/lib/python3.11/site-packages/graphrag/index/verbs/graph/clustering/strategies/leiden.py\", line 61, in _compute_leiden_communities\n community_mapping = hierarchical_leiden(\n ^^^^^^^^^^^^^^^^^^^^\n File \"<@beartype(graspologic.partition.leiden.hierarchical_leiden) at 0x3233a4860>\", line 304, in hierarchical_leiden\n File \"/opt/anaconda3/envs/graphrag/lib/python3.11/site-packages/graspologic/partition/leiden.py\", line 588, in hierarchical_leiden\n hierarchical_clusters_native = gn.hierarchical_leiden(\n ^^^^^^^^^^^^^^^^^^^^^^^\nleiden.EmptyNetworkError: EmptyNetworkError\n", "source": "EmptyNetworkError", "details": null} {"type": "error", "data": "Error running pipeline!", "stack": "Traceback (most recent call last):\n File \"/opt/anaconda3/envs/graphrag/lib/python3.11/site-packages/graphrag/index/run.py\", line 323, in run_pipeline\n result = await workflow.run(context, callbacks)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/opt/anaconda3/envs/graphrag/lib/python3.11/site-packages/datashaper/workflow/workflow.py\", line 369, in run\n timing = await self._execute_verb(node, context, callbacks)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/opt/anaconda3/envs/graphrag/lib/python3.11/site-packages/datashaper/workflow/workflow.py\", line 410, in _execute_verb\n result = node.verb.func(verb_args)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/opt/anaconda3/envs/graphrag/lib/python3.11/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py\", line 61, in 
cluster_graph\n results = output_df[column].apply(lambda graph: run_layout(strategy, graph))\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/opt/anaconda3/envs/graphrag/lib/python3.11/site-packages/pandas/core/series.py\", line 4924, in apply\n ).apply()\n ^^^^^^^\n File \"/opt/anaconda3/envs/graphrag/lib/python3.11/site-packages/pandas/core/apply.py\", line 1427, in apply\n return self.apply_standard()\n ^^^^^^^^^^^^^^^^^^^^^\n File \"/opt/anaconda3/envs/graphrag/lib/python3.11/site-packages/pandas/core/apply.py\", line 1507, in apply_standard\n mapped = obj._map_values(\n ^^^^^^^^^^^^^^^^\n File \"/opt/anaconda3/envs/graphrag/lib/python3.11/site-packages/pandas/core/base.py\", line 921, in _map_values\n return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/opt/anaconda3/envs/graphrag/lib/python3.11/site-packages/pandas/core/algorithms.py\", line 1743, in map_array\n return lib.map_infer(values, mapper, convert=convert)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"lib.pyx\", line 2972, in pandas._libs.lib.map_infer\n File \"/opt/anaconda3/envs/graphrag/lib/python3.11/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py\", line 61, in \n results = output_df[column].apply(lambda graph: run_layout(strategy, graph))\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/opt/anaconda3/envs/graphrag/lib/python3.11/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py\", line 171, in run_layout\n clusters = run_leiden(graph, strategy)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/opt/anaconda3/envs/graphrag/lib/python3.11/site-packages/graphrag/index/verbs/graph/clustering/strategies/leiden.py\", line 26, in run\n node_id_to_community_map = _compute_leiden_communities(\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File 
\"/opt/anaconda3/envs/graphrag/lib/python3.11/site-packages/graphrag/index/verbs/graph/clustering/strategies/leiden.py\", line 61, in _compute_leiden_communities\n community_mapping = hierarchical_leiden(\n ^^^^^^^^^^^^^^^^^^^^\n File \"<@beartype(graspologic.partition.leiden.hierarchical_leiden) at 0x3233a4860>\", line 304, in hierarchical_leiden\n File \"/opt/anaconda3/envs/graphrag/lib/python3.11/site-packages/graspologic/partition/leiden.py\", line 588, in hierarchical_leiden\n hierarchical_clusters_native = gn.hierarchical_leiden(\n ^^^^^^^^^^^^^^^^^^^^^^^\nleiden.EmptyNetworkError: EmptyNetworkError\n", "source": "EmptyNetworkError", "details": null}

Additional Information

rushizirpe commented 1 month ago

The issue might be caused by rate limiting enforced by the endpoint. Please also check that the values in your configuration file, specifically max_tokens, fit within the context length of the model you are using.

Alternatively, if you want to use open-source models, I've created a repository for deploying Hugging Face models to local endpoints, offering functionality similar to the OpenAI APIs. You can find the repo here: https://github.com/rushizirpe/open-llm-server

I've also prepared a Colab notebook with a GraphRAG demo here: https://colab.research.google.com/drive/1uhFDnih1WKrSRQHisU-L6xw6coapgR51?usp=sharing. If you don't have access to GPUs like the A100, you'll need a GROQ_API_KEY (free, with certain limitations), which you can obtain from: https://console.groq.com/keys

github-actions[bot] commented 1 month ago

This issue has been marked stale due to inactivity after repo maintainer or community member responses that request more information or suggest a solution. It will be closed after five additional days.

yueqianh commented 1 month ago

Still encountering this issue!

crazyyanchao commented 1 month ago

I have the same error:

18:13:39,527 graphrag.index.reporting.file_workflow_callbacks INFO Error executing verb "cluster_graph" in create_base_entity_graph: EmptyNetworkError details=None
18:13:39,527 graphrag.index.run ERROR error running workflow create_base_entity_graph
Traceback (most recent call last):
  File "D:\workspace\graphrag\index\run.py", line 323, in run_pipeline
    result = await workflow.run(context, callbacks)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\workspace\venv\Lib\site-packages\datashaper\workflow\workflow.py", line 369, in run
    timing = await self._execute_verb(node, context, callbacks)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\workspace\venv\Lib\site-packages\datashaper\workflow\workflow.py", line 410, in _execute_verb
    result = node.verb.func(**verb_args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\workspace\graphrag\index\verbs\graph\clustering\cluster_graph.py", line 61, in cluster_graph
    results = output_df[column].apply(lambda graph: run_layout(strategy, graph))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\workspace\venv\Lib\site-packages\pandas\core\series.py", line 4924, in apply
    ).apply()
      ^^^^^^^
  File "D:\workspace\venv\Lib\site-packages\pandas\core\apply.py", line 1427, in apply
    return self.apply_standard()
           ^^^^^^^^^^^^^^^^^^^^^
  File "D:\workspace\venv\Lib\site-packages\pandas\core\apply.py", line 1507, in apply_standard
    mapped = obj._map_values(
             ^^^^^^^^^^^^^^^^
  File "D:\workspace\venv\Lib\site-packages\pandas\core\base.py", line 921, in _map_values
    return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\workspace\venv\Lib\site-packages\pandas\core\algorithms.py", line 1743, in map_array
    return lib.map_infer(values, mapper, convert=convert)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "lib.pyx", line 2972, in pandas._libs.lib.map_infer
  File "D:\workspace\graphrag\index\verbs\graph\clustering\cluster_graph.py", line 61, in <lambda>
    results = output_df[column].apply(lambda graph: run_layout(strategy, graph))
                                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\workspace\graphrag\index\verbs\graph\clustering\cluster_graph.py", line 167, in run_layout
    clusters = run_leiden(graph, strategy)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\workspace\graphrag\index\verbs\graph\clustering\strategies\leiden.py", line 26, in run
    node_id_to_community_map = _compute_leiden_communities(
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\workspace\graphrag\index\verbs\graph\clustering\strategies\leiden.py", line 61, in _compute_leiden_communities
    community_mapping = hierarchical_leiden(
                        ^^^^^^^^^^^^^^^^^^^^
  File "<@beartype(graspologic.partition.leiden.hierarchical_leiden) at 0x1a3bd57e0c0>", line 304, in hierarchical_leiden
  File "D:\workspace\venv\Lib\site-packages\graspologic\partition\leiden.py", line 588, in hierarchical_leiden
    hierarchical_clusters_native = gn.hierarchical_leiden(
                                   ^^^^^^^^^^^^^^^^^^^^^^^
leiden.EmptyNetworkError: EmptyNetworkError
18:13:39,531 graphrag.index.reporting.file_workflow_callbacks INFO Error running pipeline! details=None
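
Both tracebacks in this thread end inside hierarchical_leiden, so it can help to check whether the graph reaching the clustering step actually contains anything. The sketch below is not part of GraphRAG: it counts nodes and edges in a GraphML string using only the Python standard library. The names `count_graphml` and `is_empty_graph` are hypothetical, and treating a graph that has nodes but no edges as empty is an assumption about what the clustering step needs.

```python
import xml.etree.ElementTree as ET

# Standard GraphML namespace, written in ElementTree's {uri}tag form.
GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def count_graphml(graphml_text: str) -> tuple[int, int]:
    """Return (node_count, edge_count) for a GraphML document given as a string."""
    root = ET.fromstring(graphml_text)
    nodes = root.findall(f".//{GRAPHML_NS}node")
    edges = root.findall(f".//{GRAPHML_NS}edge")
    return len(nodes), len(edges)


def is_empty_graph(graphml_text: str) -> bool:
    """Assumption: a graph with no nodes or no edges cannot be clustered."""
    nodes, edges = count_graphml(graphml_text)
    return nodes == 0 or edges == 0


# Small in-memory samples so the sketch runs without any GraphRAG output.
SAMPLE_GRAPH = (
    '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">'
    '<graph edgedefault="undirected">'
    '<node id="A"/><node id="B"/>'
    '<edge source="A" target="B"/>'
    "</graph></graphml>"
)
EMPTY_GRAPH = (
    '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">'
    '<graph edgedefault="undirected"/>'
    "</graphml>"
)

if __name__ == "__main__":
    for name, text in (("sample", SAMPLE_GRAPH), ("empty", EMPTY_GRAPH)):
        print(name, count_graphml(text), "empty" if is_empty_graph(text) else "ok")
```

With `graphml: true` under `snapshots` in the config, the GraphML files written as output could be read from disk and passed to `is_empty_graph`; the exact file names depend on the GraphRAG version.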

natoverse commented 3 weeks ago

Can folks getting this error upload a full indexing engine.log? The EmptyNetwork error usually happens late in the pipeline and masks a much earlier error in entity extraction, such as an invalid key, incorrect permissions, etc.
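
As a rough aid for this kind of triage, the following sketch pulls the earliest problem lines out of a log, so the original failure is easier to spot than the final EmptyNetworkError. It assumes the `HH:MM:SS,ms logger LEVEL message` layout seen in the log excerpts in this thread; `first_problems` is a hypothetical helper, not a GraphRAG API. INFO lines whose message starts with "Error" are included because the excerpt earlier in the thread reports the failed verb at INFO level.

```python
import re

# Assumed layout: "18:13:39,527 graphrag.index.run ERROR error running workflow ..."
LOG_LINE = re.compile(
    r"^(\d{2}:\d{2}:\d{2},\d+)\s+(\S+)\s+(INFO|WARNING|ERROR|CRITICAL)\s+(.*)$"
)


def first_problems(lines, limit=5):
    """Return up to `limit` earliest (timestamp, logger, level, message) tuples.

    A line counts as a problem if its level is above INFO, or if it is an
    INFO line whose message starts with "error" (case-insensitive).
    Lines that do not match the assumed layout (e.g. traceback frames) are skipped.
    """
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue
        timestamp, logger, level, message = match.groups()
        if level != "INFO" or message.lower().startswith("error"):
            hits.append((timestamp, logger, level, message))
            if len(hits) >= limit:
                break
    return hits


# Lines copied from the excerpts in this thread, used as an in-memory sample.
SAMPLE_LOG = [
    "11:10:20,177 graphrag.index.run INFO Running pipeline",
    '18:13:39,527 graphrag.index.reporting.file_workflow_callbacks INFO Error executing verb "cluster_graph" in create_base_entity_graph: EmptyNetworkError details=None',
    "18:13:39,527 graphrag.index.run ERROR error running workflow create_base_entity_graph",
    "Traceback (most recent call last):",
]

if __name__ == "__main__":
    for timestamp, logger, level, message in first_problems(SAMPLE_LOG):
        print(timestamp, level, logger, message[:80])
```

For a real run, the same function can be fed the lines of the log file from the reports folder, for example via `open(path, encoding="utf-8")`, where `path` is wherever your run wrote its log.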

I321065 commented 3 weeks ago

v0.3.0 same issue

Axiaozhu1 commented 3 weeks ago

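Can folks getting this error upload a full indexing engine.log? The EmptyNetwork error usually happens late in the pipeline and masks a much earlier error in entity extraction, such as an invalid key, incorrect permissions, etc.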

11:10:20,172 graphrag.config.read_dotenv INFO Loading pipeline .env file 11:10:20,174 graphrag.index.cli INFO using default configuration: { "llm": { "api_key": "REDACTED, length 6", "type": "openai_chat", "model": "qwen2:1.5b", "max_tokens": 1024, "temperature": 0.0, "top_p": 1.0, "request_timeout": 180.0, "api_base": "http://localhost:11434/v1", "api_version": null, "proxy": null, "cognitive_services_endpoint": null, "deployment_name": null, "model_supports_json": true, "tokens_per_minute": 0, "requests_per_minute": 0, "max_retries": 10, "max_retry_wait": 10.0, "sleep_on_rate_limit_recommendation": true, "concurrent_requests": 25 }, "parallelization": { "stagger": 0.3, "num_threads": 50 }, "async_mode": "threaded", "root_dir": "./ragtest", "reporting": { "type": "file", "base_dir": "output/${timestamp}/reports", "storage_account_blob_url": null }, "storage": { "type": "file", "base_dir": "output/${timestamp}/artifacts", "storage_account_blob_url": null }, "cache": { "type": "file", "base_dir": "cache", "storage_account_blob_url": null }, "input": { "type": "file", "file_type": "text", "base_dir": "input", "storage_account_blob_url": null, "encoding": "utf-8", "file_pattern": ".\.txt$", "file_filter": null, "source_column": null, "timestamp_column": null, "timestamp_format": null, "text_column": "text", "title_column": null, "document_attribute_columns": [] }, "embed_graph": { "enabled": false, "num_walks": 10, "walk_length": 40, "window_size": 2, "iterations": 3, "random_seed": 597832, "strategy": null }, "embeddings": { "llm": { "api_key": "REDACTED, length 9", "type": "openai_embedding", "model": "nomic-ai/nomic-embed-text-v1.5-GGUF/nomic-embed-text-v1.5.Q5_K_M.gguf", "max_tokens": 4000, "temperature": 0, "top_p": 1, "request_timeout": 180.0, "api_base": "http://localhost:1234/v1", "api_version": null, "proxy": null, "cognitive_services_endpoint": null, "deployment_name": null, "model_supports_json": null, "tokens_per_minute": 0, "requests_per_minute": 0, 
"max_retries": 10, "max_retry_wait": 10.0, "sleep_on_rate_limit_recommendation": true, "concurrent_requests": 25 }, "parallelization": { "stagger": 0.3, "num_threads": 50 }, "async_mode": "threaded", "batch_size": 16, "batch_max_tokens": 8191, "target": "required", "skip": [], "vector_store": null, "strategy": null }, "chunks": { "size": 200, "overlap": 100, "group_by_columns": [ "id" ], "strategy": null }, "snapshots": { "graphml": true, "raw_entities": true, "top_level_nodes": true }, "entity_extraction": { "llm": { "api_key": "REDACTED, length 6", "type": "openai_chat", "model": "qwen2:1.5b", "max_tokens": 1024, "temperature": 0.0, "top_p": 1.0, "request_timeout": 180.0, "api_base": "http://localhost:11434/v1", "api_version": null, "proxy": null, "cognitive_services_endpoint": null, "deployment_name": null, "model_supports_json": true, "tokens_per_minute": 0, "requests_per_minute": 0, "max_retries": 10, "max_retry_wait": 10.0, "sleep_on_rate_limit_recommendation": true, "concurrent_requests": 25 }, "parallelization": { "stagger": 0.3, "num_threads": 50 }, "async_mode": "threaded", "prompt": "prompts/entity_extraction.txt", "entity_types": [ "organization", "person", "geo", "event" ], "max_gleanings": 0, "strategy": null }, "summarize_descriptions": { "llm": { "api_key": "REDACTED, length 6", "type": "openai_chat", "model": "qwen2:1.5b", "max_tokens": 1024, "temperature": 0.0, "top_p": 1.0, "request_timeout": 180.0, "api_base": "http://localhost:11434/v1", "api_version": null, "proxy": null, "cognitive_services_endpoint": null, "deployment_name": null, "model_supports_json": true, "tokens_per_minute": 0, "requests_per_minute": 0, "max_retries": 10, "max_retry_wait": 10.0, "sleep_on_rate_limit_recommendation": true, "concurrent_requests": 25 }, "parallelization": { "stagger": 0.3, "num_threads": 50 }, "async_mode": "threaded", "prompt": "prompts/summarize_descriptions.txt", "max_length": 500, "strategy": null }, "community_reports": { "llm": { "api_key": 
"REDACTED, length 6", "type": "openai_chat", "model": "qwen2:1.5b", "max_tokens": 1024, "temperature": 0.0, "top_p": 1.0, "request_timeout": 180.0, "api_base": "http://localhost:11434/v1", "api_version": null, "proxy": null, "cognitive_services_endpoint": null, "deployment_name": null, "model_supports_json": true, "tokens_per_minute": 0, "requests_per_minute": 0, "max_retries": 10, "max_retry_wait": 10.0, "sleep_on_rate_limit_recommendation": true, "concurrent_requests": 25 }, "parallelization": { "stagger": 0.3, "num_threads": 50 }, "async_mode": "threaded", "prompt": null, "max_length": 2000, "max_input_length": 8000, "strategy": null }, "claim_extraction": { "llm": { "api_key": "REDACTED, length 6", "type": "openai_chat", "model": "qwen2:1.5b", "max_tokens": 1024, "temperature": 0.0, "top_p": 1.0, "request_timeout": 180.0, "api_base": "http://localhost:11434/v1", "api_version": null, "proxy": null, "cognitive_services_endpoint": null, "deployment_name": null, "model_supports_json": true, "tokens_per_minute": 0, "requests_per_minute": 0, "max_retries": 10, "max_retry_wait": 10.0, "sleep_on_rate_limit_recommendation": true, "concurrent_requests": 25 }, "parallelization": { "stagger": 0.3, "num_threads": 50 }, "async_mode": "threaded", "enabled": false, "prompt": "prompts/claim_extraction.txt", "description": "Any claims or facts that could be relevant to information discovery.", "max_gleanings": 0, "strategy": null }, "cluster_graph": { "max_cluster_size": 10, "strategy": null }, "umap": { "enabled": false }, "local_search": { "text_unit_prop": 0.5, "community_prop": 0.1, "conversation_history_max_turns": 5, "top_k_entities": 10, "top_k_relationships": 10, "max_tokens": 12000, "llm_max_tokens": 2000 }, "global_search": { "temperature": 0.0, "top_p": 1.0, "max_tokens": 12000, "data_max_tokens": 12000, "map_max_tokens": 1000, "reduce_max_tokens": 2000, "concurrency": 32 }, "encoding_model": "cl100k_base", "skip_workflows": [] } 11:10:20,177 
graphrag.index.create_pipeline_config INFO skipping workflows 11:10:20,177 graphrag.index.run INFO Running pipeline 11:10:20,177 graphrag.index.storage.file_pipeline_storage INFO Creating file storage at ragtest\output\20240815-111020\artifacts 11:10:20,178 graphrag.index.input.load_input INFO loading input from root_dir=input 11:10:20,178 graphrag.index.input.load_input INFO using file storage for input 11:10:20,180 graphrag.index.storage.file_pipeline_storage INFO search ragtest\input for files matching ..txt$ 11:10:20,180 graphrag.index.input.text INFO found text files from input, found [('CNN_intro.txt', {}), ('GNN_intro.txt', {}), ('machinelearning_intro.txt', {}), ('Transformers_intro.txt', {})] 11:10:20,188 graphrag.index.workflows.load INFO Workflow Run Order: ['create_base_text_units', 'create_base_extracted_entities', 'create_summarized_entities', 'create_base_entity_graph', 'create_final_entities', 'create_final_nodes', 'create_final_communities', 'join_text_units_to_entity_ids', 'create_final_relationships', 'join_text_units_to_relationship_ids', 'create_final_community_reports', 'create_final_text_units', 'create_base_documents', 'create_final_documents'] 11:10:20,188 graphrag.index.run INFO Final # of rows loaded: 4 11:10:20,268 graphrag.index.run INFO Running workflow: create_base_text_units... 
11:10:20,268 graphrag.index.run INFO dependencies for create_base_text_units: [] 11:10:20,271 datashaper.workflow.workflow INFO executing verb orderby 11:10:20,277 datashaper.workflow.workflow INFO executing verb zip 11:10:20,280 datashaper.workflow.workflow INFO executing verb aggregate_override 11:10:20,286 datashaper.workflow.workflow INFO executing verb chunk 11:10:20,423 datashaper.workflow.workflow INFO executing verb select 11:10:20,427 datashaper.workflow.workflow INFO executing verb unroll 11:10:20,432 datashaper.workflow.workflow INFO executing verb rename 11:10:20,436 datashaper.workflow.workflow INFO executing verb genid 11:10:20,440 datashaper.workflow.workflow INFO executing verb unzip 11:10:20,444 datashaper.workflow.workflow INFO executing verb copy 11:10:20,448 datashaper.workflow.workflow INFO executing verb filter 11:10:20,458 graphrag.index.emit.parquet_table_emitter INFO emitting parquet table create_base_text_units.parquet 11:10:20,580 graphrag.index.run INFO Running workflow: create_base_extracted_entities... 11:10:20,580 graphrag.index.run INFO dependencies for create_base_extracted_entities: ['create_base_text_units'] 11:10:20,581 graphrag.index.run INFO read table from storage: create_base_text_units.parquet 11:10:20,609 datashaper.workflow.workflow INFO executing verb entity_extract 11:10:20,616 graphrag.llm.openai.create_openai_client INFO Creating OpenAI client base_url=http://localhost:11434/v1 11:10:20,724 graphrag.index.llm.load_llm INFO create TPM/RPM limiter for qwen2:1.5b: TPM=0, RPM=0 11:10:20,724 graphrag.index.llm.load_llm INFO create concurrency limiter for qwen2:1.5b: 25 11:10:26,827 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK" 11:10:26,829 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 6.046999999998661. 
input_tokens=2134, output_tokens=34 11:10:26,840 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK" 11:10:26,841 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 6.078000000001339. input_tokens=2134, output_tokens=33 11:10:28,800 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK" 11:10:28,801 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 8.04700000000048. input_tokens=2134, output_tokens=35 11:10:33,48 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK" 11:10:33,48 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 12.234000000000378. input_tokens=2134, output_tokens=145 11:10:35,750 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK" 11:10:35,751 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 15.015999999999622. input_tokens=2134, output_tokens=42 11:10:36,679 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK" 11:10:36,679 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 15.906000000000859. input_tokens=1934, output_tokens=135 11:10:43,840 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK" 11:10:43,841 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 23.04700000000048. input_tokens=2134, output_tokens=151 11:10:44,588 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK" 11:10:44,588 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 23.811999999999898. 
input_tokens=2134, output_tokens=146 11:10:46,574 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK" 11:10:46,575 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 25.82799999999952. input_tokens=2134, output_tokens=409 11:10:51,484 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK" 11:10:51,485 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 30.70299999999952. input_tokens=2134, output_tokens=140 11:10:52,605 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK" 11:10:52,607 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 31.843000000000757. input_tokens=2134, output_tokens=118 11:10:54,617 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK" 11:10:54,619 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 33.85900000000038. input_tokens=2134, output_tokens=36 11:10:57,226 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK" 11:10:57,228 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 36.42200000000048. input_tokens=2134, output_tokens=259 11:10:58,469 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK" 11:10:58,470 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 37.67200000000048. input_tokens=2134, output_tokens=121 11:10:59,955 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK" 11:10:59,956 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 39.15599999999904. 
input_tokens=2134, output_tokens=32
11:11:01,300 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
11:11:01,302 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 40.51599999999962. input_tokens=2082, output_tokens=34
11:11:02,814 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
11:11:02,815 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 42.03199999999924. input_tokens=2134, output_tokens=35
11:11:03,261 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
11:11:03,263 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 42.48499999999876. input_tokens=2134, output_tokens=126
11:11:04,302 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
11:11:04,303 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 43.5. input_tokens=2134, output_tokens=36
11:11:05,473 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
11:11:05,474 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 44.71900000000096. input_tokens=2134, output_tokens=36
11:11:08,637 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
11:11:08,639 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 47.82799999999952. input_tokens=2134, output_tokens=79
11:11:09,216 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
11:11:09,218 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 48.46900000000096. input_tokens=2134, output_tokens=752
11:11:10,935 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
11:11:10,936 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 44.11000000000058. input_tokens=2134, output_tokens=29
11:11:12,893 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
11:11:12,893 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 46.04699999999866. input_tokens=2134, output_tokens=34
11:11:13,747 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
11:11:13,749 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 52.9369999999999. input_tokens=2134, output_tokens=151
11:11:15,849 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
11:11:15,850 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 42.79700000000048. input_tokens=2134, output_tokens=36
11:11:17,982 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
11:11:17,983 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 42.23400000000038. input_tokens=2134, output_tokens=37
11:11:19,593 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
11:11:19,595 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 58.78100000000086. input_tokens=2052, output_tokens=292
11:11:21,778 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
11:11:21,780 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 52.98400000000038. input_tokens=2134, output_tokens=152
11:11:21,943 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
11:11:21,944 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 38.09399999999914. input_tokens=1959, output_tokens=29
11:11:26,257 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
11:11:26,260 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 65.42200000000048. input_tokens=2134, output_tokens=338
11:11:27,188 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
11:11:27,190 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 50.51599999999962. input_tokens=2134, output_tokens=198
11:11:27,207 datashaper.workflow.workflow INFO executing verb snapshot
11:11:27,216 datashaper.workflow.workflow INFO executing verb merge_graphs
11:11:27,236 datashaper.workflow.workflow INFO executing verb snapshot_rows
11:11:27,240 graphrag.index.emit.parquet_table_emitter INFO emitting parquet table create_base_extracted_entities.parquet
11:11:27,425 graphrag.index.run INFO Running workflow: create_summarized_entities...
11:11:27,425 graphrag.index.run INFO dependencies for create_summarized_entities: ['create_base_extracted_entities']
11:11:27,426 graphrag.index.run INFO read table from storage: create_base_extracted_entities.parquet
11:11:27,454 datashaper.workflow.workflow INFO executing verb summarize_descriptions
11:11:30,49 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
11:11:30,50 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "summarize" with 0 retries took 2.5310000000008586. input_tokens=156, output_tokens=38
11:11:30,971 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
11:11:30,972 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "summarize" with 0 retries took 3.485000000000582. input_tokens=188, output_tokens=62
11:11:32,47 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
11:11:32,48 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "summarize" with 0 retries took 4.57799999999952. input_tokens=212, output_tokens=82
11:11:33,386 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
11:11:33,388 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "summarize" with 0 retries took 5.890999999999622. input_tokens=189, output_tokens=64
11:11:34,0 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
11:11:34,1 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "summarize" with 0 retries took 6.5. input_tokens=169, output_tokens=38
11:11:35,63 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
11:11:35,65 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "summarize" with 0 retries took 7.5789999999997235. input_tokens=167, output_tokens=86
11:11:35,142 httpx INFO HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
11:11:35,144 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "summarize" with 0 retries took 7.640999999999622. input_tokens=247, output_tokens=147
11:11:35,165 datashaper.workflow.workflow INFO executing verb snapshot_rows
11:11:35,172 graphrag.index.emit.parquet_table_emitter INFO emitting parquet table create_summarized_entities.parquet
11:11:35,367 graphrag.index.run INFO Running workflow: create_base_entity_graph...
11:11:35,367 graphrag.index.run INFO dependencies for create_base_entity_graph: ['create_summarized_entities']
11:11:35,368 graphrag.index.run INFO read table from storage: create_summarized_entities.parquet
11:11:35,399 datashaper.workflow.workflow INFO executing verb cluster_graph
11:11:35,410 datashaper.workflow.workflow ERROR Error executing verb "cluster_graph" in create_base_entity_graph: EmptyNetworkError
Traceback (most recent call last):
  File "D:\software\Anaconda3\envs\graphrag\lib\site-packages\datashaper\workflow\workflow.py", line 410, in _execute_verb
    result = node.verb.func(verb_args)
  File "E:\learn\graphrag\graphrag-local-ollama-main\graphrag\index\verbs\graph\clustering\cluster_graph.py", line 61, in cluster_graph
    results = output_df[column].apply(lambda graph: run_layout(strategy, graph))
  File "D:\software\Anaconda3\envs\graphrag\lib\site-packages\pandas\core\series.py", line 4924, in apply
    ).apply()
  File "D:\software\Anaconda3\envs\graphrag\lib\site-packages\pandas\core\apply.py", line 1427, in apply
    return self.apply_standard()
  File "D:\software\Anaconda3\envs\graphrag\lib\site-packages\pandas\core\apply.py", line 1507, in apply_standard
    mapped = obj._map_values(
  File "D:\software\Anaconda3\envs\graphrag\lib\site-packages\pandas\core\base.py", line 921, in _map_values
    return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)
  File "D:\software\Anaconda3\envs\graphrag\lib\site-packages\pandas\core\algorithms.py", line 1743, in map_array
    return lib.map_infer(values, mapper, convert=convert)
  File "lib.pyx", line 2972, in pandas._libs.lib.map_infer
  File "E:\learn\graphrag\graphrag-local-ollama-main\graphrag\index\verbs\graph\clustering\cluster_graph.py", line 61, in <lambda>
    results = output_df[column].apply(lambda graph: run_layout(strategy, graph))
  File "E:\learn\graphrag\graphrag-local-ollama-main\graphrag\index\verbs\graph\clustering\cluster_graph.py", line 167, in run_layout
    clusters = run_leiden(graph, strategy)
  File "E:\learn\graphrag\graphrag-local-ollama-main\graphrag\index\verbs\graph\clustering\strategies\leiden.py", line 26, in run
    node_id_to_community_map = _compute_leiden_communities(
  File "E:\learn\graphrag\graphrag-local-ollama-main\graphrag\index\verbs\graph\clustering\strategies\leiden.py", line 61, in _compute_leiden_communities
    community_mapping = hierarchical_leiden(
  File "<@beartype(graspologic.partition.leiden.hierarchical_leiden) at 0x21ab70e1510>", line 304, in hierarchical_leiden
  File "D:\software\Anaconda3\envs\graphrag\lib\site-packages\graspologic\partition\leiden.py", line 588, in hierarchical_leiden
    hierarchical_clusters_native = gn.hierarchical_leiden(
leiden.EmptyNetworkError: EmptyNetworkError
11:11:35,433 graphrag.index.reporting.file_workflow_callbacks INFO Error executing verb "cluster_graph" in create_base_entity_graph: EmptyNetworkError details=None
11:11:35,433 graphrag.index.run ERROR error running workflow create_base_entity_graph
Traceback (most recent call last):
  File "E:\learn\graphrag\graphrag-local-ollama-main\graphrag\index\run.py", line 323, in run_pipeline
    result = await workflow.run(context, callbacks)
  File "D:\software\Anaconda3\envs\graphrag\lib\site-packages\datashaper\workflow\workflow.py", line 369, in run
    timing = await self._execute_verb(node, context, callbacks)
  File "D:\software\Anaconda3\envs\graphrag\lib\site-packages\datashaper\workflow\workflow.py", line 410, in _execute_verb
    result = node.verb.func(verb_args)
  File "E:\learn\graphrag\graphrag-local-ollama-main\graphrag\index\verbs\graph\clustering\cluster_graph.py", line 61, in cluster_graph
    results = output_df[column].apply(lambda graph: run_layout(strategy, graph))
  File "D:\software\Anaconda3\envs\graphrag\lib\site-packages\pandas\core\series.py", line 4924, in apply
    ).apply()
  File "D:\software\Anaconda3\envs\graphrag\lib\site-packages\pandas\core\apply.py", line 1427, in apply
    return self.apply_standard()
  File "D:\software\Anaconda3\envs\graphrag\lib\site-packages\pandas\core\apply.py", line 1507, in apply_standard
    mapped = obj._map_values(
  File "D:\software\Anaconda3\envs\graphrag\lib\site-packages\pandas\core\base.py", line 921, in _map_values
    return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)
  File "D:\software\Anaconda3\envs\graphrag\lib\site-packages\pandas\core\algorithms.py", line 1743, in map_array
    return lib.map_infer(values, mapper, convert=convert)
  File "lib.pyx", line 2972, in pandas._libs.lib.map_infer
  File "E:\learn\graphrag\graphrag-local-ollama-main\graphrag\index\verbs\graph\clustering\cluster_graph.py", line 61, in <lambda>
    results = output_df[column].apply(lambda graph: run_layout(strategy, graph))
  File "E:\learn\graphrag\graphrag-local-ollama-main\graphrag\index\verbs\graph\clustering\cluster_graph.py", line 167, in run_layout
    clusters = run_leiden(graph, strategy)
  File "E:\learn\graphrag\graphrag-local-ollama-main\graphrag\index\verbs\graph\clustering\strategies\leiden.py", line 26, in run
    node_id_to_community_map = _compute_leiden_communities(
  File "E:\learn\graphrag\graphrag-local-ollama-main\graphrag\index\verbs\graph\clustering\strategies\leiden.py", line 61, in _compute_leiden_communities
    community_mapping = hierarchical_leiden(
  File "<@beartype(graspologic.partition.leiden.hierarchical_leiden) at 0x21ab70e1510>", line 304, in hierarchical_leiden
  File "D:\software\Anaconda3\envs\graphrag\lib\site-packages\graspologic\partition\leiden.py", line 588, in hierarchical_leiden
    hierarchical_clusters_native = gn.hierarchical_leiden(
leiden.EmptyNetworkError: EmptyNetworkError
11:11:35,437 graphrag.index.reporting.file_workflow_callbacks INFO Error running pipeline! details=None

github-actions[bot] commented 2 weeks ago

This issue has been marked stale due to inactivity after repo maintainer or community member responses that request more information or suggest a solution. It will be closed after five additional days.

github-actions[bot] commented 1 week ago

This issue has been closed after being marked as stale for five days. Please reopen if needed.
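
For anyone landing here with the same traceback: it ends in graspologic's `hierarchical_leiden` raising `EmptyNetworkError`, which suggests the graph that reached `cluster_graph` had nothing to cluster. The following is a minimal sketch for inspecting a graph before rerunning the indexer, not GraphRAG's own code. It assumes the graph is available as a GraphML string (how and where it is stored in the parquet output is an assumption, not confirmed in this thread), and the helper names `graphml_counts` and `is_clusterable` are hypothetical. Treating a graph with nodes but no edges as empty is also an assumption. Only the Python standard library is used.

```python
import xml.etree.ElementTree as ET

# Standard GraphML namespace, written in ElementTree's "{uri}tag" form.
GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def graphml_counts(graphml: str) -> tuple[int, int]:
    """Return (node_count, edge_count) for a GraphML document string."""
    root = ET.fromstring(graphml)
    nodes = sum(1 for _ in root.iter(f"{GRAPHML_NS}node"))
    edges = sum(1 for _ in root.iter(f"{GRAPHML_NS}edge"))
    return nodes, edges


def is_clusterable(graphml: str) -> bool:
    """Hypothetical guard: require at least one node and one edge."""
    nodes, edges = graphml_counts(graphml)
    return nodes > 0 and edges > 0


# Two hand-written sample graphs (illustrative data, not taken from the logs).
EMPTY_GRAPH = (
    '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">'
    '<graph edgedefault="undirected"/>'
    "</graphml>"
)
SMALL_GRAPH = (
    '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">'
    '<graph edgedefault="undirected">'
    '<node id="A"/><node id="B"/>'
    '<edge source="A" target="B"/>'
    "</graph>"
    "</graphml>"
)

if __name__ == "__main__":
    for name, doc in (("empty", EMPTY_GRAPH), ("small", SMALL_GRAPH)):
        nodes, edges = graphml_counts(doc)
        print(f"{name}: nodes={nodes} edges={edges} clusterable={is_clusterable(doc)}")
```

If a check like this reports zero nodes or edges, the likely place to look is the entity extraction step (the model's responses to the extraction prompt) rather than the clustering code.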