xiangjingwei123 opened 3 weeks ago
Please inspect the indexing-engine.log. Often this error is preceded by errors earlier in the pipeline, usually due to OpenAI key issues such as permissions or missing config.
indexing-engine.log
17:35:14,702 graphrag.index.emit.parquet_table_emitter INFO emitting parquet table create_base_text_units.parquet
17:35:15,63 graphrag.index.run INFO Running workflow: create_base_extracted_entities...
17:35:15,69 graphrag.index.run INFO dependencies for create_base_extracted_entities: ['create_base_text_units']
17:35:15,81 graphrag.index.run INFO read table from storage: create_base_text_units.parquet
17:35:15,130 datashaper.workflow.workflow INFO executing verb entity_extract
17:35:15,141 graphrag.llm.openai.create_openai_client INFO Creating OpenAI client base_url=None
17:35:15,164 graphrag.index.llm.load_llm INFO create TPM/RPM limiter for gpt-4o: TPM=0, RPM=0
17:35:15,164 graphrag.index.llm.load_llm INFO create concurrency limiter for gpt-4o: 25
17:37:22,715 graphrag.index.reporting.file_workflow_callbacks INFO Error Invoking LLM details={'input': '\n-Goal-\nGiven a text document that is potentially relevant to this activity and a list of entity types, identify all entities of those types from the text and all relationships among the identified entities.\n \n-Steps-\n1. Identify all entities. For each identified entity, extract the following information:\n- entity_name: Name of the entity, capitalized\n- entity_type: One of the following types: [organization,person,geo,event]\n- entity_description: Comprehensive description of the entity\'s attributes and activities\nFormat each entity as ("entity"<|>
It looks like there are connection issues with the OpenAI library:
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/graphrag/index/graph/extractors/graph/graph_extractor.py", line 123, in __call__
    result = await self._process_document(text, prompt_variables)
  File "/usr/local/lib/python3.10/site-packages/graphrag/index/graph/extractors/graph/graph_extractor.py", line 151, in _process_document
    response = await self._llm(
  File "/usr/local/lib/python3.10/site-packages/graphrag/llm/openai/json_parsing_llm.py", line 34, in __call__
    result = await self._delegate(input, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/graphrag/llm/openai/openai_token_replacing_llm.py", line 37, in __call__
    return await self._delegate(input, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/graphrag/llm/openai/openai_history_tracking_llm.py", line 33, in __call__
    output = await self._delegate(input, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/graphrag/llm/base/caching_llm.py", line 96, in __call__
    result = await self._delegate(input, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/graphrag/llm/base/rate_limiting_llm.py", line 177, in __call__
    result, start = await execute_with_retry()
  File "/usr/local/lib/python3.10/site-packages/graphrag/llm/base/rate_limiting_llm.py", line 159, in execute_with_retry
    async for attempt in retryer:
  File "/usr/local/lib/python3.10/site-packages/tenacity/asyncio/__init__.py", line 166, in __anext__
    do = await self.iter(retry_state=self._retry_state)
  File "/usr/local/lib/python3.10/site-packages/tenacity/asyncio/__init__.py", line 153, in iter
    result = await action(retry_state)
  File "/usr/local/lib/python3.10/site-packages/tenacity/_utils.py", line 99, in inner
    return call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/tenacity/__init__.py", line 418, in exc_check
    raise retry_exc.reraise()
  File "/usr/local/lib/python3.10/site-packages/tenacity/__init__.py", line 185, in reraise
    raise self.last_attempt.result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.10/site-packages/graphrag/llm/base/rate_limiting_llm.py", line 165, in execute_with_retry
    return await do_attempt(), start
  File "/usr/local/lib/python3.10/site-packages/graphrag/llm/base/rate_limiting_llm.py", line 147, in do_attempt
    return await self._delegate(input, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/graphrag/llm/base/base_llm.py", line 49, in __call__
    return await self._invoke(input, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/graphrag/llm/base/base_llm.py", line 53, in _invoke
    output = await self._execute_llm(input, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/graphrag/llm/openai/openai_chat_llm.py", line 53, in _execute_llm
    completion = await self.client.chat.completions.create(
  File "/usr/local/lib/python3.10/site-packages/openai/resources/chat/completions.py", line 1339, in create
    return await self._post(
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1816, in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1510, in request
    return await self._request(
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1583, in _request
    raise APIConnectionError(request=request) from err
openai.APIConnectionError: Connection error.
I don't see any other reporting on the underlying issue that would help with diagnostics (e.g., internet connectivity, key validity), so the best suggestion at the moment is to check whether you have any difficulty connecting via their API directly and in the playground, to confirm that your setup with OpenAI is valid.
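Before going deeper into GraphRAG itself, a quick local sanity check can rule out an obviously missing or malformed key. The sketch below only validates the key's shape (the `sk-` prefix and a minimum-length heuristic are assumptions about OpenAI key formats, not guarantees); a live `chat.completions` call would be the next step once this passes:

```python
import os

def looks_like_openai_key(key: str) -> bool:
    # Heuristic only: OpenAI keys start with "sk-" and are fairly long.
    return key.startswith("sk-") and len(key) >= 40

def check_setup() -> str:
    # GraphRAG reads GRAPHRAG_API_KEY from the .env; OPENAI_API_KEY is a common fallback.
    key = os.environ.get("GRAPHRAG_API_KEY") or os.environ.get("OPENAI_API_KEY", "")
    if not key:
        return "no API key found in GRAPHRAG_API_KEY / OPENAI_API_KEY"
    if not looks_like_openai_key(key):
        return "key present but does not look like an OpenAI key"
    return "key format looks OK; try a direct chat.completions call next"

print(check_setup())
```

If the format check passes but the indexer still raises `APIConnectionError`, the problem is more likely network-level (proxy, firewall, DNS) than the key itself.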
Facing the same issue currently. Have validated my key and internet connection by connecting to the API directly. Logs:
12:33:15,301 graphrag.config.read_dotenv INFO Loading pipeline .env file
12:33:15,304 graphrag.index.cli INFO using default configuration: {
"llm": {
"api_key": "REDACTED, length 56",
"type": "openai_chat",
"model": "gpt-4o-mini",
"max_tokens": 4000,
"temperature": 0.0,
"top_p": 1.0,
"n": 1,
"request_timeout": 180.0,
"api_base": null,
"api_version": null,
"proxy": null,
"cognitive_services_endpoint": null,
"deployment_name": null,
"model_supports_json": true,
"tokens_per_minute": 0,
"requests_per_minute": 0,
"max_retries": 10,
"max_retry_wait": 10.0,
"sleep_on_rate_limit_recommendation": true,
"concurrent_requests": 25
},
"parallelization": {
"stagger": 0.3,
"num_threads": 50
},
"async_mode": "threaded",
"root_dir": ".",
"reporting": {
"type": "file",
"base_dir": "output/${timestamp}/reports",
"storage_account_blob_url": null
},
"storage": {
"type": "file",
"base_dir": "output/${timestamp}/artifacts",
"storage_account_blob_url": null
},
"cache": {
"type": "file",
"base_dir": "cache",
"storage_account_blob_url": null
},
"input": {
"type": "file",
"file_type": "text",
"base_dir": "input",
"storage_account_blob_url": null,
"encoding": "utf-8",
"file_pattern": ".*\\.txt$",
"file_filter": null,
"source_column": null,
"timestamp_column": null,
"timestamp_format": null,
"text_column": "text",
"title_column": null,
"document_attribute_columns": []
},
"embed_graph": {
"enabled": false,
"num_walks": 10,
"walk_length": 40,
"window_size": 2,
"iterations": 3,
"random_seed": 597832,
"strategy": null
},
"embeddings": {
"llm": {
"api_key": "REDACTED, length 56",
"type": "openai_embedding",
"model": "text-embedding-3-small",
"max_tokens": 4000,
"temperature": 0,
"top_p": 1,
"n": 1,
"request_timeout": 180.0,
"api_base": null,
"api_version": null,
"proxy": null,
"cognitive_services_endpoint": null,
"deployment_name": null,
"model_supports_json": null,
"tokens_per_minute": 0,
"requests_per_minute": 0,
"max_retries": 10,
"max_retry_wait": 10.0,
"sleep_on_rate_limit_recommendation": true,
"concurrent_requests": 25
},
"parallelization": {
"stagger": 0.3,
"num_threads": 50
},
"async_mode": "threaded",
"batch_size": 16,
"batch_max_tokens": 8191,
"target": "required",
"skip": [],
"vector_store": null,
"strategy": null
},
"chunks": {
"size": 1200,
"overlap": 100,
"group_by_columns": [
"id"
],
"strategy": null
},
"snapshots": {
"graphml": false,
"raw_entities": false,
"top_level_nodes": false
},
"entity_extraction": {
"llm": {
"api_key": "REDACTED, length 56",
"type": "openai_chat",
"model": "gpt-4o-mini",
"max_tokens": 4000,
"temperature": 0.0,
"top_p": 1.0,
"n": 1,
"request_timeout": 180.0,
"api_base": null,
"api_version": null,
"proxy": null,
"cognitive_services_endpoint": null,
"deployment_name": null,
"model_supports_json": true,
"tokens_per_minute": 0,
"requests_per_minute": 0,
"max_retries": 10,
"max_retry_wait": 10.0,
"sleep_on_rate_limit_recommendation": true,
"concurrent_requests": 25
},
"parallelization": {
"stagger": 0.3,
"num_threads": 50
},
"async_mode": "threaded",
"prompt": "prompts/entity_extraction.txt",
"entity_types": [
"organization",
"person",
"geo",
"event"
],
"max_gleanings": 1,
"strategy": null
},
"summarize_descriptions": {
"llm": {
"api_key": "REDACTED, length 56",
"type": "openai_chat",
"model": "gpt-4o-mini",
"max_tokens": 4000,
"temperature": 0.0,
"top_p": 1.0,
"n": 1,
"request_timeout": 180.0,
"api_base": null,
"api_version": null,
"proxy": null,
"cognitive_services_endpoint": null,
"deployment_name": null,
"model_supports_json": true,
"tokens_per_minute": 0,
"requests_per_minute": 0,
"max_retries": 10,
"max_retry_wait": 10.0,
"sleep_on_rate_limit_recommendation": true,
"concurrent_requests": 25
},
"parallelization": {
"stagger": 0.3,
"num_threads": 50
},
"async_mode": "threaded",
"prompt": "prompts/summarize_descriptions.txt",
"max_length": 500,
"strategy": null
},
"community_reports": {
"llm": {
"api_key": "REDACTED, length 56",
"type": "openai_chat",
"model": "gpt-4o-mini",
"max_tokens": 4000,
"temperature": 0.0,
"top_p": 1.0,
"n": 1,
"request_timeout": 180.0,
"api_base": null,
"api_version": null,
"proxy": null,
"cognitive_services_endpoint": null,
"deployment_name": null,
"model_supports_json": true,
"tokens_per_minute": 0,
"requests_per_minute": 0,
"max_retries": 10,
"max_retry_wait": 10.0,
"sleep_on_rate_limit_recommendation": true,
"concurrent_requests": 25
},
"parallelization": {
"stagger": 0.3,
"num_threads": 50
},
"async_mode": "threaded",
"prompt": "prompts/community_report.txt",
"max_length": 2000,
"max_input_length": 8000,
"strategy": null
},
"claim_extraction": {
"llm": {
"api_key": "REDACTED, length 56",
"type": "openai_chat",
"model": "gpt-4o-mini",
"max_tokens": 4000,
"temperature": 0.0,
"top_p": 1.0,
"n": 1,
"request_timeout": 180.0,
"api_base": null,
"api_version": null,
"proxy": null,
"cognitive_services_endpoint": null,
"deployment_name": null,
"model_supports_json": true,
"tokens_per_minute": 0,
"requests_per_minute": 0,
"max_retries": 10,
"max_retry_wait": 10.0,
"sleep_on_rate_limit_recommendation": true,
"concurrent_requests": 25
},
"parallelization": {
"stagger": 0.3,
"num_threads": 50
},
"async_mode": "threaded",
"enabled": false,
"prompt": "prompts/claim_extraction.txt",
"description": "Any claims or facts that could be relevant to information discovery.",
"max_gleanings": 1,
"strategy": null
},
"cluster_graph": {
"max_cluster_size": 10,
"strategy": null
},
"umap": {
"enabled": false
},
"local_search": {
"text_unit_prop": 0.5,
"community_prop": 0.1,
"conversation_history_max_turns": 5,
"top_k_entities": 10,
"top_k_relationships": 10,
"temperature": 0.0,
"top_p": 1.0,
"n": 1,
"max_tokens": 12000,
"llm_max_tokens": 2000
},
"global_search": {
"temperature": 0.0,
"top_p": 1.0,
"n": 1,
"max_tokens": 12000,
"data_max_tokens": 12000,
"map_max_tokens": 1000,
"reduce_max_tokens": 2000,
"concurrency": 32
},
"encoding_model": "cl100k_base",
"skip_workflows": []
}
12:33:15,306 graphrag.index.create_pipeline_config INFO skipping workflows
12:33:15,318 graphrag.index.run INFO Running pipeline
12:33:15,318 graphrag.index.storage.file_pipeline_storage INFO Creating file storage at output/20240822-123315/artifacts
12:33:15,319 graphrag.index.input.load_input INFO loading input from root_dir=input
12:33:15,319 graphrag.index.input.load_input INFO using file storage for input
12:33:15,319 graphrag.index.storage.file_pipeline_storage INFO search input for files matching .*\.txt$
12:33:15,320 graphrag.index.input.text INFO found text files from input, found [('.txt', {})]
12:33:15,321 graphrag.index.input.text INFO Found 1 files, loading 1
12:33:15,322 graphrag.index.workflows.load INFO Workflow Run Order: ['create_base_text_units', 'create_base_extracted_entities', 'create_summarized_entities', 'create_base_entity_graph', 'create_final_entities', 'create_final_nodes', 'create_final_communities', 'join_text_units_to_entity_ids', 'create_final_relationships', 'join_text_units_to_relationship_ids', 'create_final_community_reports', 'create_final_text_units', 'create_base_documents', 'create_final_documents']
12:33:15,322 graphrag.index.run INFO Final # of rows loaded: 1
12:33:15,432 graphrag.index.run INFO Running workflow: create_base_text_units...
12:33:15,432 graphrag.index.run INFO dependencies for create_base_text_units: []
12:33:15,435 datashaper.workflow.workflow INFO executing verb orderby
12:33:15,437 datashaper.workflow.workflow INFO executing verb zip
12:33:15,439 datashaper.workflow.workflow INFO executing verb aggregate_override
12:33:15,444 datashaper.workflow.workflow INFO executing verb chunk
12:33:15,605 datashaper.workflow.workflow INFO executing verb select
12:33:15,607 datashaper.workflow.workflow INFO executing verb unroll
12:33:15,611 datashaper.workflow.workflow INFO executing verb rename
12:33:15,615 datashaper.workflow.workflow INFO executing verb genid
12:33:15,619 datashaper.workflow.workflow INFO executing verb unzip
12:33:15,623 datashaper.workflow.workflow INFO executing verb copy
12:33:15,626 datashaper.workflow.workflow INFO executing verb filter
12:33:15,635 graphrag.index.emit.parquet_table_emitter INFO emitting parquet table create_base_text_units.parquet
12:33:15,763 graphrag.index.run INFO Running workflow: create_base_extracted_entities...
12:33:15,764 graphrag.index.run INFO dependencies for create_base_extracted_entities: ['create_base_text_units']
12:33:15,764 graphrag.index.run INFO read table from storage: create_base_text_units.parquet
12:33:15,774 datashaper.workflow.workflow INFO executing verb entity_extract
12:33:15,776 graphrag.llm.openai.create_openai_client INFO Creating OpenAI client base_url=None
12:33:15,808 graphrag.index.llm.load_llm INFO create TPM/RPM limiter for gpt-4o-mini: TPM=0, RPM=0
12:33:15,808 graphrag.index.llm.load_llm INFO create concurrency limiter for gpt-4o-mini: 25
12:33:16,983 httpx INFO HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
12:33:16,988 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 1.1759999999776483. input_tokens=1935, output_tokens=5
12:33:17,641 httpx INFO HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
12:33:17,643 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "extract-continuation-0" with 0 retries took 0.651999999769032. input_tokens=19, output_tokens=19
12:33:17,659 datashaper.workflow.workflow INFO executing verb merge_graphs
12:33:17,663 graphrag.index.emit.parquet_table_emitter INFO emitting parquet table create_base_extracted_entities.parquet
12:33:17,787 graphrag.index.run INFO Running workflow: create_summarized_entities...
12:33:17,787 graphrag.index.run INFO dependencies for create_summarized_entities: ['create_base_extracted_entities']
12:33:17,788 graphrag.index.run INFO read table from storage: create_base_extracted_entities.parquet
12:33:17,798 datashaper.workflow.workflow INFO executing verb summarize_descriptions
12:33:17,800 graphrag.index.emit.parquet_table_emitter INFO emitting parquet table create_summarized_entities.parquet
12:33:17,926 graphrag.index.run INFO Running workflow: create_base_entity_graph...
12:33:17,926 graphrag.index.run INFO dependencies for create_base_entity_graph: ['create_summarized_entities']
12:33:17,926 graphrag.index.run INFO read table from storage: create_summarized_entities.parquet
12:33:17,937 datashaper.workflow.workflow INFO executing verb cluster_graph
12:33:17,937 graphrag.index.verbs.graph.clustering.cluster_graph WARNING Graph has no nodes
12:33:17,940 datashaper.workflow.workflow ERROR Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key
Traceback (most recent call last):
File "/opt/anaconda3/envs/myenv/lib/python3.12/site-packages/datashaper/workflow/workflow.py", line 410, in _execute_verb
result = node.verb.func(**verb_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kanishktyagi/Kanishk_POC/graphrag/graphrag/graphrag/index/verbs/graph/clustering/cluster_graph.py", line 102, in cluster_graph
output_df[[level_to, to]] = pd.DataFrame(
~~~~~~~~~^^^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/myenv/lib/python3.12/site-packages/pandas/core/frame.py", line 4299, in __setitem__
self._setitem_array(key, value)
File "/opt/anaconda3/envs/myenv/lib/python3.12/site-packages/pandas/core/frame.py", line 4341, in _setitem_array
check_key_length(self.columns, key, value)
File "/opt/anaconda3/envs/myenv/lib/python3.12/site-packages/pandas/core/indexers/utils.py", line 390, in check_key_length
raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
12:33:17,946 graphrag.index.reporting.file_workflow_callbacks INFO Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key details=None
12:33:17,946 graphrag.index.run ERROR error running workflow create_base_entity_graph
Traceback (most recent call last):
File "/Users/kanishktyagi/Kanishk_POC/graphrag/graphrag/graphrag/index/run.py", line 323, in run_pipeline
result = await workflow.run(context, callbacks)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/myenv/lib/python3.12/site-packages/datashaper/workflow/workflow.py", line 369, in run
timing = await self._execute_verb(node, context, callbacks)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/myenv/lib/python3.12/site-packages/datashaper/workflow/workflow.py", line 410, in _execute_verb
result = node.verb.func(**verb_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kanishktyagi/Kanishk_POC/graphrag/graphrag/graphrag/index/verbs/graph/clustering/cluster_graph.py", line 102, in cluster_graph
output_df[[level_to, to]] = pd.DataFrame(
~~~~~~~~~^^^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/myenv/lib/python3.12/site-packages/pandas/core/frame.py", line 4299, in __setitem__
self._setitem_array(key, value)
File "/opt/anaconda3/envs/myenv/lib/python3.12/site-packages/pandas/core/frame.py", line 4341, in _setitem_array
check_key_length(self.columns, key, value)
File "/opt/anaconda3/envs/myenv/lib/python3.12/site-packages/pandas/core/indexers/utils.py", line 390, in check_key_length
raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
12:33:17,947 graphrag.index.reporting.file_workflow_callbacks INFO Error running pipeline! details=None
@Kanishk-T I see the following line in your log.
12:33:17,937 graphrag.index.verbs.graph.clustering.cluster_graph WARNING Graph has no nodes
This means that graphrag was not able to extract any entities and/or relationships from the data you are trying to index. We could improve the error handling here as this is a known edge case.
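One way to confirm this locally is to inspect the graph emitted by the extraction step before clustering runs. The extraction artifacts store the graph as a GraphML string (the column name `entity_graph` below is an assumption about the parquet layout), so counting `<node>` elements with the stdlib XML parser is enough to see whether anything was extracted. A minimal sketch:

```python
import xml.etree.ElementTree as ET

def count_graphml_nodes(graphml_text: str) -> int:
    """Count <node> elements in a GraphML document, ignoring namespaces."""
    root = ET.fromstring(graphml_text)
    return sum(1 for el in root.iter() if el.tag.endswith("node"))

# A graph with no nodes, as produced when entity extraction returns nothing:
empty = ('<graphml xmlns="http://graphml.graphdrawing.org/xmlns">'
         '<graph edgedefault="undirected"/></graphml>')
print(count_graphml_nodes(empty))  # 0
```

In a real run you would load `output/<timestamp>/artifacts/create_base_extracted_entities.parquet` with pandas and apply this to the graph column; a count of 0 matches the "Graph has no nodes" warning and explains the downstream "Columns must be same length as key" failure.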
I also encountered this issue; it appeared after I upgraded to version 0.3.1. That release introduced significant changes, and many settings now need to be configured in the .env file.
I've also encountered this issue after upgrading GraphRAG to the latest version. The problem appears to be due to significant changes in the prompts.
To resolve this, I replaced the new prompts with the older version that was previously working for me.
@jgbradley1 I've tried the example in the documentation after creating a fresh project, and it hits this exact same error. As others in the thread mentioned, this started after the update; it is not a product of the input data or of using models other than OpenAI, but of the changes made in the last update. I can try to pinpoint the cause: I faced this same error during the prompt-tuning process and opened a PR for it: https://github.com/microsoft/graphrag/pull/925
Same thing there: because of the way the prompt was generating the examples, the graph was coming out empty, same as now.
@Kanishk-T, @9prodhi, @allseeworld I've merged #925
I haven't cut a new release to PyPI yet, but can you please run from source to check whether this solves the issue you're facing? You'll need to rerun prompt tuning for the change to take effect.
Hey @AlonsoGuevara, I've made the exact same changes to the default prompts and tested them through prompt tuning. Running on a default project, simply removing the asterisks (**) from either side of {record_delimiter} seems to have fixed the indexing pipeline's empty-graph error.
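To illustrate why those asterisks matter: the extraction output is split on the record delimiter and each piece must look like a well-formed tuple, so asterisks the model copies from the prompt examples make every record unparseable. This is a toy illustration of that failure mode, not graphrag's actual parser:

```python
def parse_records(output: str, record_delimiter: str = "##") -> list:
    """Split raw LLM output into records, keeping only well-formed tuples.

    Loosely mirrors how extraction output is filtered: anything that doesn't
    start like ("entity"<|>... or ("relationship"<|>... is dropped.
    """
    records = []
    for raw in output.split(record_delimiter):
        rec = raw.strip()
        if rec.startswith('("entity"') or rec.startswith('("relationship"'):
            records.append(rec)
    return records

good = '("entity"<|>ALICE<|>PERSON<|>desc)##("entity"<|>BOB<|>PERSON<|>desc)'
bad = '**("entity"<|>ALICE<|>PERSON<|>desc)**##**("entity"<|>BOB<|>PERSON<|>desc)**'

print(len(parse_records(good)))  # 2
print(len(parse_records(bad)))   # 0 -> empty graph downstream
```

When the model mimics `**{record_delimiter}**` from the tuned prompt, every record arrives wrapped in `**` and is discarded, which is exactly the "Graph has no nodes" path.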
What's your context window size? I have the same issue with ollama and qwen2. But I found that the default num_ctx=2048 is too small to produce the right response. After I set the num_ctx=32000, it works.
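For context: `num_ctx` is Ollama's context window (prompt plus completion combined), which is distinct from graphrag's `llm.max_tokens` (an output cap). GraphRAG's extraction prompts run to several thousand tokens, so Ollama's 2048 default silently truncates them. One way to raise it is to bake the parameter into a derived model via a Modelfile (a sketch; `qwen2` stands in for whatever model you already pulled):

```
FROM qwen2
PARAMETER num_ctx 32000
```

Then `ollama create qwen2-32k -f Modelfile` and point `llm.model` in settings.yaml at `qwen2-32k`.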
Even after modifying the prompt by removing the * character, the Mistral model is still not functioning as expected. Specifically, the model is failing to extract any edges for the generated graph.

encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat
  model: gemma2 # mistral gemma2
  model_supports_json: true
  # api_base: http://host.docker.internal:11434/v1
  api_base: http://localhost:11434/v1
  # api_base: http://127.0.0.1:7002/v1
  concurrent_requests: 24
parallelization:
  stagger: 120
async_mode: threaded
embeddings:
  async_mode: threaded
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding
    model: nomic-ai/nomic-embed-text-v1.5-GGUF
    # api_base: http://localhost:8001/v1/
    # api_base: http://44.200.78.20/v1/
    api_base: http://localhost:8001/v1
    concurrent_requests: 2
chunks:
  size: 300
  overlap: 100
  group_by_columns: [id]
input:
  type: file
  file_type: text
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"
cache:
  type: file
  base_dir: "cache"
storage:
  type: file
  base_dir: "output/${timestamp}/artifacts"
reporting:
  type: file
  base_dir: "output/${timestamp}/reports"
entity_extraction:
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization, person, geo, event, Paper, Journal, Conference, Citation, Research Topic]
  max_gleanings: 0
summarize_descriptions:
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500
claim_extraction:
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 0
community_report:
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000
cluster_graph:
  max_cluster_size: 10
embed_graph:
  enabled: false
umap:
  enabled: false
snapshots:
  graphml: true
  raw_entities: true
  top_level_nodes: false
hierarchical_clusters_native = gn.hierarchical_leiden(
^^^^^^^^^^^^^^^^^^^^^^^
leiden.EmptyNetworkError: EmptyNetworkError
09:15:29,615 graphrag.index.reporting.file_workflow_callbacks INFO Error running pipeline! details=None
09:15:29,623 graphrag.index.cli ERROR Errors occurred during the pipeline run, see logs for more details.
What's your context window size? I have the same issue with ollama and qwen2. But I found that the default num_ctx=2048 is too small to produce the right response. After I set the num_ctx=32000, it works.
Does the num_ctx param mean the same thing as llm.max_tokens?
any solutions to this issue? have the same issue while running Ollama (llama3.1)
Do you need to file an issue?
Describe the issue
Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key
Traceback (most recent call last):
  File "/data/jupyter/myenv/lib/python3.10/site-packages/datashaper/workflow/workflow.py", line 410, in _execute_verb
    result = node.verb.func(**verb_args)
  File "/data/jupyter/myenv/lib/python3.10/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py", line 102, in cluster_graph
    output_df[[level_to, to]] = pd.DataFrame(
  File "/data/jupyter/myenv/lib/python3.10/site-packages/pandas/core/frame.py", line 4299, in __setitem__
    self._setitem_array(key, value)
  File "/data/jupyter/myenv/lib/python3.10/site-packages/pandas/core/frame.py", line 4341, in _setitem_array
    check_key_length(self.columns, key, value)
  File "/data/jupyter/myenv/lib/python3.10/site-packages/pandas/core/indexers/utils.py", line 390, in check_key_length
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key

Error running pipeline!
Traceback (most recent call last):
  File "/data/jupyter/myenv/lib/python3.10/site-packages/graphrag/index/run.py", line 325, in run_pipeline
    result = await workflow.run(context, callbacks)
  File "/data/jupyter/myenv/lib/python3.10/site-packages/datashaper/workflow/workflow.py", line 369, in run
    timing = await self._execute_verb(node, context, callbacks)
  File "/data/jupyter/myenv/lib/python3.10/site-packages/datashaper/workflow/workflow.py", line 410, in _execute_verb
    result = node.verb.func(**verb_args)
  File "/data/jupyter/myenv/lib/python3.10/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py", line 102, in cluster_graph
    output_df[[level_to, to]] = pd.DataFrame(
  File "/data/jupyter/myenv/lib/python3.10/site-packages/pandas/core/frame.py", line 4299, in __setitem__
    self._setitem_array(key, value)
  File "/data/jupyter/myenv/lib/python3.10/site-packages/pandas/core/frame.py", line 4341, in _setitem_array
    check_key_length(self.columns, key, value)
  File "/data/jupyter/myenv/lib/python3.10/site-packages/pandas/core/indexers/utils.py", line 390, in check_key_length
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
Steps to reproduce
After I execute 'python -m graphrag.index --root ./ragtest', the failure happens.
GraphRAG Config Used
Logs and screenshots
No response
Additional Information