Errors occurred during the pipeline run, in create_base_entity_graph

un-lock-me commented 1 month ago

Do you need to file an issue?

[X] I have searched the existing issues and this bug is not already filed.
[ ] My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
[ ] I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

Describe the issue

When I run the model: python -m graphrag.index --root ./ragtest it raises error in this step: ⠴ GraphRAG Indexer ├── Loading Input (text) - 1 files loaded (1 filtered) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00 ├── create_base_text_units ├── create_base_extracted_entities ├── create_summarized_entities └── create_base_entity_graph

with this error:

❌ Errors occurred during the pipeline run, see logs for more details.

I can share the content of log but it seems it is related to timeout. I dont understand why I get timeout though. I am using LM studio and

Appreciate your help!

Steps to reproduce

No response

GraphRAG Config Used

# Paste your config here
encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF
  model_supports_json: true # recommended if this is available for your model.
  max_tokens: 32768
  # request_timeout: 180.0
  api_base: http://localhost:1234/v1
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made
  # temperature: 0 # temperature for sampling
  # top_p: 1 # top-p sampling
  # n: 1 # Number of completions to generate

parallelization:
  stagger: 0.3
  num_threads: 1 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: "CompendiumLabs/bge-large-en-v1.5-gguf"
    api_base: http://localhost:1234/v1
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    concurrent_requests: 1 # the number of parallel inflight requests that may be made
    #batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional

chunks:
  size: 100
  overlap: 30
  group_by_columns: [id] # by default, we don't allow chunks to cross documents

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

cache:
  type: file # or blob
  base_dir: "cache"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

storage:
  type: file # or blob
  base_dir: "output/${timestamp}/artifacts"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

reporting:
  type: file # or console, blob
  base_dir: "output/${timestamp}/reports"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

entity_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 1

summarize_descriptions:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  enabled: false  # warning: setting this to true added 16 hours to the run!
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes
  # num_walks: 10
  # walk_length: 40
  # window_size: 2
  # iterations: 3
  # random_seed: 597832

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: false
  raw_entities: false
  top_level_nodes: false

local_search:
  # text_unit_prop: 0.5
  # community_prop: 0.1
  # conversation_history_max_turns: 5
  # top_k_mapped_entities: 10
  # top_k_relationships: 10
  # llm_temperature: 0 # temperature for sampling
  # llm_top_p: 1 # top-p sampling
  # llm_n: 1 # Number of completions to generate
  # max_tokens: 12000

global_search:
  # llm_temperature: 0 # temperature for sampling
  # llm_top_p: 1 # top-p sampling
  # llm_n: 1 # Number of completions to generate
  # max_tokens: 12000
  # data_max_tokens: 12000
  # map_max_tokens: 1000
  # reduce_max_tokens: 2000
  # concurrency: 32

Logs and screenshots

I can share the whole logs if needed but it seems that it is getting timeout error:

{ "type": "error", "data": "Error Invoking LLM", "stack": "Traceback (most recent call last):\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/httpx/_transports/default.py\", line 72, in map_httpcore_exceptions\n yield\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/httpx/_transports/default.py\", line 377, in handle_async_request\n resp = await self._pool.handle_async_request(req)\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/httpcore/_async/connection_pool.py\", line 216, in handle_async_request\n raise exc from None\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/httpcore/_async/connection_pool.py\", line 196, in handle_async_request\n response = await connection.handle_async_request(\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/httpcore/_async/connection.py\", line 101, in handle_async_request\n return await self._connection.handle_async_request(request)\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/httpcore/_async/http11.py\", line 143, in handle_async_request\n raise exc\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/httpcore/_async/http11.py\", line 113, in handle_async_request\n ) = await self._receive_response_headers(kwargs)\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/httpcore/_async/http11.py\", line 186, in _receive_response_headers\n event = await self._receive_event(timeout=timeout)\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/httpcore/_async/http11.py\", line 224, in _receive_event\n data = await self._network_stream.read(\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/httpcore/_backends/anyio.py\", line 32, in read\n with map_exceptions(exc_map):\n File \"/Users/sgoudarzvand/.pyenv/versions/3.10.13/lib/python3.10/contextlib.py\", line 153, in exit\n self.gen.throw(typ, value, traceback)\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/httpcore/_exceptions.py\", line 14, in map_exceptions\n raise to_exc(exc) from exc\nhttpcore.ReadTimeout\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/openai/_base_client.py\", line 1564, in _request\n response = await self._client.send(\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/httpx/_client.py\", line 1674, in send\n response = await self._send_handling_auth(\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/httpx/_client.py\", line 1702, in _send_handling_auth\n response = await self._send_handling_redirects(\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/httpx/_client.py\", line 1739, in _send_handling_redirects\n response = await self._send_single_request(request)\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/httpx/_client.py\", line 1776, in _send_single_request\n response = await transport.handle_async_request(request)\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/httpx/_transports/default.py\", line 376, in handle_async_request\n with map_httpcore_exceptions():\n File \"/Users/sgoudarzvand/.pyenv/versions/3.10.13/lib/python3.10/contextlib.py\", line 153, in exit\n self.gen.throw(typ, value, traceback)\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/httpx/_transports/default.py\", line 89, in map_httpcore_exceptions\n raise mapped_exc(message) from exc\nhttpx.ReadTimeout\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/graphrag/llm/base/base_llm.py\", line 54, in _invoke\n output = await self._execute_llm(input, kwargs)\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/graphrag/llm/openai/openai_chat_llm.py\", line 53, in _execute_llm\n completion = await self.client.chat.completions.create(\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/openai/resources/chat/completions.py\", line 1490, in create\n return await self._post(\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/openai/_base_client.py\", line 1831, in post\n return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/openai/_base_client.py\", line 1525, in request\n return await self._request(\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/openai/_base_client.py\", line 1583, in _request\n raise APITimeoutError(request=request) from err\nopenai.APITimeoutError: Request timed out.\n", "source": "Request timed out.", "details": { "input": "\n-Goal-\nGiven a text document that is potentially relevant to this activity and a list of entity types, identify all entities of those types from the text and all relationships among the identified entities.\n \n-Steps-\n1. Identify all entities. For each identified entity, extract the following information:\n- entity_name: Name of the entity, capitalized\n- entity_type: One of the following types: [organization,person,geo,event]\n- entity_description: Comprehensive description of the entity's attributes and activities\nFormat each entity as (\"entity\"<|><|><|>)\n \n2. From the entities identified in step 1, identify all pairs of (source_entity, target_entity) that are clearly related to each other.\nFor each pair of related entities, extract the following information:\n- source_entity: name of the source entity, as identified in step 1\n- target_entity: name of the target entity, as identified in step 1\n- relationship_description: explanation as to why you think the source entity and the target entity are related to each other\n- relationship_strength: a numeric score indicating strength of the relationship between the source entity and target entity\n Format each relationship as (\"relationship\"<|><|><|><|>)\n \n3. Return output in English as a single list of all the entities and relationships identified in steps 1 and 2. Use ## as the list delimiter.\n \n4. When finished, output <|COMPLETE|>\n \n######################\n-Examples-\n######################\nExample 1:\nEntity_types: ORGANIZATION,PERSON\nText:\nThe Verdantis's Central Institution is scheduled to meet on Monday and Thursday, with the institution planning to release its latest policy decision on Thursday at 1:30 p.m. PDT, followed by a press conference where Central Institution Chair Martin Smith will take questions. Investors expect the Market Strategy Committee to hold its benchmark interest rate steady in a range of 3.5%-3.75%.\n######################\nOutput:\n(\"entity\"<|>CENTRAL INSTITUTION<|>ORGANIZATION<|>The Central Institution is the Federal Reserve of Verdantis, which is setting interest rates on Monday and Thursday)\n##\n(\"entity\"<|>MARTIN SMITH<|>PERSON<|>Martin Smith is the chair of the Central Institution)\n##\n(\"entity\"<|>MARKET STRATEGY COMMITTEE<|>ORGANIZATION<|>The Central Institution committee makes key decisions about interest rates and the growth of Verdantis's money supply)\n##\n(\"relationship\"<|>MARTIN SMITH<|>CENTRAL INSTITUTION<|>Martin Smith is the Chair of the Central Institution and will answer questions at a press conference<|>9)\n<|COMPLETE|>\n\n######################\nExample 2:\nEntity_types: ORGANIZATION\nText:\nTechGlobal's (TG) stock skyrocketed in its opening day on the Global Exchange Thursday. But IPO experts warn that the semiconductor corporation's debut on the public markets isn't indicative of how other newly listed companies may perform.\n\nTechGlobal, a formerly public company, was taken private by Vision Holdings in 2014. The well-established chip designer says it powers 85% of premium smartphones.\n######################\nOutput:\n(\"entity\"<|>TECHGLOBAL<|>ORGANIZATION<|>TechGlobal is a stock now listed on the Global Exchange which powers 85% of premium smartphones)\n##\n(\"entity\"<|>VISION HOLDINGS<|>ORGANIZATION<|>Vision Holdings is a firm that previously owned TechGlobal)\n##\n(\"relationship\"<|>TECHGLOBAL<|>VISION HOLDINGS<|>Vision Holdings formerly owned TechGlobal from 2014 until present<|>5)\n<|COMPLETE|>\n\n######################\nExample 3:\nEntity_types: ORGANIZATION,GEO,PERSON\nText:\nFive Aurelians jailed for 8 years in Firuzabad and widely regarded as hostages are on their way home to Aurelia.\n\nThe swap orchestrated by Quintara was finalized when $8bn of Firuzi funds were transferred to financial institutions in Krohaara, the capital of Quintara.\n\nThe exchange initiated in Firuzabad's capital, Tiruzia, led to the four men and one woman, who are also Firuzi nationals, boarding a chartered flight to Krohaara.\n\nThey were welcomed by senior Aurelian officials and are now on their way to Aurelia's capital, Cashion.\n\nThe Aurelians include 39-year-old businessman Samuel Namara, who has been held in Tiruzia's Alhamia Prison, as well as journalist Durke Bataglani, 59, and environmentalist Meggie Tazbah, 53, who also holds Bratinas nationality.\n######################\nOutput:\n(\"entity\"<|>FIRUZABAD<|>GEO<|>Firuzabad held Aurelians as hostages)\n##\n(\"entity\"<|>AURELIA<|>GEO<|>Country seeking to release hostages)\n##\n(\"entity\"<|>QUINTARA<|>GEO<|>Country that negotiated a swap of money in exchange for hostages)\n##\n##\n(\"entity\"<|>TIRUZIA<|>GEO<|>Capital of Firuzabad where the Aurelians were being held)\n##\n(\"entity\"<|>KROHAARA<|>GEO<|>Capital city in Quintara)\n##\n(\"entity\"<|>CASHION<|>GEO<|>Capital city in Aurelia)\n##\n(\"entity\"<|>SAMUEL NAMARA<|>PERSON<|>Aurelian who spent time in Tiruzia's Alhamia Prison)\n##\n(\"entity\"<|>ALHAMIA PRISON<|>GEO<|>Prison in Tiruzia)\n##\n(\"entity\"<|>DURKE BATAGLANI<|>PERSON<|>Aurelian journalist who was held hostage)\n##\n(\"entity\"<|>MEGGIE TAZBAH<|>PERSON<|>Bratinas national and environmentalist who was held hostage)\n##\n(\"relationship\"<|>FIRUZABAD<|>AURELIA<|>Firuzabad negotiated a hostage exchange with Aurelia<|>2)\n##\n(\"relationship\"<|>QUINTARA<|>AURELIA<|>Quintara brokered the hostage exchange between Firuzabad and Aurelia<|>2)\n##\n(\"relationship\"<|>QUINTARA<|>FIRUZABAD<|>Quintara brokered the hostage exchange between Firuzabad and Aurelia<|>2)\n##\n(\"relationship\"<|>SAMUEL NAMARA<|>ALHAMIA PRISON<|>Samuel Namara was a prisoner at Alhamia prison<|>8)\n##\n(\"relationship\"<|>SAMUEL NAMARA<|>MEGGIE TAZBAH<|>Samuel Namara and Meggie Tazbah were exchanged in the same hostage release<|>2)\n##\n(\"relationship\"<|>SAMUEL NAMARA<|>DURKE BATAGLANI<|>Samuel Namara and Durke Bataglani were exchanged in the same hostage release<|>2)\n##\n(\"relationship\"<|>MEGGIE TAZBAH<|>DURKE BATAGLANI<|>Meggie Tazbah and Durke Bataglani were exchanged in the same hostage release<|>2)\n##\n(\"relationship\"<|>SAMUEL NAMARA<|>FIRUZABAD<|>Samuel Namara was a hostage in Firuzabad<|>2)\n##\n(\"relationship\"<|>MEGGIE TAZBAH<|>FIRUZABAD<|>Meggie Tazbah was a hostage in Firuzabad<|>2)\n##\n(\"relationship\"<|>DURKE BATAGLANI<|>FIRUZABAD<|>Durke Bataglani was a hostage in Firuzabad<|>2)\n<|COMPLETE|>\n\n######################\n-Real Data-\n######################\nEntity_types: organization,person,geo,event\nText: The Project Gutenberg eBook of A Christmas Carol \nThis ebook is for the use of anyone anywhere in the United States and\nmost other parts of the world at no cost and with almost no restrictions\nwhatsoever. You may copy it, give it away or re-use it under the terms\nof the Project Gutenberg License included with this ebook or online\nat www.gutenberg.org. If you are not located in the United States,\nyou will have to check the laws of the country where you are located\n######################\nOutput:" } } { "type": "error", "data": "Error Invoking LLM", "stack": "Traceback (most recent call last):\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/httpx/_transports/default.py\", line 72, in map_httpcore_exceptions\n yield\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/httpx/_transports/default.py\", line 377, in handle_async_request\n resp = await self._pool.handle_async_request(req)\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/httpcore/_async/connection_pool.py\", line 216, in handle_async_request\n raise exc from None\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/httpcore/_async/connection_pool.py\", line 196, in handle_async_request\n response = await connection.handle_async_request(\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/httpcore/_async/connection.py\", line 101, in handle_async_request\n return await self._connection.handle_async_request(request)\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/httpcore/_async/http11.py\", line 143, in handle_async_request\n raise exc\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/httpcore/_async/http11.py\", line 113, in handle_async_request\n ) = await self._receive_response_headers(kwargs)\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/httpcore/_async/http11.py\", line 186, in _receive_response_headers\n event = await self._receive_event(timeout=timeout)\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/httpcore/_async/http11.py\", line 224, in _receive_event\n data = await self._network_stream.read(\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/httpcore/_backends/anyio.py\", line 32, in read\n with map_exceptions(exc_map):\n File \"/Users/sgoudarzvand/.pyenv/versions/3.10.13/lib/python3.10/contextlib.py\", line 153, in exit\n self.gen.throw(typ, value, traceback)\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/httpcore/_exceptions.py\", line 14, in map_exceptions\n raise to_exc(exc) from exc\nhttpcore.ReadTimeout\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/openai/_base_client.py\", line 1564, in _request\n response = await self._client.send(\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/httpx/_client.py\", line 1674, in send\n response = await self._send_handling_auth(\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/httpx/_client.py\", line 1702, in _send_handling_auth\n response = await self._send_handling_redirects(\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/httpx/_client.py\", line 1739, in _send_handling_redirects\n response = await self._send_single_request(request)\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/httpx/_client.py\", line 1776, in _send_single_request\n response = await transport.handle_async_request(request)\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/httpx/_transports/default.py\", line 376, in handle_async_request\n with map_httpcore_exceptions():\n File \"/Users/sgoudarzvand/.pyenv/versions/3.10.13/lib/python3.10/contextlib.py\", line 153, in exit\n self.gen.throw(typ, value, traceback)\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/httpx/_transports/default.py\", line 89, in map_httpcore_exceptions\n raise mapped_exc(message) from exc\nhttpx.ReadTimeout\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/graphrag/llm/base/base_llm.py\", line 54, in _invoke\n output = await self._execute_llm(input, kwargs)\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/graphrag/llm/openai/openai_chat_llm.py\", line 53, in _execute_llm\n completion = await self.client.chat.completions.create(\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/openai/resources/chat/completions.py\", line 1490, in create\n return await self._post(\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/openai/_base_client.py\", line 1831, in post\n return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/openai/_base_client.py\", line 1525, in request\n return await self._request(\n File \"/Users/sgoudarzvand/.pyenv/versions/myenv2/lib/python3.10/site-packages/openai/_base_client.py\", line 1583, in _request\n raise APITimeoutError(request=request) from err\nopenai.APITimeoutError: Request timed out.\n", "source": "Request timed out.", "details": { "input": "\n-Goal-\nGiven a text document that is potentially relevant to this activity and a list of entity types, identify all entities of those types from the text and all relationships among the identified entities.\n \n-Steps-\n1. Identify all entities. For each identified entity, extract the following information:\n- entity_name: Name of the entity, capitalized\n- entity_type: One of the following types: [organization,person,geo,event]\n- entity_description: Comprehensive description of the entity's attributes and activities\nFormat each entity as (\"entity\"<|><|><|>)\n \n2. From the entities identified in step 1, identify all pairs of (source_entity, target_entity) that are clearly related to each other.\nFor each pair of related entities, extract the following information:\n- source_entity: name of the source entity, as identified in step 1\n- target_entity: name of the target entity, as identified in step 1\n- relationship_description: explanation as to why you think the source entity and the target entity are related to each other\n- relationship_strength: a numeric score indicating strength of the relationship between the source entity and target entity\n Format each relationship as (\"relationship\"<|><|><|><|>)\n \n3. Return output in English as a single list of all the entities and relationships identified in steps 1 and 2. Use ## as the list delimiter.\n \n4. When finished, output <|COMPLETE|>\n \n######################\n-Examples-\n######################\nExample 1:\nEntity_types: ORGANIZATION,PERSON\nText:\nThe Verdantis's Central Institution is scheduled to meet on Monday and Thursday, with the institution planning to release its latest policy decision on Thursday at 1:30 p.m. PDT, followed by a press conference where Central Institution Chair Martin Smith will take questions. Investors expect the Market Strategy Committee to hold its benchmark interest rate steady in a range of 3.5%-3.75%.\n######################\nOutput:\n(\"entity\"<|>CENTRAL INSTITUTION<|>ORGANIZATION<|>The Central Institution is the Federal Reserve of Verdantis, which is setting interest rates on Monday and Thursday)\n##\n(\"entity\"<|>MARTIN SMITH<|>PERSON<|>Martin Smith is the chair of the Central Institution)\n##\n(\"entity\"<|>MARKET STRATEGY COMMITTEE<|>ORGANIZATION<|>The Central Institution committee makes key decisions about interest rates and the growth of Verdantis's money supply)\n##\n(\"relationship\"<|>MARTIN SMITH<|>CENTRAL INSTITUTION<|>Martin Smith is the Chair of the Central Institution and will answer questions at a press conference<|>9)\n<|COMPLETE|>\n\n######################\nExample 2:\nEntity_types: ORGANIZATION\nText:\nTechGlobal's (TG) stock skyrocketed in its opening day on the Global Exchange Thursday. But IPO experts warn that the semiconductor corporation's debut on the public markets isn't indicative of how other newly listed companies may perform.\n\nTechGlobal, a formerly public company, was taken private by Vision Holdings in 2014. The well-established chip designer says it powers 85% of premium smartphones.\n######################\nOutput:\n(\"entity\"<|>TECHGLOBAL<|>ORGANIZATION<|>TechGlobal is a stock now listed on the Global Exchange which powers 85% of premium smartphones)\n##\n(\"entity\"<|>VISION HOLDINGS<|>ORGANIZATION<|>Vision Holdings is a firm that previously owned TechGlobal)\n##\n(\"relationship\"<|>TECHGLOBAL<|>VISION HOLDINGS<|>Vision Holdings formerly owned TechGlobal from 2014 until present<|>5)\n<|COMPLETE|>\n\n######################\nExample 3:\nEntity_types: ORGANIZATION,GEO,PERSON\nText:\nFive Aurelians jailed for 8 years in Firuzabad and widely regarded as hostages are on their way home to Aurelia.\n\nThe swap orchestrated by Quintara was finalized when $8bn of Firuzi funds were transferred to financial institutions in Krohaara, the capital of Quintara.\n\nThe exchange initiated in Firuzabad's capital, Tiruzia, led to the four men and one woman, who are also Firuzi nationals, boarding a chartered flight to Krohaara.\n\nThey were welcomed by senior Aurelian officials and are now on their way to Aurelia's capital, Cashion.\n\nThe Aurelians include 39-year-old businessman Samuel Namara, who has been held in Tiruzia's Alhamia Prison, as well as journalist Durke Bataglani, 59, and environmentalist Meggie Tazbah, 53, who also holds Bratinas nationality.\n######################\nOutput:\n(\"entity\"<|>FIRUZABAD<|>GEO<|>Firuzabad held Aurelians as hostages)\n##\n(\"entity\"<|>AURELIA<|>GEO<|>Country seeking to release hostages)\n##\n(\"entity\"<|>QUINTARA<|>GEO<|>Country that negotiated a swap of money in exchange for hostages)\n##\n##\n(\"entity\"<|>TIRUZIA<|>GEO<|>Capital of Firuzabad where the Aurelians were being held)\n##\n(\"entity\"<|>KROHAARA<|>GEO<|>Capital city in Quintara)\n##\n(\"entity\"<|>CASHION<|>GEO<|>Capital city in Aurelia)\n##\n(\"entity\"<|>SAMUEL NAMARA<|>PERSON<|>Aurelian who spent time in Tiruzia's Alhamia Prison)\n##\n(\"entity\"<|>ALHAMIA PRISON<|>GEO<|>Prison in Tiruzia)\n##\n(\"entity\"<|>DURKE BATAGLANI<|>PERSON<|>Aurelian journalist who was held hostage)\n##\n(\"entity\"<|>MEGGIE TAZBAH<|>PERSON<|>Bratinas national and environmentalist who was held hostage)\n##\n(\"relationship\"<|>FIRUZABAD<|>AURELIA<|>Firuzabad negotiated a hostage exchange with Aurelia<|>2)\n##\n(\"relationship\"<|>QUINTARA<|>AURELIA<|>Quintara brokered the hostage exchange between Firuzabad and Aurelia<|>2)\n##\n(\"relationship\"<|>QUINTARA<|>FIRUZABAD<|>Quintara brokered the hostage exchange between Firuzabad and Aurelia<|>2)\n##\n(\"relationship\"<|>SAMUEL NAMARA<|>ALHAMIA PRISON<|>Samuel Namara was a prisoner at Alhamia prison<|>8)\n##\n(\"relationship\"<|>SAMUEL NAMARA<|>MEGGIE TAZBAH<|>Samuel Namara and Meggie Tazbah were exchanged in the same hostage release<|>2)\n##\n(\"relationship\"<|>SAMUEL NAMARA<|>DURKE BATAGLANI<|>Samuel Namara and Durke Bataglani were exchanged in the same hostage release<|>2)\n##\n(\"relationship\"<|>MEGGIE TAZBAH<|>DURKE BATAGLANI<|>Meggie Tazbah and Durke Bataglani were exchanged in the same hostage release<|>2)\n##\n(\"relationship\"<|>SAMUEL NAMARA<|>FIRUZABAD<|>Samuel Namara was a hostage in Firuzabad<|>2)\n##\n(\"relationship\"<|>MEGGIE TAZBAH<|>FIRUZABAD<|>Meggie Tazbah was a hostage in Firuzabad<|>2)\n##\n(\"relationship\"<|>DURKE BATAGLANI<|>FIRUZABAD<|>Durke Bataglani was a hostage in Firuzabad<|>2)\n<|COMPLETE|>\n\n######################\n-Real Data-\n######################\nEntity_types: organization,person,geo,event\nText: The Project Gutenberg eBook of A Christmas Carol \nThis ebook is for the use of anyone anywhere in the United States and\nmost other parts of the world at no cost and with almost no restrictions\nwhatsoever. You may copy it, give it away or re-use it under the terms\nof the Project Gutenberg License included with this ebook or online\nat www.gutenberg.org. If you are not located in the United States,\nyou will have to check the laws of the country where you are located\n######################\nOutput:" } }

Additional Information

GraphRAG Version: Latest version
Operating System: Mac M3
Python Version: 3.10
Related Issues:

un-lock-me commented 1 month ago

Any idea how can I fix this please?

abdshomad commented 3 weeks ago

Experience the same error on 2 different environments.

Screenshots

Log

graphrag error : ERROR:datashaper.workflow.workflow:Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key
Traceback (most recent call last):
  File "/home/demo/miniconda3/envs/graphrag/lib/python3.10/site-packages/datashaper/workflow/workflow.py", line 410, in _execute_verb
    result = node.verb.func(**verb_args)
  File "/home/demo/miniconda3/envs/graphrag/lib/python3.10/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py", line 106, in cluster_graph
    output_df[[level_to, to]] = pd.DataFrame(
  File "/home/demo/.local/lib/python3.10/site-packages/pandas/core/frame.py", line 4299, in __setitem__
    self._setitem_array(key, value)
  File "/home/demo/.local/lib/python3.10/site-packages/pandas/core/frame.py", line 4341, in _setitem_array
    check_key_length(self.columns, key, value)
  File "/home/demo/.local/lib/python3.10/site-packages/pandas/core/indexers/utils.py", line 390, in check_key_length
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
ERROR:graphrag.index.run.run:error running workflow create_base_entity_graph
Traceback (most recent call last):
  File "/home/demo/miniconda3/envs/graphrag/lib/python3.10/site-packages/graphrag/index/run/run.py", line 227, in run_pipeline
    result = await _process_workflow(
  File "/home/demo/miniconda3/envs/graphrag/lib/python3.10/site-packages/graphrag/index/run/workflow.py", line 91, in _process_workflow
    result = await workflow.run(context, callbacks)
  File "/home/demo/miniconda3/envs/graphrag/lib/python3.10/site-packages/datashaper/workflow/workflow.py", line 369, in run
    timing = await self._execute_verb(node, context, callbacks)
  File "/home/demo/miniconda3/envs/graphrag/lib/python3.10/site-packages/datashaper/workflow/workflow.py", line 410, in _execute_verb
    result = node.verb.func(**verb_args)
  File "/home/demo/miniconda3/envs/graphrag/lib/python3.10/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py", line 106, in cluster_graph
    output_df[[level_to, to]] = pd.DataFrame(
  File "/home/demo/.local/lib/python3.10/site-packages/pandas/core/frame.py", line 4299, in __setitem__
    self._setitem_array(key, value)
❌ create_base_entity_graph

GraphRAG Config Used

encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: gpt-4o # nemotron # gpt-4-turbo-preview
  model_supports_json: true # recommended if this is available for your model.
  # max_tokens: 4000
  # request_timeout: 180.0
  # api_base: # https://<instance>.openai.azure.com
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made
  # temperature: 0 # temperature for sampling
  # top_p: 1 # top-p sampling
  # n: 1 # Number of completions to generate

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  # target: required # or all
  # batch_size: 16 # the number of documents to send in a single request
  # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: text-embedding-3-small # mxbai-embed-large # text-embedding-3-small
    # api_base: # https://<instance>.openai.azure.com
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made

chunks:
  size: 1200
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

cache:
  type: file # or blob
  base_dir: "cache"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

storage:
  type: file # or blob
  base_dir: "output"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

reporting:
  type: file # or console, blob
  base_dir: "output"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

entity_extraction:
  ## strategy: fully override the entity extraction strategy.
  ##   type: one of graph_intelligence, graph_intelligence_json and nltk
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 1

summarize_descriptions:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  # enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes
  # num_walks: 10
  # walk_length: 40
  # window_size: 2
  # iterations: 3
  # random_seed: 597832

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: false
  raw_entities: false
  top_level_nodes: false

local_search:
  # text_unit_prop: 0.5
  # community_prop: 0.1
  # conversation_history_max_turns: 5
  # top_k_mapped_entities: 10
  # top_k_relationships: 10
  # llm_temperature: 0 # temperature for sampling
  # llm_top_p: 1 # top-p sampling
  # llm_n: 1 # Number of completions to generate
  # max_tokens: 12000

global_search:
  # llm_temperature: 0 # temperature for sampling
  # llm_top_p: 1 # top-p sampling
  # llm_n: 1 # Number of completions to generate
  # max_tokens: 12000
  # data_max_tokens: 12000
  # map_max_tokens: 1000
  # reduce_max_tokens: 2000
  # concurrency: 32

Any idea how can I fix this please?

microsoft / graphrag