microsoft / graphrag

A modular graph-based Retrieval-Augmented Generation (RAG) system
https://microsoft.github.io/graphrag/
MIT License
13.37k stars 1.14k forks

ERROR Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key #426

Closed. mbt1909432 closed this issue 1 week ago

mbt1909432 commented 3 weeks ago

Below is the document I tested: "In the quaint village of Aeloria, a young girl named Elara discovered an ancient, enchanted locket in her grandmother's attic. When she opened it, she found a hidden map leading to a forgotten grove. Guided by curiosity and courage, Elara followed the map and uncovered a magical tree that granted one wish. With a heart full of hope, she wished for her village to prosper. The next morning, Aeloria was transformed, blooming with life and happiness. Elara's selfless wish had brought endless joy to her village, and she cherished the secret of the magical tree forever.magical places are those we discover when we dare to follow our curiosity and believe in the power of our dreams."

my cache file content is: "{"result": "## Entity Recognition and Relationship Extraction Results\n\nEntity_types: person, location, organization, event, concept\n\nEntities:\n\n Elara: young girl who discovers the enchanted locket\n Aeloria: quaint village where the story takes place\n Ancient tree: magical tree that grants wishes\n Enchanted locket: object that reveals a hidden map\n Hope: concept representing the power of dreams\n\nRelationships:\n\n Elara discovers the enchanted locket in her grandmother's attic. (Person-Object)\n Elara follows the map and uncovers the magical tree. (Person-Event)\n The magical tree grants Elara one wish. (Object-Event)\n Elara wishes for her village to prosper. (Event-Concept)\n Aeloria is transformed with life and happiness. (Event-Concept)\n\nInterpretation:\n\nThe story revolves around Elara's discovery of an enchanted locket and her subsequent wish for the prosperity of her village. The magical tree symbolizes the power of hope and dreams, while the transformation of Aeloria highlights the positive impact of Elara's selfless wish.\n\nPossible Queries:\n\n What is the significance of the enchanted locket in the story?\n How did Elara's wish affect the village of Aeloria?\n What role does hope play in the story's resolution?", "input": "\n-Goal-\nGiven a text document that is potentially relevant to this activity and a list of entity types, identify all entities of those types from the text and all relationships among the identified entities.\n\n-Steps-\n1. Identify all entities. For each identified entity, extract the following information:\n- entity_name: Name of the entity, capitalized\n- entity_type: One of the following types: [organization,person,geo,event]\n- entity_description: Comprehensive description of the entity's attributes and activities\nFormat each entity as (\"entity\"<|><|><|>\n\n2. From the entities identified in step 1, identify all pairs of (source_entity, target_entity) that are clearly related* to each other.\nFor each pair of related entities, extract the following information:\n- source_entity: name of the source entity, as identified in step 1\n- target_entity: name of the target entity, as identified in step 1\n- relationship_description: explanation as to why you think the source entity and the target entity are related to each other\n- relationship_strength: a numeric score indicating strength of the relationship between the source entity and target entity\n Format each relationship as (\"relationship\"<|><|><|><|>)\n\n3. Return output in English as a single list of all the entities and relationships identified in steps 1 and 2. Use ## as the list delimiter.\n\n4. When finished, output <|COMPLETE|>\n\n######################\n-Examples-\n######################\nExample 1:\n\nEntity_types: [person, technology, mission, organization, location]\nText:\nwhile Alex clenched his jaw, the buzz of frustration dull against the backdrop of Taylor's authoritarian certainty. It was this competitive undercurrent that kept him alert, the sense that his and Jordan's shared commitment to discovery was an unspoken rebellion against Cruz's narrowing vision of control and order.\n\nThen Taylor did something unexpected. They paused beside Jordan and, for a moment, observed the device with something akin to reverence. \u201cIf this tech can be understood...\" Taylor said, their voice quieter, \"It could change the game for us. 
For all of us.\u201d\n\nThe underlying dismissal earlier seemed to falter, replaced by a glimpse of reluctant respect for the gravity of what lay in their hands. Jordan looked up, and for a fleeting heartbeat, their eyes locked with Taylor's, a wordless clash of wills softening into an uneasy truce.\n\nIt was a small transformation, barely perceptible, but one that Alex noted with an inward nod. They had all been brought here by different paths\n################\nOutput:\n(\"entity\"<|>\"Alex\"<|>\"person\"<|>\"Alex is a character who experiences frustration and is observant of the dynamics among other characters.\")##\n(\"entity\"<|>\"Taylor\"<|>\"person\"<|>\"Taylor is portrayed with authoritarian certainty and shows a moment of reverence towards a device, indicating a change in perspective.\")##\n(\"entity\"<|>\"Jordan\"<|>\"person\"<|>\"Jordan shares a commitment to discovery and has a significant interaction with Taylor regarding a device.\")##\n(\"entity\"<|>\"Cruz\"<|>\"person\"<|>\"Cruz is associated with a vision of control and order, influencing the dynamics among other characters.\")##\n(\"entity\"<|>\"The Device\"<|>\"technology\"<|>\"The Device is central to the story, with potential game-changing implications, and is revered by Taylor.\")##\n(\"relationship\"<|>\"Alex\"<|>\"Taylor\"<|>\"Alex is affected by Taylor's authoritarian certainty and observes changes in Taylor's attitude towards the device.\"<|>7)##\n(\"relationship\"<|>\"Alex\"<|>\"Jordan\"<|>\"Alex and Jordan share a commitment to discovery, which contrasts with Cruz's vision.\"<|>6)##\n(\"relationship\"<|>\"Taylor\"<|>\"Jordan\"<|>\"Taylor and Jordan interact directly regarding the device, leading to a moment of mutual respect and an uneasy truce.\"<|>8)##\n(\"relationship\"<|>\"Jordan\"<|>\"Cruz\"<|>\"Jordan's commitment to discovery is in rebellion against Cruz's vision of control and order.\"<|>5)##\n(\"relationship\"<|>\"Taylor\"<|>\"The Device\"<|>\"Taylor shows reverence towards the device, indicating its importance and potential impact.\"<|>9)<|COMPLETE|>\n#############################\nExample 2:\n\nEntity_types: [person, technology, mission, organization, location]\nText:\nThey were no longer mere operatives; they had become guardians of a threshold, keepers of a message from a realm beyond stars and stripes. This elevation in their mission could not be shackled by regulations and established protocols\u2014it demanded a new perspective, a new resolve.\n\nTension threaded through the dialogue of beeps and static as communications with Washington buzzed in the background. The team stood, a portentous air enveloping them. It was clear that the decisions they made in the ensuing hours could redefine humanity's place in the cosmos or condemn them to ignorance and potential peril.\n\nTheir connection to the stars solidified, the group moved to address the crystallizing warning, shifting from passive recipients to active participants. Mercer's latter instincts gained precedence\u2014 the team's mandate had evolved, no longer solely to observe and report but to interact and prepare. 
A metamorphosis had begun, and Operation: Dulce hummed with the newfound frequency of their daring, a tone set not by the earthly\n#############\nOutput:\n(\"entity\"<|>\"Washington\"<|>\"location\"<|>\"Washington is a location where communications are being received, indicating its importance in the decision-making process.\")##\n(\"entity\"<|>\"Operation: Dulce\"<|>\"mission\"<|>\"Operation: Dulce is described as a mission that has evolved to interact and prepare, indicating a significant shift in objectives and activities.\")##\n(\"entity\"<|>\"The team\"<|>\"organization\"<|>\"The team is portrayed as a group of individuals who have transitioned from passive observers to active participants in a mission, showing a dynamic change in their role.\")##\n(\"relationship\"<|>\"The team\"<|>\"Washington\"<|>\"The team receives communications from Washington, which influences their decision-making process.\"<|>7)##\n(\"relationship\"<|>\"The team\"<|>\"Operation: Dulce\"<|>\"The team is directly involved in Operation: Dulce, executing its evolved objectives and activities.\"<|>9)<|COMPLETE|>\n#############################\nExample 3:\n\nEntity_types: [person, role, technology, organization, event, location, concept]\nText:\ntheir voice slicing through the buzz of activity. \"Control may be an illusion when facing an intelligence that literally writes its own rules,\" they stated stoically, casting a watchful eye over the flurry of data.\n\n\"It's like it's learning to communicate,\" offered Sam Rivera from a nearby interface, their youthful energy boding a mix of awe and anxiety. \"This gives talking to strangers' a whole new meaning.\"\n\nAlex surveyed his team\u2014each face a study in concentration, determination, and not a small measure of trepidation. \"This might well be our first contact,\" he acknowledged, \"And we need to be ready for whatever answers back.\"\n\nTogether, they stood on the edge of the unknown, forging humanity's response to a message from the heavens. 
The ensuing silence was palpable\u2014a collective introspection about their role in this grand cosmic play, one that could rewrite human history.\n\nThe encrypted dialogue continued to unfold, its intricate patterns showing an almost uncanny anticipation\n#############\nOutput:\n(\"entity\"<|>\"Sam Rivera\"<|>\"person\"<|>\"Sam Rivera is a member of a team working on communicating with an unknown intelligence, showing a mix of awe and anxiety.\")##\n(\"entity\"<|>\"Alex\"<|>\"person\"<|>\"Alex is the leader of a team attempting first contact with an unknown intelligence, acknowledging the significance of their task.\")##\n(\"entity\"<|>\"Control\"<|>\"concept\"<|>\"Control refers to the ability to manage or govern, which is challenged by an intelligence that writes its own rules.\")##\n(\"entity\"<|>\"Intelligence\"<|>\"concept\"<|>\"Intelligence here refers to an unknown entity capable of writing its own rules and learning to communicate.\")##\n(\"entity\"<|>\"First Contact\"<|>\"event\"<|>\"First Contact is the potential initial communication between humanity and an unknown intelligence.\")##\n(\"entity\"<|>\"Humanity's Response\"<|>\"event\"<|>\"Humanity's Response is the collective action taken by Alex's team in response to a message from an unknown intelligence.\")##\n(\"relationship\"<|>\"Sam Rivera\"<|>\"Intelligence\"<|>\"Sam Rivera is directly involved in the process of learning to communicate with the unknown intelligence.\"<|>9)##\n(\"relationship\"<|>\"Alex\"<|>\"First Contact\"<|>\"Alex leads the team that might be making the First Contact with the unknown intelligence.\"<|>10)##\n(\"relationship\"<|>\"Alex\"<|>\"Humanity's Response\"<|>\"Alex and his team are the key figures in Humanity's Response to the unknown intelligence.\"<|>8)##\n(\"relationship\"<|>\"Control\"<|>\"Intelligence\"<|>\"The concept of Control is challenged by the Intelligence that writes its own rules.\"<|>7)<|COMPLETE|>\n#############################\n-Real Data-\n######################\nEntity_types: organization,person,geo,event\nText: In the quaint village of Aeloria, a young girl named Elara discovered an ancient, enchanted locket in her grandmother's attic. When she opened it, she found a hidden map leading to a forgotten grove. Guided by curiosity and courage, Elara followed the map and uncovered a magical tree that granted one wish. With a heart full of hope, she wished for her village to prosper. The next morning, Aeloria was transformed, blooming with life and happiness. Elara's selfless wish had brought endless joy to her village, and she cherished the secret of the magical tree forever.magical places are those we discover when we dare to follow our curiosity and believe in the power of our dreams.\n######################\nOutput:", "parameters": {"model": "gemma:7b", "temperature": 0.0, "frequency_penalty": 0.0, "presence_penalty": 0.0, "top_p": 1.0, "max_tokens": 4000, "n": null}}"

❌ create_base_entity_graph
None
⠧ GraphRAG Indexer
├── Loading Input (text) - 1 files loaded (0 filtered) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_base_text_units
├── create_base_extracted_entities
├── create_summarized_entities
└── create_base_entity_graph
❌ Errors occurred during the pipeline run, see logs for more details.

BUG:16:31:37,751 datashaper.workflow.workflow ERROR Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key

Not sure why this happens.

scy-flower commented 3 weeks ago

Same here.

ArneJanning commented 3 weeks ago

Same problem here.

AlonsoGuevara commented 2 weeks ago

Hi @mbt1909432, from the output provided I can see the LLM is answering in a different format. Can you please provide information about the configuration you're using?

Working with models other than the gpt-4 family may require a bit of prompt tuning, since some other models tend to be more verbose, while we are aiming for structured responses.
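For reference, the message itself comes from pandas: this ValueError is raised whenever a multi-column assignment receives a different number of columns than column names, which is consistent with the empty or malformed extraction output shown above. A minimal sketch of the error mechanism only (illustrative, not GraphRAG's actual code path):

import pandas as pd

# Minimal illustration of the pandas error itself (not GraphRAG's code):
# assigning to two columns when the right-hand side only yields one,
# as can happen when an upstream step produces malformed or empty results.
df = pd.DataFrame({"raw": ["no delimiter here"]})
df[["level", "graph"]] = df["raw"].str.split("|", expand=True)
# ValueError: Columns must be same length as key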

mbt1909432 commented 2 weeks ago

Hi @mbt1909432, from the output provided I can see the LLM is answering in a different format. Can you please provide information about the configuration you're using?

Working with models other than the gpt-4 family may require a bit of prompt tuning, since some other models tend to be more verbose, while we are aiming for structured responses.

The LLM API backend I chose was Ollama; the model that caused the result I mentioned above was gemma:7b. However, when I used gemma2, the workflow worked normally.
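For anyone wanting to verify the chat side outside of GraphRAG first: recent Ollama builds expose an OpenAI-compatible endpoint under /v1, so a quick smoke test with the openai client could look like the sketch below (the port, model tag, and placeholder api_key are assumptions based on the defaults mentioned in this thread).

from openai import OpenAI

# Sketch only: Ollama's OpenAI-compatible API lives under /v1; the api_key
# is required by the client but ignored by Ollama.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = client.chat.completions.create(
    model="gemma2",  # assumes `ollama pull gemma2` has already been run
    messages=[{"role": "user", "content": "Reply with the single word: ok"}],
)
print(resp.choices[0].message.content)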

hp0404 commented 2 weeks ago

I tried using gemma2. It did help resolve the initial issue, but I encountered another one:

{"type": "error", "data": "Error executing verb \"text_embed\" in create_final_entities: 404 page not found...", "stack": "...from None\nopenai.NotFoundError: 404 page not found\n", "source": "404 page not found", "details": null}

I'm not using OpenAI, my embedding settings:

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: nomic_embed_text
    api_base: http://localhost:11434/api/embeddings
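A likely explanation for the 404 (an inference, not confirmed in this thread): the openai_embedding client appends the /embeddings route to api_base, so a base that already ends in /embeddings makes the SDK request /api/embeddings/embeddings, a path Ollama does not serve. A small sketch of what the SDK would do with that setting:

from openai import OpenAI

# Illustration only: the OpenAI SDK joins the resource route onto base_url,
# so embeddings.create() here would POST to
# http://localhost:11434/api/embeddings/embeddings -> "404 page not found".
client = OpenAI(base_url="http://localhost:11434/api/embeddings", api_key="ollama")
# client.embeddings.create(model="nomic_embed_text", input=["hello"])  # raises openai.NotFoundError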
mbt1909432 commented 2 weeks ago

The embedding API that Ollama exposes is not in the OpenAI format. Use LM Studio instead.


hongyispace commented 2 weeks ago

I tried using gemma2. It did help resolve the initial issue, but I encountered another one:

{"type": "error", "data": "Error executing verb \"text_embed\" in create_final_entities: 404 page not found...", "stack": "...from None\nopenai.NotFoundError: 404 page not found\n", "source": "404 page not found", "details": null}

I'm not using OpenAI, my embedding settings:

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: nomic_embed_text
    api_base: http://localhost:11434/api/embeddings

Try changing the api_base:

api_base: http://localhost:11434/api

mbt1909432 commented 2 weeks ago

I mean that the embedding API Ollama exposes does not follow the OpenAI format, which is the standard GraphRAG expects. I know you are using the embedding API from Ollama; sadly, it isn't supported. Use another backend instead, like LM Studio.
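To make the mismatch concrete, here is a hedged sketch of the two request/response shapes (field names follow Ollama's native API and the OpenAI embeddings API as generally documented; treat the details as assumptions):

import requests

# Ollama's native endpoint takes a single "prompt" and returns one "embedding".
native = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "nomic-embed-text", "prompt": "hello"},
).json()
vector = native["embedding"]

# The OpenAI-style API that GraphRAG's openai_embedding type expects takes a
# batched "input" and returns {"data": [{"embedding": [...]}, ...]} instead.
# openai_style = requests.post(
#     "<openai-compatible-base>/embeddings",
#     json={"model": "nomic-embed-text", "input": ["hello", "world"]},
# ).json()
# vectors = [d["embedding"] for d in openai_style["data"]]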


hp0404 commented 2 weeks ago

api_base: http://localhost:11434/api

{"type": "error", "data": "...Error executing verb \"text_embed\" in create_final_entities: Error code: 404 - {'error': \"model 'nomic_embed_text' not found, try pulling it first\"}",

But you're correct; it doesn't work either way, but at least now I understand why. Thanks!
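One possible aggravating detail (an observation, not something confirmed above): the Ollama model tag is spelled with hyphens, nomic-embed-text, while the config uses nomic_embed_text. A quick sanity check with the ollama Python package that the patch further down this thread relies on:

import ollama

# If the model has not been pulled, or the tag is misspelled (e.g. with
# underscores), this raises the same "model not found, try pulling it first"
# error seen above; `ollama pull nomic-embed-text` fetches the model.
resp = ollama.embeddings(model="nomic-embed-text", prompt="hello world")
print(len(resp["embedding"]))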

hongyispace commented 2 weeks ago

api_base: http://localhost:11434/api

{"type": "error", "data": "...Error executing verb \"text_embed\" in create_final_entities: Error code: 404 - {'error': \"model 'nomic_embed_text' not found, try pulling it first\"}",

But you're correct; it doesn't work either way, but at least now I understand why. Thanks!

Try updating the file /opt/miniconda3/envs/graphrag/lib/python3.12/site-packages/graphrag/llm/openai/openai_embeddings_llm.py as follows:

# Copyright (c) 2024 Microsoft Corporation.
# Licensed under the MIT License

"""The EmbeddingsLLM class."""

from typing_extensions import Unpack

from graphrag.llm.base import BaseLLM
from graphrag.llm.types import (
    EmbeddingInput,
    EmbeddingOutput,
    LLMInput,
)

from .openai_configuration import OpenAIConfiguration
from .types import OpenAIClientTypes

import ollama


class OpenAIEmbeddingsLLM(BaseLLM[EmbeddingInput, EmbeddingOutput]):
    """A text-embedding generator LLM."""

    _client: OpenAIClientTypes
    _configuration: OpenAIConfiguration

    def __init__(self, client: OpenAIClientTypes, configuration: OpenAIConfiguration):
        self.client = client
        self.configuration = configuration

    async def _execute_llm(
        self, input: EmbeddingInput, **kwargs: Unpack[LLMInput]
    ) -> EmbeddingOutput | None:
        args = {
            "model": self.configuration.model,
            **(kwargs.get("model_parameters") or {}),
        }
        # Original OpenAI-based call, kept for reference:
        # embedding = await self.client.embeddings.create(
        #     input=input,
        #     **args,
        # )
        # return [d.embedding for d in embedding.data]

        # Call Ollama's native embeddings API one text at a time instead.
        # NOTE: the model name is hardcoded here rather than read from
        # self.configuration.model, so keep it in sync with your settings.
        embedding_list = []
        for inp in input:
            embedding = ollama.embeddings(model="nomic-embed-text", prompt=inp)
            embedding_list.append(embedding["embedding"])
        return embedding_list

AlonsoGuevara commented 1 week ago

Hi! We are consolidating alternate model issues here: https://github.com/microsoft/graphrag/issues/657