run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: How to update the default prompt of KnowledgeGraphIndex #15760

Open DataNoob0723 opened 2 weeks ago

DataNoob0723 commented 2 weeks ago

Question

I want to figure out how to update the default prompt of KnowledgeGraphIndex, so that I can set constraints on the entities and relations it extracts.

dosubot[bot] commented 2 weeks ago

To constrain the entities and relations that KnowledgeGraphIndex extracts, replace the default DEFAULT_KG_TRIPLET_EXTRACT_PROMPT with a custom template. The default template only exposes the max_knowledge_triplets and text placeholders, so the custom one adds placeholders for the allowed entity types and relation types.

Here is an example of a custom template that adds these placeholders:

from llama_index.core.prompts.base import PromptTemplate
from llama_index.core.prompts.prompt_type import PromptType
from llama_index.core.prompts.default_prompts import DEFAULT_KG_TRIPLET_EXTRACT_PROMPT

CUSTOM_KG_TRIPLET_EXTRACT_TMPL = (
    "Extract up to {max_knowledge_triplets} knowledge triplets from the given text. "
    "Each triplet should be in the form of (head, relation, tail) with their respective types.\n"
    "---------------------\n"
    "INITIAL ONTOLOGY:\n"
    "Entity Types: {allowed_entity_types}\n"
    "Relation Types: {allowed_relation_types}\n"
    "\n"
    "Use these types as a starting point, but introduce new types if necessary based on the context.\n"
    "\n"
    "GUIDELINES:\n"
    "- Output in JSON format: [{{'head': '', 'head_type': '', 'relation': '', 'tail': '', 'tail_type': ''}}]\n"
    "- Use the most complete form for entities (e.g., 'United States of America' instead of 'USA')\n"
    "- Keep entities concise (3-5 words max)\n"
    "- Break down complex phrases into multiple triplets\n"
    "- Ensure the knowledge graph is coherent and easily understandable\n"
    "---------------------\n"
    "EXAMPLE:\n"
    "Text: Tim Cook, CEO of Apple Inc., announced the new Apple Watch that monitors heart health. "
    "UC Berkeley researchers studied the benefits of apples.\n"
    "Output:\n"
    "[{{'head': 'Tim Cook', 'head_type': 'PERSON', 'relation': 'CEO_OF', 'tail': 'Apple Inc.', 'tail_type': 'ORGANIZATION'}},\n"
    " {{'head': 'Apple Inc.', 'head_type': 'ORGANIZATION', 'relation': 'PRODUCES', 'tail': 'Apple Watch', 'tail_type': 'PRODUCT'}},\n"
    " {{'head': 'Apple Watch', 'head_type': 'PRODUCT', 'relation': 'MONITORS', 'tail': 'heart health', 'tail_type': 'HEALTH_METRIC'}},\n"
    " {{'head': 'UC Berkeley', 'head_type': 'ORGANIZATION', 'relation': 'STUDIES', 'tail': 'benefits of apples', 'tail_type': 'RESEARCH_TOPIC'}}]\n"
    "---------------------\n"
    "Text: {text}\n"
    "Output:\n"
)

CUSTOM_KG_TRIPLET_EXTRACT_PROMPT = PromptTemplate(
    CUSTOM_KG_TRIPLET_EXTRACT_TMPL, prompt_type=PromptType.KNOWLEDGE_TRIPLET_EXTRACT
)

In this custom template, you can specify the allowed_entity_types and allowed_relation_types to set constraints on the entities and relations extracted. You can then use this custom prompt template when initializing the KnowledgeGraphIndex:

from llama_index.core.indices.knowledge_graph.base import KnowledgeGraphIndex

kg_index = KnowledgeGraphIndex(
    kg_triplet_extract_template=CUSTOM_KG_TRIPLET_EXTRACT_PROMPT,
    max_triplets_per_chunk=10,
    include_embeddings=False,
    show_progress=True
)

This makes the KnowledgeGraphIndex use your custom prompt template. Note that the placeholders in the template still have to be filled in before extraction runs. If you need strict validation of entities and relations against a predefined schema, you can use the SchemaLLMPathExtractor class, which is a property graph extractor (used with PropertyGraphIndex) rather than a KnowledgeGraphIndex option:

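from llama_index.core import Settings
from llama_index.core.indices.property_graph.transformations.schema_llm import SchemaLLMPathExtractor

# LLM is an abstract base class and cannot be instantiated directly.
# Settings.llm resolves to the globally configured (or default) LLM.
llm = Settings.llm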

schema_extractor = SchemaLLMPathExtractor(
    llm=llm,
    extract_prompt=CUSTOM_KG_TRIPLET_EXTRACT_PROMPT,
    possible_entities=["PERSON", "ORGANIZATION", "PRODUCT", "HEALTH_METRIC", "RESEARCH_TOPIC"],
    possible_relations=["CEO_OF", "PRODUCES", "MONITORS", "STUDIES"],
    strict=True,
    max_triplets_per_chunk=10
)

# Use schema_extractor to process your data

With this setup, the KnowledgeGraphIndex uses the custom prompt template, while the schema extractor validates extracted entities and relations against a predefined schema when it is used in a property graph pipeline [1][2][3].
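To see how the placeholders and the doubled braces in such a template behave, here is a minimal sketch that uses only Python's standard string formatting. The shortened TEMPLATE string and the placeholder_names helper are assumptions made for illustration; they are not part of LlamaIndex.

```python
from string import Formatter

# Shortened stand-in for the template in this thread (hypothetical).
TEMPLATE = (
    "Extract up to {max_knowledge_triplets} knowledge triplets.\n"
    "Entity Types: {allowed_entity_types}\n"
    "Relation Types: {allowed_relation_types}\n"
    "- Output in JSON format: [{{'head': '', 'relation': '', 'tail': ''}}]\n"
    "Text: {text}\n"
)


def placeholder_names(template: str) -> list:
    """Return the distinct placeholder names in first-seen order."""
    names = []
    for _, field, _, _ in Formatter().parse(template):
        # Escaped braces ({{ and }}) yield no field name and are skipped.
        if field and field not in names:
            names.append(field)
    return names


print(placeholder_names(TEMPLATE))

# Every placeholder needs a value; doubled braces render as literal braces.
rendered = TEMPLATE.format(
    max_knowledge_triplets=10,
    allowed_entity_types=["PERSON", "ORGANIZATION"],
    allowed_relation_types=["CEO_OF"],
    text="Tim Cook is the CEO of Apple Inc.",
)
print(rendered)
```

The listing shows four placeholders, which means a caller that only supplies text leaves three of them unfilled.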

To continue talking to Dosu, mention @dosu.

DataNoob0723 commented 2 weeks ago

How can I pass allowed_entity_types and allowed_relation_types to the template?

dosubot[bot] commented 2 weeks ago

To pass allowed_entity_types and allowed_relation_types to the template, you can use the SchemaLLMPathExtractor class to specify an exact schema containing possible entity types and relation types. This approach ensures that the entities and relations extracted follow a predefined schema. Here is an example of how you can do this:

from typing import Literal
from llama_index.llms.ollama import Ollama
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor

# Define possible entity types and relation types
entities = Literal["PERSON", "PLACE", "ORGANIZATION"]
relations = Literal["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"]

# Define the validation schema for which entities can have which relations
validation_schema = [
    ("ORGANIZATION", "HAS", "PERSON"),
    ("PERSON", "WORKED_AT", "ORGANIZATION"),
    ("PERSON", "WORKED_WITH", "PERSON"),
    ("PERSON", "WORKED_ON", "ORGANIZATION"),
    ("PERSON", "PART_OF", "ORGANIZATION"),
    ("ORGANIZATION", "PART_OF", "ORGANIZATION"),
    ("PERSON", "WORKED_AT", "PLACE"),
]

# Create the SchemaLLMPathExtractor with the defined schema
kg_extractor = SchemaLLMPathExtractor(
    llm=Ollama(model="llama3", json_mode=True, request_timeout=3600),
    possible_entities=entities,
    possible_relations=relations,
    kg_validation_schema=validation_schema,
    strict=True,  # Ensures only values within the schema are allowed
)

In this example, allowed_entity_types and allowed_relation_types are defined using the Literal type, and the SchemaLLMPathExtractor is used to enforce these types when extracting knowledge triplets. This method ensures that the entities and relations adhere to the specified schema [1][2].
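The idea behind a validation schema can be sketched in plain Python. This is only an illustration of the concept under an assumed record layout (head, head_type, relation, tail, tail_type); it is not how SchemaLLMPathExtractor is implemented, and filter_by_schema is a hypothetical helper.

```python
from typing import List, Tuple

# (head_type, relation, tail_type) combinations that are allowed.
VALIDATION_SCHEMA = {
    ("ORGANIZATION", "HAS", "PERSON"),
    ("PERSON", "WORKED_AT", "ORGANIZATION"),
    ("PERSON", "WORKED_WITH", "PERSON"),
}

# Hypothetical record layout: (head, head_type, relation, tail, tail_type)
TypedTriplet = Tuple[str, str, str, str, str]


def filter_by_schema(triplets: List[TypedTriplet]) -> List[TypedTriplet]:
    """Keep only triplets whose type signature appears in the schema."""
    return [t for t in triplets if (t[1], t[2], t[4]) in VALIDATION_SCHEMA]


candidates = [
    ("Tim Cook", "PERSON", "WORKED_AT", "Apple Inc.", "ORGANIZATION"),
    # Reversed direction: ORGANIZATION WORKED_AT PERSON is not in the schema.
    ("Apple Inc.", "ORGANIZATION", "WORKED_AT", "Tim Cook", "PERSON"),
]
kept = filter_by_schema(candidates)
print(kept)
```

Checking the full (head_type, relation, tail_type) signature is stricter than checking entity and relation names separately, because it also rejects valid names used in the wrong direction.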

DataNoob0723 commented 2 weeks ago

I mean how can I pass allowed_entity_types and allowed_relation_types to CUSTOM_KG_TRIPLET_EXTRACT_TMPL?

dosubot[bot] commented 2 weeks ago

To pass allowed_entity_types and allowed_relation_types to CUSTOM_KG_TRIPLET_EXTRACT_TMPL, you can use the DEFAULT_DYNAMIC_EXTRACT_TMPL as a reference. Here is an example of how you can define CUSTOM_KG_TRIPLET_EXTRACT_TMPL:

CUSTOM_KG_TRIPLET_EXTRACT_TMPL = (
    "Extract up to {max_knowledge_triplets} knowledge triplets from the given text. "
    "Each triplet should be in the form of (head, relation, tail) with their respective types.\n"
    "---------------------\n"
    "INITIAL ONTOLOGY:\n"
    "Entity Types: {allowed_entity_types}\n"
    "Relation Types: {allowed_relation_types}\n"
    "\n"
    "Use these types as a starting point, but introduce new types if necessary based on the context.\n"
    "\n"
    "GUIDELINES:\n"
    "- Output in JSON format: [{{'head': '', 'head_type': '', 'relation': '', 'tail': '', 'tail_type': ''}}]\n"
    "- Use the most complete form for entities (e.g., 'United States of America' instead of 'USA')\n"
    "- Keep entities concise (3-5 words max)\n"
    "- Break down complex phrases into multiple triplets\n"
    "- Ensure the knowledge graph is coherent and easily understandable\n"
    "---------------------\n"
    "EXAMPLE:\n"
    "Text: Tim Cook, CEO of Apple Inc., announced the new Apple Watch that monitors heart health. "
    "UC Berkeley researchers studied the benefits of apples.\n"
    "Output:\n"
    "[{{'head': 'Tim Cook', 'head_type': 'PERSON', 'relation': 'CEO_OF', 'tail': 'Apple Inc.', 'tail_type': 'COMPANY'}},\n"
    " {{'head': 'Apple Inc.', 'head_type': 'COMPANY', 'relation': 'PRODUCES', 'tail': 'Apple Watch', 'tail_type': 'PRODUCT'}},\n"
    " {{'head': 'Apple Watch', 'head_type': 'PRODUCT', 'relation': 'MONITORS', 'tail': 'heart health', 'tail_type': 'HEALTH_METRIC'}},\n"
    " {{'head': 'UC Berkeley', 'head_type': 'UNIVERSITY', 'relation': 'STUDIES', 'tail': 'benefits of apples', 'tail_type': 'RESEARCH_TOPIC'}}]\n"
    "---------------------\n"
    "Text: {text}\n"
    "Output:\n"
)

You can then create a PromptTemplate using this template:

from llama_index.core.prompts.base import PromptTemplate
from llama_index.core.prompts.prompt_type import PromptType

CUSTOM_KG_TRIPLET_EXTRACT_PROMPT = PromptTemplate(
    CUSTOM_KG_TRIPLET_EXTRACT_TMPL, 
    prompt_type=PromptType.KNOWLEDGE_TRIPLET_EXTRACT
)

This template includes placeholders for allowed_entity_types and allowed_relation_types which you can pass when using the template. The DynamicLLMPathExtractor class from the LlamaIndex framework can be used to handle the extraction process, and it allows for specifying initial ontologies and expanding them dynamically. Here is an example of how you might set up the extractor:

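from llama_index.core import Settings
from llama_index.core.indices.property_graph.transformations.dynamic_llm import DynamicLLMPathExtractor

# LLM is an abstract base class and cannot be instantiated directly.
# Settings.llm resolves to the globally configured (or default) LLM.
llm = Settings.llm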
extractor = DynamicLLMPathExtractor(
    llm=llm,
    extract_prompt=CUSTOM_KG_TRIPLET_EXTRACT_PROMPT,
    allowed_entity_types=['PERSON', 'COMPANY', 'PRODUCT', 'HEALTH_METRIC', 'UNIVERSITY', 'RESEARCH_TOPIC'],
    allowed_relation_types=['CEO_OF', 'PRODUCES', 'MONITORS', 'STUDIES']
)

This setup ensures that the allowed_entity_types and allowed_relation_types are correctly passed and utilized during the knowledge triplet extraction process [1][2].
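The template asks the model for a list of dicts with single-quoted keys, which is Python literal syntax rather than strict JSON. If you ever need to parse such output yourself, a minimal sketch could look like the following. The parse_triplets helper is hypothetical, and whether any built-in LlamaIndex parser accepts this output shape is not established in this thread.

```python
import ast
import json
from typing import List, Tuple


def parse_triplets(response: str) -> List[Tuple[str, str, str]]:
    """Parse a list of head/relation/tail dicts into (head, relation, tail) tuples."""
    text = response.strip()
    try:
        records = json.loads(text)  # strict JSON (double quotes)
    except json.JSONDecodeError:
        try:
            records = ast.literal_eval(text)  # single-quoted Python literals
        except (ValueError, SyntaxError):
            return []
    if not isinstance(records, list):
        return []
    triplets = []
    for rec in records:
        if not isinstance(rec, dict):
            continue
        head, rel, tail = rec.get("head"), rec.get("relation"), rec.get("tail")
        if head and rel and tail:
            triplets.append((str(head), str(rel), str(tail)))
    return triplets


sample = (
    "[{'head': 'Tim Cook', 'head_type': 'PERSON', 'relation': 'CEO_OF', "
    "'tail': 'Apple Inc.', 'tail_type': 'COMPANY'}]"
)
print(parse_triplets(sample))
```

ast.literal_eval only evaluates literals, so it is a safer fallback than eval for model output.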

DataNoob0723 commented 2 weeks ago

After creating CUSTOM_KG_TRIPLET_EXTRACT_PROMPT, how can I create KnowledgeGraphIndex using it and pass allowed_entity_types and allowed_relation_types?

dosubot[bot] commented 2 weeks ago

To create a KnowledgeGraphIndex using CUSTOM_KG_TRIPLET_EXTRACT_PROMPT and pass allowed_entity_types and allowed_relation_types, you can follow these steps:

  1. Define your custom prompt template.
  2. Create the KnowledgeGraphIndex with the custom prompt and the allowed entity and relation types.

Here is an example:

from llama_index.core import KnowledgeGraphIndex
from llama_index.core.prompts import PromptTemplate
from llama_index.core.prompts.prompt_type import PromptType

# Define your custom prompt template
CUSTOM_KG_TRIPLET_EXTRACT_TMPL = (
    "Extract up to {max_knowledge_triplets} knowledge triplets from the given text. "
    "Each triplet should be in the form of (head, relation, tail) with their respective types.\n"
    "---------------------\n"
    "INITIAL ONTOLOGY:\n"
    "Entity Types: {allowed_entity_types}\n"
    "Relation Types: {allowed_relation_types}\n"
    "\n"
    "Use these types as a starting point, but introduce new types if necessary based on the context.\n"
    "\n"
    "GUIDELINES:\n"
    "- Output in JSON format: [{{'head': '', 'head_type': '', 'relation': '', 'tail': '', 'tail_type': ''}}]\n"
    "- Use the most complete form for entities (e.g., 'United States of America' instead of 'USA')\n"
    "- Keep entities concise (3-5 words max)\n"
    "- Break down complex phrases into multiple triplets\n"
    "- Ensure the knowledge graph is coherent and easily understandable\n"
    "---------------------\n"
    "EXAMPLE:\n"
    "Text: Tim Cook, CEO of Apple Inc., announced the new Apple Watch that monitors heart health. "
    "UC Berkeley researchers studied the benefits of apples.\n"
    "Output:\n"
    "[{{'head': 'Tim Cook', 'head_type': 'PERSON', 'relation': 'CEO_OF', 'tail': 'Apple Inc.', 'tail_type': 'COMPANY'}},\n"
    " {{'head': 'Apple Inc.', 'head_type': 'COMPANY', 'relation': 'PRODUCES', 'tail': 'Apple Watch', 'tail_type': 'PRODUCT'}},\n"
    " {{'head': 'Apple Watch', 'head_type': 'PRODUCT', 'relation': 'MONITORS', 'tail': 'heart health', 'tail_type': 'HEALTH_METRIC'}},\n"
    " {{'head': 'UC Berkeley', 'head_type': 'UNIVERSITY', 'relation': 'STUDIES', 'tail': 'benefits of apples', 'tail_type': 'RESEARCH_TOPIC'}}]\n"
    "---------------------\n"
    "Text: {text}\n"
    "Output:\n"
)

CUSTOM_KG_TRIPLET_EXTRACT_PROMPT = PromptTemplate(
    CUSTOM_KG_TRIPLET_EXTRACT_TMPL, 
    prompt_type=PromptType.KNOWLEDGE_TRIPLET_EXTRACT
)

# Define allowed entity types and relation types
allowed_entity_types = ["PERSON", "COMPANY", "PRODUCT", "UNIVERSITY", "HEALTH_METRIC", "RESEARCH_TOPIC"]
allowed_relation_types = ["CEO_OF", "PRODUCES", "MONITORS", "STUDIES"]

# Create the KnowledgeGraphIndex
kg_index = KnowledgeGraphIndex(
    kg_triple_extract_template=CUSTOM_KG_TRIPLET_EXTRACT_PROMPT,
    allowed_entity_types=allowed_entity_types,
    allowed_relation_types=allowed_relation_types
)

This code sets up the KnowledgeGraphIndex with the custom triplet extraction prompt and the specified allowed entity and relation types [1].

DataNoob0723 commented 2 weeks ago

How can I pass my documents to create this kg_index?

dosubot[bot] commented 2 weeks ago

To pass your documents to create the kg_index, you can use the from_documents method provided by the KnowledgeGraphIndex class. Here is an example of how you can do this:

from llama_index.core import KnowledgeGraphIndex
from llama_index.core.prompts import PromptTemplate
from llama_index.core.prompts.prompt_type import PromptType

# Define your custom prompt template
CUSTOM_KG_TRIPLET_EXTRACT_TMPL = (
    "Extract up to {max_knowledge_triplets} knowledge triplets from the given text. "
    "Each triplet should be in the form of (head, relation, tail) with their respective types.\n"
    "---------------------\n"
    "INITIAL ONTOLOGY:\n"
    "Entity Types: {allowed_entity_types}\n"
    "Relation Types: {allowed_relation_types}\n"
    "\n"
    "Use these types as a starting point, but introduce new types if necessary based on the context.\n"
    "\n"
    "GUIDELINES:\n"
    "- Output in JSON format: [{{'head': '', 'head_type': '', 'relation': '', 'tail': '', 'tail_type': ''}}]\n"
    "- Use the most complete form for entities (e.g., 'United States of America' instead of 'USA')\n"
    "- Keep entities concise (3-5 words max)\n"
    "- Break down complex phrases into multiple triplets\n"
    "- Ensure the knowledge graph is coherent and easily understandable\n"
    "---------------------\n"
    "EXAMPLE:\n"
    "Text: Tim Cook, CEO of Apple Inc., announced the new Apple Watch that monitors heart health. "
    "UC Berkeley researchers studied the benefits of apples.\n"
    "Output:\n"
    "[{{'head': 'Tim Cook', 'head_type': 'PERSON', 'relation': 'CEO_OF', 'tail': 'Apple Inc.', 'tail_type': 'COMPANY'}},\n"
    " {{'head': 'Apple Inc.', 'head_type': 'COMPANY', 'relation': 'PRODUCES', 'tail': 'Apple Watch', 'tail_type': 'PRODUCT'}},\n"
    " {{'head': 'Apple Watch', 'head_type': 'PRODUCT', 'relation': 'MONITORS', 'tail': 'heart health', 'tail_type': 'HEALTH_METRIC'}},\n"
    " {{'head': 'UC Berkeley', 'head_type': 'UNIVERSITY', 'relation': 'STUDIES', 'tail': 'benefits of apples', 'tail_type': 'RESEARCH_TOPIC'}}]\n"
    "---------------------\n"
    "Text: {text}\n"
    "Output:\n"
)

CUSTOM_KG_TRIPLET_EXTRACT_PROMPT = PromptTemplate(
    CUSTOM_KG_TRIPLET_EXTRACT_TMPL, 
    prompt_type=PromptType.KNOWLEDGE_TRIPLET_EXTRACT
)

# Define allowed entity types and relation types
allowed_entity_types = ["PERSON", "COMPANY", "PRODUCT", "UNIVERSITY", "HEALTH_METRIC", "RESEARCH_TOPIC"]
allowed_relation_types = ["CEO_OF", "PRODUCES", "MONITORS", "STUDIES"]

# Create the KnowledgeGraphIndex
kg_index = KnowledgeGraphIndex(
    kg_triple_extract_template=CUSTOM_KG_TRIPLET_EXTRACT_PROMPT,
    allowed_entity_types=allowed_entity_types,
    allowed_relation_types=allowed_relation_types
)

# Define your documents
documents = [
    "Tim Cook, CEO of Apple Inc., announced the new Apple Watch that monitors heart health.",
    "UC Berkeley researchers studied the benefits of apples."
]

# Create the KnowledgeGraphIndex from documents
kg_index.from_documents(documents)

This code sets up the KnowledgeGraphIndex with the custom triplet extraction prompt and the specified allowed entity and relation types, and then creates the index from the provided documents.

DataNoob0723 commented 2 weeks ago

I got an error: ValueError: One of nodes, objects, or index_struct must be provided.

dosubot[bot] commented 2 weeks ago

To resolve the ValueError: One of nodes, objects, or index_struct must be provided, you need to provide a sequence of BaseNode objects when creating the KnowledgeGraphIndex. Here is an updated example that includes the creation of BaseNode objects from your documents:

from llama_index.core import KnowledgeGraphIndex
from llama_index.core.prompts import PromptTemplate
from llama_index.core.prompts.prompt_type import PromptType
from llama_index.core.schema import TextNode

# Define your custom prompt template
CUSTOM_KG_TRIPLET_EXTRACT_TMPL = (
    "Extract up to {max_knowledge_triplets} knowledge triplets from the given text. "
    "Each triplet should be in the form of (head, relation, tail) with their respective types.\n"
    "---------------------\n"
    "INITIAL ONTOLOGY:\n"
    "Entity Types: {allowed_entity_types}\n"
    "Relation Types: {allowed_relation_types}\n"
    "\n"
    "Use these types as a starting point, but introduce new types if necessary based on the context.\n"
    "\n"
    "GUIDELINES:\n"
    "- Output in JSON format: [{{'head': '', 'head_type': '', 'relation': '', 'tail': '', 'tail_type': ''}}]\n"
    "- Use the most complete form for entities (e.g., 'United States of America' instead of 'USA')\n"
    "- Keep entities concise (3-5 words max)\n"
    "- Break down complex phrases into multiple triplets\n"
    "- Ensure the knowledge graph is coherent and easily understandable\n"
    "---------------------\n"
    "EXAMPLE:\n"
    "Text: Tim Cook, CEO of Apple Inc., announced the new Apple Watch that monitors heart health. "
    "UC Berkeley researchers studied the benefits of apples.\n"
    "Output:\n"
    "[{{'head': 'Tim Cook', 'head_type': 'PERSON', 'relation': 'CEO_OF', 'tail': 'Apple Inc.', 'tail_type': 'COMPANY'}},\n"
    " {{'head': 'Apple Inc.', 'head_type': 'COMPANY', 'relation': 'PRODUCES', 'tail': 'Apple Watch', 'tail_type': 'PRODUCT'}},\n"
    " {{'head': 'Apple Watch', 'head_type': 'PRODUCT', 'relation': 'MONITORS', 'tail': 'heart health', 'tail_type': 'HEALTH_METRIC'}},\n"
    " {{'head': 'UC Berkeley', 'head_type': 'UNIVERSITY', 'relation': 'STUDIES', 'tail': 'benefits of apples', 'tail_type': 'RESEARCH_TOPIC'}}]\n"
    "---------------------\n"
    "Text: {text}\n"
    "Output:\n"
)

CUSTOM_KG_TRIPLET_EXTRACT_PROMPT = PromptTemplate(
    CUSTOM_KG_TRIPLET_EXTRACT_TMPL, 
    prompt_type=PromptType.KNOWLEDGE_TRIPLET_EXTRACT
)

# Define allowed entity types and relation types
allowed_entity_types = ["PERSON", "COMPANY", "PRODUCT", "UNIVERSITY", "HEALTH_METRIC", "RESEARCH_TOPIC"]
allowed_relation_types = ["CEO_OF", "PRODUCES", "MONITORS", "STUDIES"]

# Define your documents
documents = [
    "Tim Cook, CEO of Apple Inc., announced the new Apple Watch that monitors heart health.",
    "UC Berkeley researchers studied the benefits of apples."
]

# Create BaseNode objects from documents
nodes = [TextNode(text=doc) for doc in documents]

# Create the KnowledgeGraphIndex
kg_index = KnowledgeGraphIndex(
    nodes=nodes,
    kg_triple_extract_template=CUSTOM_KG_TRIPLET_EXTRACT_PROMPT,
    allowed_entity_types=allowed_entity_types,
    allowed_relation_types=allowed_relation_types
)

This code creates TextNode objects from your documents and passes them to the KnowledgeGraphIndex during initialization, resolving the error [1][2].

DataNoob0723 commented 2 weeks ago

I got TypeError: KnowledgeGraphIndex._build_index_from_nodes() got an unexpected keyword argument 'allowed_entity_types'

dosubot[bot] commented 2 weeks ago

To resolve the TypeError: KnowledgeGraphIndex._build_index_from_nodes() got an unexpected keyword argument 'allowed_entity_types', you need to ensure that the _build_index_from_nodes method does not receive the allowed_entity_types argument, as it is not defined to accept it.

Here's the relevant part of the KnowledgeGraphIndex class:

def _build_index_from_nodes(self, nodes: Sequence[BaseNode]) -> KG:
    """Build the index from nodes."""
    # do simple concatenation
    index_struct = self.index_struct_cls()
    nodes_with_progress = get_tqdm_iterable(
        nodes, self._show_progress, "Processing nodes"
    )
    for n in nodes_with_progress:
        triplets = self._extract_triplets(
            n.get_content(metadata_mode=MetadataMode.LLM)
        )
        logger.debug(f"> Extracted triplets: {triplets}")
        for triplet in triplets:
            subj, _, obj = triplet
            self.upsert_triplet(triplet)
            index_struct.add_node([subj, obj], n)

        if self.include_embeddings:
            triplet_texts = [str(t) for t in triplets]
            embed_model = self._service_context.embed_model
            embed_outputs = embed_model.get_text_embedding_batch(
                triplet_texts, show_progress=self._show_progress
            )
            for rel_text, rel_embed in zip(triplet_texts, embed_outputs):
                index_struct.add_to_embedding_dict(rel_text, rel_embed)

    return index_struct

The method does not accept allowed_entity_types as a parameter. To fix the error, you should remove the allowed_entity_types argument from wherever _build_index_from_nodes is being called. Ensure that the method is called with only the parameters it is designed to accept [1].
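The traceback suggests that unrecognized constructor keyword arguments end up being forwarded to the build step. The sketch below reproduces that pattern with hypothetical classes (BaseIndexSketch and KGIndexSketch); it assumes this forwarding behavior and is not a copy of the library code.

```python
class BaseIndexSketch:
    """Hypothetical base class that forwards extra kwargs to the build step."""

    def __init__(self, nodes=None, **kwargs):
        if nodes is None:
            raise ValueError("nodes must be provided")
        # Anything the constructor does not name explicitly lands in kwargs
        # and is passed on unchanged.
        self.index_struct = self._build_index_from_nodes(nodes, **kwargs)


class KGIndexSketch(BaseIndexSketch):
    def _build_index_from_nodes(self, nodes):
        # Accepts no extra keyword arguments, just like the method quoted above.
        return list(nodes)


message = ""
try:
    KGIndexSketch(nodes=["n1"], allowed_entity_types=["PERSON"])
except TypeError as exc:
    message = str(exc)
print(message)

# Without the unknown keyword argument, construction succeeds.
ok = KGIndexSketch(nodes=["n1"])
print(ok.index_struct)
```

Under that assumption, the practical fix is on the caller's side: drop the unsupported keyword arguments from the constructor call and bind the values into the prompt template instead.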

DataNoob0723 commented 2 weeks ago

But you told me to do this: kg_index = KnowledgeGraphIndex(nodes=nodes, kg_triple_extract_template=CUSTOM_KG_TRIPLET_EXTRACT_PROMPT, allowed_entity_types=allowed_entity_types, allowed_relation_types=allowed_relation_types)

dosubot[bot] commented 2 weeks ago

To create a KnowledgeGraphIndex using CUSTOM_KG_TRIPLET_EXTRACT_PROMPT and pass allowed_entity_types and allowed_relation_types without encountering a TypeError, you should ensure that these parameters are correctly handled within the prompt template and not directly passed to the KnowledgeGraphIndex constructor.

Here is the correct way to set up and use the KnowledgeGraphIndex:

  1. Define your custom prompt template.
  2. Create the KnowledgeGraphIndex with the custom prompt.
  3. Pass the allowed entity types and relation types within the prompt template.

Here is an example:

from llama_index.legacy.prompts.base import PromptTemplate
from llama_index.legacy.indices.knowledge_graph.base import KnowledgeGraphIndex
from llama_index.legacy.schema import TextNode

# Define your custom prompt template
# (BasePromptTemplate is abstract, so use the concrete PromptTemplate class)
CUSTOM_KG_TRIPLET_EXTRACT_PROMPT = PromptTemplate(
    template=(
        "Extract up to {max_knowledge_triplets} knowledge triplets from the given text. "
        "Each triplet should be in the form of (head, relation, tail) with their respective types.\n"
        "---------------------\n"
        "INITIAL ONTOLOGY:\n"
        "Entity Types: {allowed_entity_types}\n"
        "Relation Types: {allowed_relation_types}\n"
        "\n"
        "Use these types as a starting point, but introduce new types if necessary based on the context.\n"
        "\n"
        "GUIDELINES:\n"
        "- Output in JSON format: [{{'head': '', 'head_type': '', 'relation': '', 'tail': '', 'tail_type': ''}}]\n"
        "- Use the most complete form for entities (e.g., 'United States of America' instead of 'USA')\n"
        "- Keep entities concise (3-5 words max)\n"
        "- Break down complex phrases into multiple triplets\n"
        "- Ensure the knowledge graph is coherent and easily understandable\n"
        "---------------------\n"
        "EXAMPLE:\n"
        "Text: Tim Cook, CEO of Apple Inc., announced the new Apple Watch that monitors heart health. "
        "UC Berkeley researchers studied the benefits of apples.\n"
        "Output:\n"
        "[{{'head': 'Tim Cook', 'head_type': 'PERSON', 'relation': 'CEO_OF', 'tail': 'Apple Inc.', 'tail_type': 'COMPANY'}},\n"
        " {{'head': 'Apple Inc.', 'head_type': 'COMPANY', 'relation': 'PRODUCES', 'tail': 'Apple Watch', 'tail_type': 'PRODUCT'}},\n"
        " {{'head': 'Apple Watch', 'head_type': 'PRODUCT', 'relation': 'MONITORS', 'tail': 'heart health', 'tail_type': 'HEALTH_METRIC'}},\n"
        " {{'head': 'UC Berkeley', 'head_type': 'UNIVERSITY', 'relation': 'STUDIES', 'tail': 'benefits of apples', 'tail_type': 'RESEARCH_TOPIC'}}]\n"
        "---------------------\n"
        "Text: {text}\n"
        "Output:\n"
    )
)

# Define allowed entity types and relation types
allowed_entity_types = ["PERSON", "COMPANY", "PRODUCT", "UNIVERSITY", "HEALTH_METRIC", "RESEARCH_TOPIC"]
allowed_relation_types = ["CEO_OF", "PRODUCES", "MONITORS", "STUDIES"]

# Define your documents
documents = [
    "Tim Cook, CEO of Apple Inc., announced the new Apple Watch that monitors heart health.",
    "UC Berkeley researchers studied the benefits of apples."
]

# Create TextNode objects from documents
nodes = [TextNode(text=doc) for doc in documents]

# Create the KnowledgeGraphIndex
kg_index = KnowledgeGraphIndex(
    nodes=nodes,
    kg_triple_extract_template=CUSTOM_KG_TRIPLET_EXTRACT_PROMPT.partial_format(
        allowed_entity_types=allowed_entity_types,
        allowed_relation_types=allowed_relation_types,
        max_knowledge_triplets=10
    )
)

This code sets up the KnowledgeGraphIndex with the custom triplet extraction prompt and the specified allowed entity and relation types, ensuring that these constraints are correctly passed and utilized during the knowledge triplet extraction process [1][2].
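Partial formatting means binding some template variables early and leaving the rest (here, text) to be supplied later for each chunk. The following sketch shows the idea with a hypothetical PartialTemplate class built on str.format; it mimics the concept only and is not the LlamaIndex implementation.

```python
from typing import Any, Dict


class PartialTemplate:
    """Hypothetical helper that binds some template variables ahead of time."""

    def __init__(self, template: str, **bound: Any) -> None:
        self.template = template
        self.bound: Dict[str, Any] = dict(bound)

    def partial_format(self, **kwargs: Any) -> "PartialTemplate":
        # Return a new object so the original template stays reusable.
        return PartialTemplate(self.template, **{**self.bound, **kwargs})

    def format(self, **kwargs: Any) -> str:
        # Values passed at call time override the ones bound earlier.
        return self.template.format(**{**self.bound, **kwargs})


base = PartialTemplate(
    "Up to {max_knowledge_triplets} triplets. "
    "Entity Types: {allowed_entity_types}. Text: {text}"
)
bound = base.partial_format(
    max_knowledge_triplets=10,
    # Joining the list avoids brackets and quotes in the rendered prompt.
    allowed_entity_types=", ".join(["PERSON", "COMPANY"]),
)
result = bound.format(text="Tim Cook leads Apple.")
print(result)
```

Joining the type lists into a comma-separated string before binding is a design choice: passing the raw list also works, but it renders with Python's brackets and quotes.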

DataNoob0723 commented 2 weeks ago

How can I persist the graph index after building it?

dosubot[bot] commented 2 weeks ago

To persist the graph index after building it, you can use the built-in .persist() method. Here are examples for different types of graph indexes:

Composable Graph

graph.root_index.storage_context.persist(persist_dir="<persist_dir>")

This will write all the data to disk at the specified location.

SimpleGraphStore

from llama_index.core.graph_stores import SimpleGraphStore

graph_store = SimpleGraphStore()
# ... (build your graph store)
graph_store.persist(persist_path="<persist_path>")

This will save the graph store data to the specified path.

PropertyGraphIndex

from llama_index.core import PropertyGraphIndex

# create
index = PropertyGraphIndex.from_documents(documents)

# save
index.storage_context.persist("./storage")

# load
from llama_index.core import StorageContext, load_index_from_storage

storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)

This will save the PropertyGraphIndex data to the specified directory and allow you to load it back when needed.

Loading SimpleGraphStore

from llama_index.core.graph_stores.simple import SimpleGraphStore

# load
graph_store = SimpleGraphStore.from_persist_path(persist_path="<persist_path>")

This will load the SimpleGraphStore data from the specified path.

Additionally, you can use constants from the llama_index.core.graph_stores.types module to standardize the persist directory and file names:

from llama_index.core.graph_stores.types import DEFAULT_PERSIST_DIR, DEFAULT_PERSIST_FNAME

# Example usage
graph_store.persist(persist_path=DEFAULT_PERSIST_DIR + "/" + DEFAULT_PERSIST_FNAME)

This ensures that you are using the default directory and file name conventions provided by the LlamaIndex framework [1][2][3][4][5].
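Conceptually, persisting a graph store is a save and load round trip to a file. The sketch below illustrates that round trip with the standard library only; persist_graph, load_graph, and the dictionary layout are assumptions for illustration and do not reflect the on-disk format LlamaIndex uses.

```python
import json
import os
import tempfile
from typing import Dict, List

# Hypothetical in-memory graph: subject -> list of [relation, object] pairs.
Graph = Dict[str, List[List[str]]]


def persist_graph(graph: Graph, persist_path: str) -> None:
    """Write the graph to persist_path, creating parent directories if needed."""
    directory = os.path.dirname(persist_path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with open(persist_path, "w", encoding="utf-8") as f:
        json.dump(graph, f)


def load_graph(persist_path: str) -> Graph:
    """Read a graph previously written by persist_graph."""
    with open(persist_path, "r", encoding="utf-8") as f:
        return json.load(f)


graph: Graph = {"Tim Cook": [["CEO_OF", "Apple Inc."]]}
with tempfile.TemporaryDirectory() as tmp_dir:
    # os.path.join is more portable than concatenating with "/".
    path = os.path.join(tmp_dir, "storage", "graph_store.json")
    persist_graph(graph, path)
    restored = load_graph(path)
print(restored == graph)
```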

DataNoob0723 commented 2 weeks ago

Can I use the updated template directly in the kg_index.from_documents method?

DataNoob0723 commented 2 weeks ago

Also, how can I print the actual prompt used to generate the response, with the template variables replaced by their actual values?

dosubot[bot] commented 2 weeks ago

Yes, you can use the updated template directly in the kg_index.from_documents method. To print out the actual prompt used for generating the response with the actual values of the variables in the prompt template, you can use the format method of the PromptTemplate class.

Here is an example:

from llama_index.core.prompts.base import PromptTemplate
from llama_index.core.prompts.prompt_type import PromptType
from llama_index.core.indices.knowledge_graph.base import KnowledgeGraphIndex
from llama_index.core.schema import TextNode

# Define your custom prompt template
CUSTOM_KG_TRIPLET_EXTRACT_TMPL = (
    "Extract up to {max_knowledge_triplets} knowledge triplets from the given text. "
    "Each triplet should be in the form of (head, relation, tail) with their respective types.\n"
    "---------------------\n"
    "INITIAL ONTOLOGY:\n"
    "Entity Types: {allowed_entity_types}\n"
    "Relation Types: {allowed_relation_types}\n"
    "\n"
    "Use these types as a starting point, but introduce new types if necessary based on the context.\n"
    "\n"
    "GUIDELINES:\n"
    "- Output in JSON format: [{{'head': '', 'head_type': '', 'relation': '', 'tail': '', 'tail_type': ''}}]\n"
    "- Use the most complete form for entities (e.g., 'United States of America' instead of 'USA')\n"
    "- Keep entities concise (3-5 words max)\n"
    "- Break down complex phrases into multiple triplets\n"
    "- Ensure the knowledge graph is coherent and easily understandable\n"
    "---------------------\n"
    "EXAMPLE:\n"
    "Text: Tim Cook, CEO of Apple Inc., announced the new Apple Watch that monitors heart health. "
    "UC Berkeley researchers studied the benefits of apples.\n"
    "Output:\n"
    "[{{'head': 'Tim Cook', 'head_type': 'PERSON', 'relation': 'CEO_OF', 'tail': 'Apple Inc.', 'tail_type': 'COMPANY'}},\n"
    " {{'head': 'Apple Inc.', 'head_type': 'COMPANY', 'relation': 'PRODUCES', 'tail': 'Apple Watch', 'tail_type': 'PRODUCT'}},\n"
    " {{'head': 'Apple Watch', 'head_type': 'PRODUCT', 'relation': 'MONITORS', 'tail': 'heart health', 'tail_type': 'HEALTH_METRIC'}},\n"
    " {{'head': 'UC Berkeley', 'head_type': 'UNIVERSITY', 'relation': 'STUDIES', 'tail': 'benefits of apples', 'tail_type': 'RESEARCH_TOPIC'}}]\n"
    "---------------------\n"
    "Text: {text}\n"
    "Output:\n"
)

CUSTOM_KG_TRIPLET_EXTRACT_PROMPT = PromptTemplate(
    CUSTOM_KG_TRIPLET_EXTRACT_TMPL, 
    prompt_type=PromptType.KNOWLEDGE_TRIPLET_EXTRACT
)

# Define allowed entity types and relation types
allowed_entity_types = ["PERSON", "COMPANY", "PRODUCT", "UNIVERSITY", "HEALTH_METRIC", "RESEARCH_TOPIC"]
allowed_relation_types = ["CEO_OF", "PRODUCES", "MONITORS", "STUDIES"]

# Format the prompt with actual values
formatted_prompt = CUSTOM_KG_TRIPLET_EXTRACT_PROMPT.format(
    allowed_entity_types=allowed_entity_types,
    allowed_relation_types=allowed_relation_types,
    max_knowledge_triplets=10,
    text="Tim Cook, CEO of Apple Inc., announced the new Apple Watch that monitors heart health."
)

# Print the formatted prompt
print(formatted_prompt)

# Define your documents
documents = [
    "Tim Cook, CEO of Apple Inc., announced the new Apple Watch that monitors heart health.",
    "UC Berkeley researchers studied the benefits of apples."
]

# Create TextNode objects from documents
nodes = [TextNode(text=doc) for doc in documents]

# Create the KnowledgeGraphIndex
kg_index = KnowledgeGraphIndex(
    nodes=nodes,
    kg_triplet_extract_template=CUSTOM_KG_TRIPLET_EXTRACT_PROMPT.partial_format(
        allowed_entity_types=allowed_entity_types,
        allowed_relation_types=allowed_relation_types,
        max_knowledge_triplets=10
    )
)

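# Alternatively, use the from_documents class method. It expects Document
# objects rather than plain strings, so wrap each string first
# (import Document from the same package as KnowledgeGraphIndex).
from llama_index.core import Document

kg_index = KnowledgeGraphIndex.from_documents(
    [Document(text=doc) for doc in documents],
    kg_triplet_extract_template=CUSTOM_KG_TRIPLET_EXTRACT_PROMPT.partial_format(
        allowed_entity_types=allowed_entity_types,
        allowed_relation_types=allowed_relation_types,
        max_knowledge_triplets=10
    )
)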

This code demonstrates how to use the updated template directly in the kg_index.from_documents method and how to print out the actual prompt used for generating the response with the actual values of the variables in the prompt template [1][2][3].
