DataNoob0723 opened 2 weeks ago
To constrain the entities and relations that KnowledgeGraphIndex extracts, you can replace its default DEFAULT_KG_TRIPLET_EXTRACT_PROMPT with a custom template. The default template has no placeholders for entity or relation types, so the custom template adds them.
Here is an example of a custom template that lists the allowed entity types and relation types:
from llama_index.core.prompts.base import PromptTemplate
from llama_index.core.prompts.prompt_type import PromptType
from llama_index.core.prompts.default_prompts import DEFAULT_KG_TRIPLET_EXTRACT_PROMPT
CUSTOM_KG_TRIPLET_EXTRACT_TMPL = (
"Extract up to {max_knowledge_triplets} knowledge triplets from the given text. "
"Each triplet should be in the form of (head, relation, tail) with their respective types.\n"
"---------------------\n"
"INITIAL ONTOLOGY:\n"
"Entity Types: {allowed_entity_types}\n"
"Relation Types: {allowed_relation_types}\n"
"\n"
"Use these types as a starting point, but introduce new types if necessary based on the context.\n"
"\n"
"GUIDELINES:\n"
"- Output in JSON format: [{{'head': '', 'head_type': '', 'relation': '', 'tail': '', 'tail_type': ''}}]\n"
"- Use the most complete form for entities (e.g., 'United States of America' instead of 'USA')\n"
"- Keep entities concise (3-5 words max)\n"
"- Break down complex phrases into multiple triplets\n"
"- Ensure the knowledge graph is coherent and easily understandable\n"
"---------------------\n"
"EXAMPLE:\n"
"Text: Tim Cook, CEO of Apple Inc., announced the new Apple Watch that monitors heart health. "
"UC Berkeley researchers studied the benefits of apples.\n"
"Output:\n"
"[{{'head': 'Tim Cook', 'head_type': 'PERSON', 'relation': 'CEO_OF', 'tail': 'Apple Inc.', 'tail_type': 'ORGANIZATION'}},\n"
" {{'head': 'Apple Inc.', 'head_type': 'ORGANIZATION', 'relation': 'PRODUCES', 'tail': 'Apple Watch', 'tail_type': 'PRODUCT'}},\n"
" {{'head': 'Apple Watch', 'head_type': 'PRODUCT', 'relation': 'MONITORS', 'tail': 'heart health', 'tail_type': 'HEALTH_METRIC'}},\n"
" {{'head': 'UC Berkeley', 'head_type': 'ORGANIZATION', 'relation': 'STUDIES', 'tail': 'benefits of apples', 'tail_type': 'RESEARCH_TOPIC'}}]\n"
"---------------------\n"
"Text: {text}\n"
"Output:\n"
)
CUSTOM_KG_TRIPLET_EXTRACT_PROMPT = PromptTemplate(
CUSTOM_KG_TRIPLET_EXTRACT_TMPL, prompt_type=PromptType.KNOWLEDGE_TRIPLET_EXTRACT
)
In this custom template, the allowed_entity_types and allowed_relation_types placeholders carry the constraints on the extracted entities and relations. KnowledgeGraphIndex only fills max_knowledge_triplets and text on its own, so fill the type placeholders with partial_format before passing the template to the index:
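from llama_index.core import KnowledgeGraphIndex, SimpleDirectoryReader
# Load your documents ("./data" is a placeholder directory)
documents = SimpleDirectoryReader("./data").load_data()
# Fill the type placeholders; the index fills max_knowledge_triplets and text itself
kg_index = KnowledgeGraphIndex.from_documents(
    documents,
    kg_triplet_extract_template=CUSTOM_KG_TRIPLET_EXTRACT_PROMPT.partial_format(
        allowed_entity_types="PERSON, ORGANIZATION, PRODUCT, HEALTH_METRIC, RESEARCH_TOPIC",
        allowed_relation_types="CEO_OF, PRODUCES, MONITORS, STUDIES",
    ),
    max_triplets_per_chunk=10,
    include_embeddings=False,
    show_progress=True,
)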
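KnowledgeGraphIndex then uses your custom prompt for triplet extraction. One caveat: by default the index parses the LLM response line by line, expecting (subject, predicate, object) tuples, so a prompt that asks for JSON output, like the one above, may produce no triplets; that JSON format matches the property-graph extractors instead. For strict validation of entities and relations against a predefined schema, you can use the SchemaLLMPathExtractor class with a PropertyGraphIndex: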
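from typing import Literal
from llama_index.core import PropertyGraphIndex
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor
from llama_index.llms.ollama import Ollama
# LLM itself is an abstract base class, so use a concrete model
llm = Ollama(model="llama3", json_mode=True, request_timeout=3600)
# extract_prompt is omitted: the custom template's placeholders do not match what this extractor fills in
schema_extractor = SchemaLLMPathExtractor(
    llm=llm,
    possible_entities=Literal["PERSON", "ORGANIZATION", "PRODUCT", "HEALTH_METRIC", "RESEARCH_TOPIC"],
    possible_relations=Literal["CEO_OF", "PRODUCES", "MONITORS", "STUDIES"],
    kg_validation_schema=[
        ("PERSON", "CEO_OF", "ORGANIZATION"),
        ("ORGANIZATION", "PRODUCES", "PRODUCT"),
        ("PRODUCT", "MONITORS", "HEALTH_METRIC"),
        ("ORGANIZATION", "STUDIES", "RESEARCH_TOPIC"),
    ],
    strict=True,
    max_triplets_per_chunk=10,
)
# The extractor plugs into PropertyGraphIndex (documents: your loaded documents)
index = PropertyGraphIndex.from_documents(documents, kg_extractors=[schema_extractor])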
# Use schema_extractor to process your data
In short, the custom prompt shapes what the LLM is asked to extract, while SchemaLLMPathExtractor validates the extracted entities and relations against a predefined schema when building a PropertyGraphIndex [1][2][3].
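The strict validation described above can be pictured with a small, library-independent sketch. It assumes triplets arrive as dicts with head_type, relation and tail_type keys (the format of the custom template) and keeps only those whose (head_type, relation, tail_type) combination appears in an allowed list. The helper filter_triplets is hypothetical and is not part of LlamaIndex.

```python
from typing import Dict, Iterable, List, Tuple


# Hypothetical helper, not a LlamaIndex API: keeps only triplets whose
# (head_type, relation, tail_type) combination is listed in the schema.
def filter_triplets(
    triplets: Iterable[Dict[str, str]],
    schema: List[Tuple[str, str, str]],
) -> List[Dict[str, str]]:
    allowed = set(schema)
    return [
        t for t in triplets
        if (t["head_type"], t["relation"], t["tail_type"]) in allowed
    ]


schema = [
    ("PERSON", "CEO_OF", "ORGANIZATION"),
    ("ORGANIZATION", "PRODUCES", "PRODUCT"),
]
extracted = [
    {"head": "Tim Cook", "head_type": "PERSON", "relation": "CEO_OF",
     "tail": "Apple Inc.", "tail_type": "ORGANIZATION"},
    # Not listed in the schema, so strict filtering drops it
    {"head": "Apple Watch", "head_type": "PRODUCT", "relation": "MONITORS",
     "tail": "heart health", "tail_type": "HEALTH_METRIC"},
]
kept = filter_triplets(extracted, schema)
print([t["head"] for t in kept])  # ['Tim Cook']
```

Using a set of tuples keeps each membership check constant-time, which matters little here but scales to large schemas.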
To continue talking to Dosu, mention @dosu.
How can I pass allowed_entity_types and allowed_relation_types to the template.
To pass allowed_entity_types and allowed_relation_types, you can use the SchemaLLMPathExtractor class to specify an exact schema of possible entity types and relation types. This approach ensures that the extracted entities and relations follow a predefined schema. Here is an example of how you can do this:
from typing import Literal
from llama_index.llms.ollama import Ollama
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor
# Define possible entity types and relation types
entities = Literal["PERSON", "PLACE", "ORGANIZATION"]
relations = Literal["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"]
# Define the validation schema for which entities can have which relations
validation_schema = [
("ORGANIZATION", "HAS", "PERSON"),
("PERSON", "WORKED_AT", "ORGANIZATION"),
("PERSON", "WORKED_WITH", "PERSON"),
("PERSON", "WORKED_ON", "ORGANIZATION"),
("PERSON", "PART_OF", "ORGANIZATION"),
("ORGANIZATION", "PART_OF", "ORGANIZATION"),
("PERSON", "WORKED_AT", "PLACE"),
]
# Create the SchemaLLMPathExtractor with the defined schema
kg_extractor = SchemaLLMPathExtractor(
llm=Ollama(model="llama3", json_mode=True, request_timeout=3600),
possible_entities=entities,
possible_relations=relations,
kg_validation_schema=validation_schema,
strict=True, # Ensures only values within the schema are allowed
)
In this example, the allowed entity and relation types are defined as Literal types (entities and relations) and passed to SchemaLLMPathExtractor as possible_entities and possible_relations. With strict=True, extracted triplets that fall outside kg_validation_schema are dropped, so the entities and relations adhere to the specified schema [1][2].
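If you keep the types as Literal aliases for SchemaLLMPathExtractor, you can still reuse the same values in a text prompt such as CUSTOM_KG_TRIPLET_EXTRACT_TMPL. A minimal sketch, assuming you want a comma-separated list in the prompt; typing.get_args is standard library, and the variable names are illustrative.

```python
from typing import Literal, get_args

entities = Literal["PERSON", "PLACE", "ORGANIZATION"]
relations = Literal["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"]

# get_args returns the Literal's values as a tuple of strings
entity_list = ", ".join(get_args(entities))
relation_list = ", ".join(get_args(relations))

line = "Entity Types: {allowed_entity_types}\n".format(allowed_entity_types=entity_list)
print(line, end="")  # Entity Types: PERSON, PLACE, ORGANIZATION
```

This keeps a single source of truth for the types, whichever extractor or template ends up consuming them.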
I mean how can I pass allowed_entity_types and allowed_relation_types to CUSTOM_KG_TRIPLET_EXTRACT_TMPL?
To pass allowed_entity_types and allowed_relation_types to CUSTOM_KG_TRIPLET_EXTRACT_TMPL, you can use DEFAULT_DYNAMIC_EXTRACT_TMPL as a reference. Here is an example of how you can define CUSTOM_KG_TRIPLET_EXTRACT_TMPL:
CUSTOM_KG_TRIPLET_EXTRACT_TMPL = (
"Extract up to {max_knowledge_triplets} knowledge triplets from the given text. "
"Each triplet should be in the form of (head, relation, tail) with their respective types.\n"
"---------------------\n"
"INITIAL ONTOLOGY:\n"
"Entity Types: {allowed_entity_types}\n"
"Relation Types: {allowed_relation_types}\n"
"\n"
"Use these types as a starting point, but introduce new types if necessary based on the context.\n"
"\n"
"GUIDELINES:\n"
"- Output in JSON format: [{{'head': '', 'head_type': '', 'relation': '', 'tail': '', 'tail_type': ''}}]\n"
"- Use the most complete form for entities (e.g., 'United States of America' instead of 'USA')\n"
"- Keep entities concise (3-5 words max)\n"
"- Break down complex phrases into multiple triplets\n"
"- Ensure the knowledge graph is coherent and easily understandable\n"
"---------------------\n"
"EXAMPLE:\n"
"Text: Tim Cook, CEO of Apple Inc., announced the new Apple Watch that monitors heart health. "
"UC Berkeley researchers studied the benefits of apples.\n"
"Output:\n"
"[{{'head': 'Tim Cook', 'head_type': 'PERSON', 'relation': 'CEO_OF', 'tail': 'Apple Inc.', 'tail_type': 'COMPANY'}},\n"
" {{'head': 'Apple Inc.', 'head_type': 'COMPANY', 'relation': 'PRODUCES', 'tail': 'Apple Watch', 'tail_type': 'PRODUCT'}},\n"
" {{'head': 'Apple Watch', 'head_type': 'PRODUCT', 'relation': 'MONITORS', 'tail': 'heart health', 'tail_type': 'HEALTH_METRIC'}},\n"
" {{'head': 'UC Berkeley', 'head_type': 'UNIVERSITY', 'relation': 'STUDIES', 'tail': 'benefits of apples', 'tail_type': 'RESEARCH_TOPIC'}}]\n"
"---------------------\n"
"Text: {text}\n"
"Output:\n"
)
You can then create a PromptTemplate from this template:
from llama_index.core.prompts.base import PromptTemplate
from llama_index.core.prompts.prompt_type import PromptType
CUSTOM_KG_TRIPLET_EXTRACT_PROMPT = PromptTemplate(
CUSTOM_KG_TRIPLET_EXTRACT_TMPL,
prompt_type=PromptType.KNOWLEDGE_TRIPLET_EXTRACT
)
This template includes placeholders for allowed_entity_types and allowed_relation_types, which are filled in when the prompt is formatted. The DynamicLLMPathExtractor class from the LlamaIndex framework fills them from its own allowed_entity_types and allowed_relation_types arguments, treats them as an initial ontology, and can expand that ontology dynamically. Here is an example of how you might set up the extractor:
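from llama_index.core import PropertyGraphIndex
from llama_index.core.indices.property_graph.transformations.dynamic_llm import DynamicLLMPathExtractor
from llama_index.llms.ollama import Ollama
# LLM itself is an abstract base class, so use a concrete model
llm = Ollama(model="llama3", request_timeout=3600)
extractor = DynamicLLMPathExtractor(
    llm=llm,
    extract_prompt=CUSTOM_KG_TRIPLET_EXTRACT_PROMPT,
    allowed_entity_types=['PERSON', 'COMPANY', 'PRODUCT', 'HEALTH_METRIC', 'UNIVERSITY', 'RESEARCH_TOPIC'],
    allowed_relation_types=['CEO_OF', 'PRODUCES', 'MONITORS', 'STUDIES'],
)
# The extractor is used through PropertyGraphIndex (documents: your loaded documents)
index = PropertyGraphIndex.from_documents(documents, kg_extractors=[extractor])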
With this setup, the extractor fills allowed_entity_types and allowed_relation_types into the prompt during the knowledge triplet extraction process [1][2].
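The template asks the model for a list of Python-style dicts with single-quoted keys, which json.loads would reject. DynamicLLMPathExtractor ships its own parser, so the function below (parse_typed_triplets, hypothetical) only sketches the idea: it parses such a response with ast.literal_eval and optionally keeps triplets whose entity types are allowed.

```python
import ast
from typing import Dict, List, Optional, Sequence


def parse_typed_triplets(
    response: str,
    allowed_entity_types: Optional[Sequence[str]] = None,
) -> List[Dict[str, str]]:
    """Hypothetical parser for "[{'head': ..., 'head_type': ...}]" output."""
    start, end = response.find("["), response.rfind("]")
    if start == -1 or end < start:
        return []  # no list in the response
    try:
        items = ast.literal_eval(response[start : end + 1])
    except (ValueError, SyntaxError):
        return []  # malformed output
    keys = {"head", "head_type", "relation", "tail", "tail_type"}
    triplets = [i for i in items if isinstance(i, dict) and keys.issubset(i)]
    if allowed_entity_types is not None:
        allowed = set(allowed_entity_types)
        triplets = [
            t for t in triplets
            if t["head_type"] in allowed and t["tail_type"] in allowed
        ]
    return triplets


response = (
    "[{'head': 'Tim Cook', 'head_type': 'PERSON', 'relation': 'CEO_OF', "
    "'tail': 'Apple Inc.', 'tail_type': 'COMPANY'},\n"
    " {'head': 'UC Berkeley', 'head_type': 'UNIVERSITY', 'relation': 'STUDIES', "
    "'tail': 'benefits of apples', 'tail_type': 'RESEARCH_TOPIC'}]"
)
print(len(parse_typed_triplets(response)))  # 2
print(len(parse_typed_triplets(response, ["PERSON", "COMPANY"])))  # 1
```

ast.literal_eval only evaluates literals, so it is a safe way to read model output that looks like Python data.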
After creating CUSTOM_KG_TRIPLET_EXTRACT_PROMPT, how can I create KnowledgeGraphIndex using it and pass allowed_entity_types and allowed_relation_types?
To create a KnowledgeGraphIndex using CUSTOM_KG_TRIPLET_EXTRACT_PROMPT and pass allowed_entity_types and allowed_relation_types, you can follow these steps: define the custom prompt template, define the allowed entity and relation types, and initialize the KnowledgeGraphIndex with the custom prompt and the allowed entity and relation types. Here is an example:
from llama_index.core import KnowledgeGraphIndex
from llama_index.core.prompts import PromptTemplate
from llama_index.core.prompts.prompt_type import PromptType
# Define your custom prompt template
CUSTOM_KG_TRIPLET_EXTRACT_TMPL = (
"Extract up to {max_knowledge_triplets} knowledge triplets from the given text. "
"Each triplet should be in the form of (head, relation, tail) with their respective types.\n"
"---------------------\n"
"INITIAL ONTOLOGY:\n"
"Entity Types: {allowed_entity_types}\n"
"Relation Types: {allowed_relation_types}\n"
"\n"
"Use these types as a starting point, but introduce new types if necessary based on the context.\n"
"\n"
"GUIDELINES:\n"
"- Output in JSON format: [{{'head': '', 'head_type': '', 'relation': '', 'tail': '', 'tail_type': ''}}]\n"
"- Use the most complete form for entities (e.g., 'United States of America' instead of 'USA')\n"
"- Keep entities concise (3-5 words max)\n"
"- Break down complex phrases into multiple triplets\n"
"- Ensure the knowledge graph is coherent and easily understandable\n"
"---------------------\n"
"EXAMPLE:\n"
"Text: Tim Cook, CEO of Apple Inc., announced the new Apple Watch that monitors heart health. "
"UC Berkeley researchers studied the benefits of apples.\n"
"Output:\n"
"[{{'head': 'Tim Cook', 'head_type': 'PERSON', 'relation': 'CEO_OF', 'tail': 'Apple Inc.', 'tail_type': 'COMPANY'}},\n"
" {{'head': 'Apple Inc.', 'head_type': 'COMPANY', 'relation': 'PRODUCES', 'tail': 'Apple Watch', 'tail_type': 'PRODUCT'}},\n"
" {{'head': 'Apple Watch', 'head_type': 'PRODUCT', 'relation': 'MONITORS', 'tail': 'heart health', 'tail_type': 'HEALTH_METRIC'}},\n"
" {{'head': 'UC Berkeley', 'head_type': 'UNIVERSITY', 'relation': 'STUDIES', 'tail': 'benefits of apples', 'tail_type': 'RESEARCH_TOPIC'}}]\n"
"---------------------\n"
"Text: {text}\n"
"Output:\n"
)
CUSTOM_KG_TRIPLET_EXTRACT_PROMPT = PromptTemplate(
CUSTOM_KG_TRIPLET_EXTRACT_TMPL,
prompt_type=PromptType.KNOWLEDGE_TRIPLET_EXTRACT
)
# Define allowed entity types and relation types
allowed_entity_types = ["PERSON", "COMPANY", "PRODUCT", "UNIVERSITY", "HEALTH_METRIC", "RESEARCH_TOPIC"]
allowed_relation_types = ["CEO_OF", "PRODUCES", "MONITORS", "STUDIES"]
# Create the KnowledgeGraphIndex
kg_index = KnowledgeGraphIndex(
kg_triplet_extract_template=CUSTOM_KG_TRIPLET_EXTRACT_PROMPT,
allowed_entity_types=allowed_entity_types,
allowed_relation_types=allowed_relation_types
)
This code sets up the KnowledgeGraphIndex with the custom triplet extraction prompt and the specified allowed entity and relation types [1].
How can I pass my documents to create this kg_index
To pass your documents when creating kg_index, you can use the from_documents method of the KnowledgeGraphIndex class. Here is an example of how you can do this:
from llama_index.core import KnowledgeGraphIndex
from llama_index.core.prompts import PromptTemplate
from llama_index.core.prompts.prompt_type import PromptType
# Define your custom prompt template
CUSTOM_KG_TRIPLET_EXTRACT_TMPL = (
"Extract up to {max_knowledge_triplets} knowledge triplets from the given text. "
"Each triplet should be in the form of (head, relation, tail) with their respective types.\n"
"---------------------\n"
"INITIAL ONTOLOGY:\n"
"Entity Types: {allowed_entity_types}\n"
"Relation Types: {allowed_relation_types}\n"
"\n"
"Use these types as a starting point, but introduce new types if necessary based on the context.\n"
"\n"
"GUIDELINES:\n"
"- Output in JSON format: [{{'head': '', 'head_type': '', 'relation': '', 'tail': '', 'tail_type': ''}}]\n"
"- Use the most complete form for entities (e.g., 'United States of America' instead of 'USA')\n"
"- Keep entities concise (3-5 words max)\n"
"- Break down complex phrases into multiple triplets\n"
"- Ensure the knowledge graph is coherent and easily understandable\n"
"---------------------\n"
"EXAMPLE:\n"
"Text: Tim Cook, CEO of Apple Inc., announced the new Apple Watch that monitors heart health. "
"UC Berkeley researchers studied the benefits of apples.\n"
"Output:\n"
"[{{'head': 'Tim Cook', 'head_type': 'PERSON', 'relation': 'CEO_OF', 'tail': 'Apple Inc.', 'tail_type': 'COMPANY'}},\n"
" {{'head': 'Apple Inc.', 'head_type': 'COMPANY', 'relation': 'PRODUCES', 'tail': 'Apple Watch', 'tail_type': 'PRODUCT'}},\n"
" {{'head': 'Apple Watch', 'head_type': 'PRODUCT', 'relation': 'MONITORS', 'tail': 'heart health', 'tail_type': 'HEALTH_METRIC'}},\n"
" {{'head': 'UC Berkeley', 'head_type': 'UNIVERSITY', 'relation': 'STUDIES', 'tail': 'benefits of apples', 'tail_type': 'RESEARCH_TOPIC'}}]\n"
"---------------------\n"
"Text: {text}\n"
"Output:\n"
)
CUSTOM_KG_TRIPLET_EXTRACT_PROMPT = PromptTemplate(
CUSTOM_KG_TRIPLET_EXTRACT_TMPL,
prompt_type=PromptType.KNOWLEDGE_TRIPLET_EXTRACT
)
# Define allowed entity types and relation types
allowed_entity_types = ["PERSON", "COMPANY", "PRODUCT", "UNIVERSITY", "HEALTH_METRIC", "RESEARCH_TOPIC"]
allowed_relation_types = ["CEO_OF", "PRODUCES", "MONITORS", "STUDIES"]
# Create the KnowledgeGraphIndex
kg_index = KnowledgeGraphIndex(
kg_triplet_extract_template=CUSTOM_KG_TRIPLET_EXTRACT_PROMPT,
allowed_entity_types=allowed_entity_types,
allowed_relation_types=allowed_relation_types
)
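from llama_index.core import Document
# Define your documents as Document objects
documents = [
    Document(text="Tim Cook, CEO of Apple Inc., announced the new Apple Watch that monitors heart health."),
    Document(text="UC Berkeley researchers studied the benefits of apples."),
]
# from_documents is a classmethod that returns a new index, so assign its result
kg_index = KnowledgeGraphIndex.from_documents(documents)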
This code sets up the KnowledgeGraphIndex with the custom triplet extraction prompt and the specified allowed entity and relation types, and then creates the index from the provided documents.
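Because from_documents is a classmethod that builds and returns a new index, its result has to be assigned; calling it on an existing instance does not modify that instance. The toy class below (ToyIndex, hypothetical, not LlamaIndex code) shows this Python behavior in isolation, along with a constructor that refuses to build without nodes.

```python
from typing import List


class ToyIndex:
    """Hypothetical stand-in for an index class; not LlamaIndex code."""

    def __init__(self, nodes: List[str]) -> None:
        if not nodes:
            # Same idea as "One of nodes, objects, or index_struct must be provided"
            raise ValueError("nodes must be provided")
        self.nodes = nodes

    @classmethod
    def from_documents(cls, documents: List[str]) -> "ToyIndex":
        # Builds a fresh instance; existing instances are not touched
        return cls(nodes=[d.strip() for d in documents])


index = ToyIndex(nodes=["old"])
index.from_documents(["new text"])  # result discarded
print(index.nodes)  # ['old']
index = ToyIndex.from_documents(["new text"])
print(index.nodes)  # ['new text']
```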
I got an error: ValueError: One of nodes, objects, or index_struct must be provided.
To resolve the ValueError: One of nodes, objects, or index_struct must be provided, you need to pass a sequence of BaseNode objects (for example TextNode) when creating the KnowledgeGraphIndex. Here is an updated example that creates nodes from your documents:
from llama_index.core import KnowledgeGraphIndex
from llama_index.core.prompts import PromptTemplate
from llama_index.core.prompts.prompt_type import PromptType
from llama_index.core.schema import TextNode
# Define your custom prompt template
CUSTOM_KG_TRIPLET_EXTRACT_TMPL = (
"Extract up to {max_knowledge_triplets} knowledge triplets from the given text. "
"Each triplet should be in the form of (head, relation, tail) with their respective types.\n"
"---------------------\n"
"INITIAL ONTOLOGY:\n"
"Entity Types: {allowed_entity_types}\n"
"Relation Types: {allowed_relation_types}\n"
"\n"
"Use these types as a starting point, but introduce new types if necessary based on the context.\n"
"\n"
"GUIDELINES:\n"
"- Output in JSON format: [{{'head': '', 'head_type': '', 'relation': '', 'tail': '', 'tail_type': ''}}]\n"
"- Use the most complete form for entities (e.g., 'United States of America' instead of 'USA')\n"
"- Keep entities concise (3-5 words max)\n"
"- Break down complex phrases into multiple triplets\n"
"- Ensure the knowledge graph is coherent and easily understandable\n"
"---------------------\n"
"EXAMPLE:\n"
"Text: Tim Cook, CEO of Apple Inc., announced the new Apple Watch that monitors heart health. "
"UC Berkeley researchers studied the benefits of apples.\n"
"Output:\n"
"[{{'head': 'Tim Cook', 'head_type': 'PERSON', 'relation': 'CEO_OF', 'tail': 'Apple Inc.', 'tail_type': 'COMPANY'}},\n"
" {{'head': 'Apple Inc.', 'head_type': 'COMPANY', 'relation': 'PRODUCES', 'tail': 'Apple Watch', 'tail_type': 'PRODUCT'}},\n"
" {{'head': 'Apple Watch', 'head_type': 'PRODUCT', 'relation': 'MONITORS', 'tail': 'heart health', 'tail_type': 'HEALTH_METRIC'}},\n"
" {{'head': 'UC Berkeley', 'head_type': 'UNIVERSITY', 'relation': 'STUDIES', 'tail': 'benefits of apples', 'tail_type': 'RESEARCH_TOPIC'}}]\n"
"---------------------\n"
"Text: {text}\n"
"Output:\n"
)
CUSTOM_KG_TRIPLET_EXTRACT_PROMPT = PromptTemplate(
CUSTOM_KG_TRIPLET_EXTRACT_TMPL,
prompt_type=PromptType.KNOWLEDGE_TRIPLET_EXTRACT
)
# Define allowed entity types and relation types
allowed_entity_types = ["PERSON", "COMPANY", "PRODUCT", "UNIVERSITY", "HEALTH_METRIC", "RESEARCH_TOPIC"]
allowed_relation_types = ["CEO_OF", "PRODUCES", "MONITORS", "STUDIES"]
# Define your documents
documents = [
"Tim Cook, CEO of Apple Inc., announced the new Apple Watch that monitors heart health.",
"UC Berkeley researchers studied the benefits of apples."
]
# Create BaseNode objects from documents
nodes = [TextNode(text=doc) for doc in documents]
# Create the KnowledgeGraphIndex
kg_index = KnowledgeGraphIndex(
nodes=nodes,
kg_triplet_extract_template=CUSTOM_KG_TRIPLET_EXTRACT_PROMPT,
allowed_entity_types=allowed_entity_types,
allowed_relation_types=allowed_relation_types
)
This code creates TextNode objects from your documents and passes them to the KnowledgeGraphIndex during initialization, which resolves the ValueError [1][2].
I got TypeError: KnowledgeGraphIndex._build_index_from_nodes() got an unexpected keyword argument 'allowed_entity_types'
To resolve the TypeError: KnowledgeGraphIndex._build_index_from_nodes() got an unexpected keyword argument 'allowed_entity_types', make sure allowed_entity_types never reaches _build_index_from_nodes. The index constructor forwards keyword arguments it does not recognize to that method, which is not defined to accept them.
Here's the relevant part of the KnowledgeGraphIndex class:
def _build_index_from_nodes(self, nodes: Sequence[BaseNode]) -> KG:
    """Build the index from nodes."""
    # do simple concatenation
    index_struct = self.index_struct_cls()
    nodes_with_progress = get_tqdm_iterable(
        nodes, self._show_progress, "Processing nodes"
    )
    for n in nodes_with_progress:
        triplets = self._extract_triplets(
            n.get_content(metadata_mode=MetadataMode.LLM)
        )
        logger.debug(f"> Extracted triplets: {triplets}")
        for triplet in triplets:
            subj, _, obj = triplet
            self.upsert_triplet(triplet)
            index_struct.add_node([subj, obj], n)
        if self.include_embeddings:
            triplet_texts = [str(t) for t in triplets]
            embed_model = self._service_context.embed_model
            embed_outputs = embed_model.get_text_embedding_batch(
                triplet_texts, show_progress=self._show_progress
            )
            for rel_text, rel_embed in zip(triplet_texts, embed_outputs):
                index_struct.add_to_embedding_dict(rel_text, rel_embed)
    return index_struct
The method does not accept allowed_entity_types as a parameter. To fix the error, remove allowed_entity_types and allowed_relation_types from the KnowledgeGraphIndex(...) call, since any extra keyword arguments there are passed on to _build_index_from_nodes [1].
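The toy classes below (hypothetical, not LlamaIndex code) reproduce the forwarding pattern behind this error: a subclass consumes the keyword arguments it knows and passes the rest to a base class, which hands them to a build method with a fixed signature.

```python
from typing import Any, List


class ToyBaseIndex:
    """Hypothetical base class that forwards unknown kwargs to the build step."""

    def __init__(self, nodes: List[str], **kwargs: Any) -> None:
        self.struct = self._build_index_from_nodes(nodes, **kwargs)

    def _build_index_from_nodes(self, nodes: List[str]) -> List[str]:
        return list(nodes)


class ToyKGIndex(ToyBaseIndex):
    def __init__(self, nodes: List[str], kg_triplet_extract_template: str = "", **kwargs: Any) -> None:
        # Known arguments are consumed here; everything else is passed on
        self.template = kg_triplet_extract_template
        super().__init__(nodes, **kwargs)


ToyKGIndex(["a"], kg_triplet_extract_template="t")  # works
try:
    ToyKGIndex(["a"], allowed_entity_types=["PERSON"])
except TypeError as err:
    # Reports an unexpected keyword argument 'allowed_entity_types'
    print(err)
```

The fix is the same as for the real index: put such values into the prompt, not into the constructor call.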
but you told me to do this: kg_index = KnowledgeGraphIndex( nodes=nodes, kg_triple_extract_template=CUSTOM_KG_TRIPLET_EXTRACT_PROMPT, allowed_entity_types=allowed_entity_types, allowed_relation_types=allowed_relation_types )
To create a KnowledgeGraphIndex using CUSTOM_KG_TRIPLET_EXTRACT_PROMPT with allowed_entity_types and allowed_relation_types without hitting the TypeError, fill these values into the prompt template itself (for example with partial_format) instead of passing them to the KnowledgeGraphIndex constructor, then initialize the KnowledgeGraphIndex with that prompt. Here is an example:
from llama_index.core import KnowledgeGraphIndex
from llama_index.core.prompts import PromptTemplate
from llama_index.core.schema import TextNode
# Define your custom prompt template (BasePromptTemplate is abstract, so use PromptTemplate)
CUSTOM_KG_TRIPLET_EXTRACT_PROMPT = PromptTemplate(
template=(
"Extract up to {max_knowledge_triplets} knowledge triplets from the given text. "
"Each triplet should be in the form of (head, relation, tail) with their respective types.\n"
"---------------------\n"
"INITIAL ONTOLOGY:\n"
"Entity Types: {allowed_entity_types}\n"
"Relation Types: {allowed_relation_types}\n"
"\n"
"Use these types as a starting point, but introduce new types if necessary based on the context.\n"
"\n"
"GUIDELINES:\n"
"- Output in JSON format: [{{'head': '', 'head_type': '', 'relation': '', 'tail': '', 'tail_type': ''}}]\n"
"- Use the most complete form for entities (e.g., 'United States of America' instead of 'USA')\n"
"- Keep entities concise (3-5 words max)\n"
"- Break down complex phrases into multiple triplets\n"
"- Ensure the knowledge graph is coherent and easily understandable\n"
"---------------------\n"
"EXAMPLE:\n"
"Text: Tim Cook, CEO of Apple Inc., announced the new Apple Watch that monitors heart health. "
"UC Berkeley researchers studied the benefits of apples.\n"
"Output:\n"
"[{{'head': 'Tim Cook', 'head_type': 'PERSON', 'relation': 'CEO_OF', 'tail': 'Apple Inc.', 'tail_type': 'COMPANY'}},\n"
" {{'head': 'Apple Inc.', 'head_type': 'COMPANY', 'relation': 'PRODUCES', 'tail': 'Apple Watch', 'tail_type': 'PRODUCT'}},\n"
" {{'head': 'Apple Watch', 'head_type': 'PRODUCT', 'relation': 'MONITORS', 'tail': 'heart health', 'tail_type': 'HEALTH_METRIC'}},\n"
" {{'head': 'UC Berkeley', 'head_type': 'UNIVERSITY', 'relation': 'STUDIES', 'tail': 'benefits of apples', 'tail_type': 'RESEARCH_TOPIC'}}]\n"
"---------------------\n"
"Text: {text}\n"
"Output:\n"
)
)
# Define allowed entity types and relation types
allowed_entity_types = ["PERSON", "COMPANY", "PRODUCT", "UNIVERSITY", "HEALTH_METRIC", "RESEARCH_TOPIC"]
allowed_relation_types = ["CEO_OF", "PRODUCES", "MONITORS", "STUDIES"]
# Define your documents
documents = [
"Tim Cook, CEO of Apple Inc., announced the new Apple Watch that monitors heart health.",
"UC Berkeley researchers studied the benefits of apples."
]
# Create TextNode objects from documents
nodes = [TextNode(text=doc) for doc in documents]
# Create the KnowledgeGraphIndex
kg_index = KnowledgeGraphIndex(
nodes=nodes,
kg_triplet_extract_template=CUSTOM_KG_TRIPLET_EXTRACT_PROMPT.partial_format(
allowed_entity_types=allowed_entity_types,
allowed_relation_types=allowed_relation_types,
max_knowledge_triplets=10
)
)
This code sets up the KnowledgeGraphIndex with the custom triplet extraction prompt and the specified allowed entity and relation types, so that these constraints are filled into the prompt used during knowledge triplet extraction [1][2].
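To see what partial formatting does conceptually, here is a stdlib-only sketch. It assumes partial formatting stores values and applies them together with the remaining variables at final format time; SketchTemplate is a hypothetical class, not LlamaIndex's PromptTemplate. It also shows that a raw Python list is rendered with brackets and quotes (joining it into a string reads more naturally), and that doubled braces such as {{ }} come out as literal braces.

```python
from typing import Any, Dict


class SketchTemplate:
    """Hypothetical stand-in illustrating partial formatting."""

    def __init__(self, template: str, **fixed: Any) -> None:
        self.template = template
        self.fixed: Dict[str, Any] = dict(fixed)

    def partial_format(self, **kwargs: Any) -> "SketchTemplate":
        # Return a new template with more variables pinned; the original is unchanged
        return SketchTemplate(self.template, **{**self.fixed, **kwargs})

    def format(self, **kwargs: Any) -> str:
        # Format once with all variables, so {{ }} escapes are processed exactly once
        return self.template.format(**{**self.fixed, **kwargs})


tmpl = SketchTemplate(
    "Entity Types: {allowed_entity_types}\n"
    "Format: [{{'head': '', 'tail': ''}}]\n"
    "Text: {text}"
)
types = ["PERSON", "COMPANY"]
as_list = tmpl.partial_format(allowed_entity_types=types)
as_text = tmpl.partial_format(allowed_entity_types=", ".join(types))
print(as_list.format(text="..."))
# Entity Types: ['PERSON', 'COMPANY']
# Format: [{'head': '', 'tail': ''}]
# Text: ...
print(as_text.format(text="...").splitlines()[0])  # Entity Types: PERSON, COMPANY
```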
how to persist the graph index after building it?
To persist the graph index after building it, you can use the built-in persist() method. Here are examples for different kinds of graph indexes and stores.
For a composable graph, persist the root index's storage context:
graph.root_index.storage_context.persist(persist_dir="<persist_dir>")
This will write all the data to disk at the specified location.
For a standalone SimpleGraphStore:
from llama_index.core.graph_stores import SimpleGraphStore
graph_store = SimpleGraphStore()
# ... (build your graph store)
graph_store.persist(persist_path="<persist_path>")
This will save the graph store data to the specified path.
For a PropertyGraphIndex:
from llama_index.core import PropertyGraphIndex
# create
index = PropertyGraphIndex.from_documents(documents)
# save
index.storage_context.persist("./storage")
# load
from llama_index.core import StorageContext, load_index_from_storage
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
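This will save the PropertyGraphIndex data to the specified directory and lets you load it back when needed. The same storage_context.persist() and load_index_from_storage() calls also work for a KnowledgeGraphIndex such as kg_index.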
To load a persisted SimpleGraphStore:
from llama_index.core.graph_stores import SimpleGraphStore
# load
graph_store = SimpleGraphStore.from_persist_path(persist_path="<persist_path>")
This will load the SimpleGraphStore data from the specified path.
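Persisting a graph store amounts to serializing its triplets to a file and reading them back. The sketch below uses a hypothetical ToyGraphStore backed by a JSON file; it illustrates the persist/load round trip only and does not reproduce SimpleGraphStore's on-disk format.

```python
import json
import os
import tempfile
from collections import defaultdict
from typing import Dict, List, Optional


class ToyGraphStore:
    """Hypothetical graph store: subject -> list of [relation, object]."""

    def __init__(self, graph: Optional[Dict[str, List[List[str]]]] = None) -> None:
        self.graph = defaultdict(list, graph or {})

    def upsert_triplet(self, subj: str, rel: str, obj: str) -> None:
        # Avoid storing the same edge twice
        if [rel, obj] not in self.graph[subj]:
            self.graph[subj].append([rel, obj])

    def persist(self, persist_path: str) -> None:
        dirname = os.path.dirname(persist_path)
        if dirname:
            os.makedirs(dirname, exist_ok=True)
        with open(persist_path, "w", encoding="utf-8") as f:
            json.dump(dict(self.graph), f)

    @classmethod
    def from_persist_path(cls, persist_path: str) -> "ToyGraphStore":
        with open(persist_path, encoding="utf-8") as f:
            return cls(json.load(f))


store = ToyGraphStore()
store.upsert_triplet("Tim Cook", "CEO_OF", "Apple Inc.")
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "storage", "graph_store.json")
    store.persist(path)
    loaded = ToyGraphStore.from_persist_path(path)
print(dict(loaded.graph))  # {'Tim Cook': [['CEO_OF', 'Apple Inc.']]}
```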
Additionally, you can use constants from the llama_index.core.graph_stores.types module to standardize the persist directory and file name:
import os
from llama_index.core.graph_stores.types import DEFAULT_PERSIST_DIR, DEFAULT_PERSIST_FNAME
# Example usage: join the default directory and file name portably
graph_store.persist(persist_path=os.path.join(DEFAULT_PERSIST_DIR, DEFAULT_PERSIST_FNAME))
This ensures that you are using the default directory and file name conventions provided by the LlamaIndex framework [1][2][3][4][5].
can I use the updated template directly in kg_index.from_documents method
also how to print out the actual prompt used generating the response, with the actual values of the variables in the prompt template
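Yes, you can pass the updated template directly to KnowledgeGraphIndex.from_documents, which forwards it to the constructor. To print a prompt with the actual values of the template variables filled in, you can use the format method of the PromptTemplate class. This renders the prompt yourself; to log the prompts actually sent to the LLM during indexing, you can enable the simple global handler with set_global_handler("simple") from llama_index.core.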
Here is an example:
from llama_index.core import KnowledgeGraphIndex
from llama_index.core.prompts import PromptTemplate
from llama_index.core.prompts.prompt_type import PromptType
from llama_index.core.schema import TextNode
# Define your custom prompt template
CUSTOM_KG_TRIPLET_EXTRACT_TMPL = (
"Extract up to {max_knowledge_triplets} knowledge triplets from the given text. "
"Each triplet should be in the form of (head, relation, tail) with their respective types.\n"
"---------------------\n"
"INITIAL ONTOLOGY:\n"
"Entity Types: {allowed_entity_types}\n"
"Relation Types: {allowed_relation_types}\n"
"\n"
"Use these types as a starting point, but introduce new types if necessary based on the context.\n"
"\n"
"GUIDELINES:\n"
"- Output in JSON format: [{{'head': '', 'head_type': '', 'relation': '', 'tail': '', 'tail_type': ''}}]\n"
"- Use the most complete form for entities (e.g., 'United States of America' instead of 'USA')\n"
"- Keep entities concise (3-5 words max)\n"
"- Break down complex phrases into multiple triplets\n"
"- Ensure the knowledge graph is coherent and easily understandable\n"
"---------------------\n"
"EXAMPLE:\n"
"Text: Tim Cook, CEO of Apple Inc., announced the new Apple Watch that monitors heart health. "
"UC Berkeley researchers studied the benefits of apples.\n"
"Output:\n"
"[{{'head': 'Tim Cook', 'head_type': 'PERSON', 'relation': 'CEO_OF', 'tail': 'Apple Inc.', 'tail_type': 'COMPANY'}},\n"
" {{'head': 'Apple Inc.', 'head_type': 'COMPANY', 'relation': 'PRODUCES', 'tail': 'Apple Watch', 'tail_type': 'PRODUCT'}},\n"
" {{'head': 'Apple Watch', 'head_type': 'PRODUCT', 'relation': 'MONITORS', 'tail': 'heart health', 'tail_type': 'HEALTH_METRIC'}},\n"
" {{'head': 'UC Berkeley', 'head_type': 'UNIVERSITY', 'relation': 'STUDIES', 'tail': 'benefits of apples', 'tail_type': 'RESEARCH_TOPIC'}}]\n"
"---------------------\n"
"Text: {text}\n"
"Output:\n"
)
CUSTOM_KG_TRIPLET_EXTRACT_PROMPT = PromptTemplate(
CUSTOM_KG_TRIPLET_EXTRACT_TMPL,
prompt_type=PromptType.KNOWLEDGE_TRIPLET_EXTRACT
)
# Define allowed entity types and relation types
allowed_entity_types = ["PERSON", "COMPANY", "PRODUCT", "UNIVERSITY", "HEALTH_METRIC", "RESEARCH_TOPIC"]
allowed_relation_types = ["CEO_OF", "PRODUCES", "MONITORS", "STUDIES"]
# Format the prompt with actual values
formatted_prompt = CUSTOM_KG_TRIPLET_EXTRACT_PROMPT.format(
allowed_entity_types=allowed_entity_types,
allowed_relation_types=allowed_relation_types,
max_knowledge_triplets=10,
text="Tim Cook, CEO of Apple Inc., announced the new Apple Watch that monitors heart health."
)
# Print the formatted prompt
print(formatted_prompt)
from llama_index.core import Document
# Define your document texts
texts = [
    "Tim Cook, CEO of Apple Inc., announced the new Apple Watch that monitors heart health.",
    "UC Berkeley researchers studied the benefits of apples."
]
# from_documents expects Document objects; the constructor expects nodes
documents = [Document(text=t) for t in texts]
nodes = [TextNode(text=t) for t in texts]
# Create the KnowledgeGraphIndex
kg_index = KnowledgeGraphIndex(
nodes=nodes,
kg_triplet_extract_template=CUSTOM_KG_TRIPLET_EXTRACT_PROMPT.partial_format(
allowed_entity_types=allowed_entity_types,
allowed_relation_types=allowed_relation_types,
max_knowledge_triplets=10
)
)
# Alternatively, you can use the from_documents method
kg_index = KnowledgeGraphIndex.from_documents(
documents,
kg_triplet_extract_template=CUSTOM_KG_TRIPLET_EXTRACT_PROMPT.partial_format(
allowed_entity_types=allowed_entity_types,
allowed_relation_types=allowed_relation_types,
max_knowledge_triplets=10
)
)
This code demonstrates how to use the updated template directly in KnowledgeGraphIndex.from_documents and how to print the prompt with the actual values of the template variables filled in [1][2][3].
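To check whether a prompt's output will be picked up by a line-based triplet parser, such as the one KnowledgeGraphIndex is assumed here to use by default, a simplified hypothetical parser helps. parse_tuple_lines is not a LlamaIndex function; it keeps only lines containing a parenthesised three-part tuple.

```python
from typing import List, Tuple


def parse_tuple_lines(response: str) -> List[Tuple[str, str, str]]:
    """Hypothetical line parser: one '(subject, predicate, object)' per line."""
    results = []
    for line in response.strip().split("\n"):
        if "(" not in line or ")" not in line or line.index(")") < line.index("("):
            continue  # skip lines without a parenthesised tuple
        parts = line[line.index("(") + 1 : line.index(")")].split(",")
        if len(parts) != 3:
            continue  # not a three-part tuple
        subj, pred, obj = (p.strip() for p in parts)
        if subj and pred and obj:
            results.append((subj, pred, obj))
    return results


tuple_style = "(Tim Cook, CEO_OF, Apple Inc.)\n(Apple Inc., PRODUCES, Apple Watch)"
json_style = "[{'head': 'Tim Cook', 'relation': 'CEO_OF', 'tail': 'Apple Inc.'}]"
print(parse_tuple_lines(tuple_style))
print(parse_tuple_lines(json_style))  # []
```

If the JSON-style output yields nothing here, consider asking the model for one (subject, predicate, object) tuple per line, or moving to the property-graph extractors that expect JSON.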
Question
I want to figure out how to update the default prompt of KnowledgeGraphIndex, so that I can set constraints on the entities and relations it extracts.