ferdeleong opened this issue 1 month ago
Hello @ferdeleong! I'm here to help you with your coding questions and bug fixes. Let's work together to solve the issue you're facing.
To use `extract_prompt` with `DynamicLLMPathExtractor`, you need to ensure that the prompt is correctly formatted and compatible with the extractor. Here is an example of how to properly set up and use `DynamicLLMPathExtractor` with `extract_prompt`:
```python
from typing import Literal

from llama_index.core.indices.property_graph import DynamicLLMPathExtractor
from llama_index.core.prompts import PromptTemplate
from llama_index.core.llms.llm import LLM

# Define your language model
llm = LLM()

# Define your extraction prompt
extract_prompt = PromptTemplate(
    "Extract the relationships between entities in the following text "
    "according to the provided schema: {text}"
)

# Define possible entities and relations
entities = Literal["PERSON", "PLACE", "THING"]
relations = Literal["PART_OF", "HAS", "IS_A"]
schema = {
    "PERSON": ["PART_OF", "HAS", "IS_A"],
    "PLACE": ["PART_OF", "HAS"],
    "THING": ["IS_A"],
}

# Initialize the DynamicLLMPathExtractor
kg_extractor = DynamicLLMPathExtractor(
    llm=llm,
    extract_prompt=extract_prompt,
    possible_entities=entities,
    possible_relations=relations,
    kg_validation_schema=schema,
    strict=True,
    num_workers=4,
    max_paths_per_chunk=10,
    show_progress=False,
)

# Use the extractor to extract relationships from your data
# Assuming `data` is the text data you want to process
relationships = kg_extractor.extract(data)
```
In this example:

- `llm` is your language model instance.
- `extract_prompt` is a `PromptTemplate` that defines how the extraction query should be formatted.
- `possible_entities` and `possible_relations` define the entities and relations that are allowed in the schema.
- `kg_validation_schema` is the schema to validate the extracted paths.
- `strict` enforces strict schema validation.

Make sure that your `extract_prompt` provides sufficient context for the LLM to establish relationships between nodes. If you encounter issues, ensure that the prompt is correctly formatted and that the schema is properly defined.
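As a side note on what "correctly formatted" means here: prompt templates are filled with Python-style `{placeholder}` substitution, so the variable names in the template must match what the caller supplies. A minimal stdlib illustration of those mechanics, using plain `str.format` rather than the actual `PromptTemplate` class:

```python
# Illustration of the placeholder mechanics behind prompt templates.
# Plain str.format is used here; PromptTemplate fills its fields analogously.
template = "Extract the relationships between entities in the following text: {text}"

# Supplying the declared placeholder works:
filled = template.format(text="Tim Cook leads Apple Inc.")
print("Tim Cook" in filled)

# A placeholder the caller never supplies raises KeyError:
try:
    "Entity Types: {allowed_entity_types}".format(text="irrelevant")
except KeyError as exc:
    print(f"unfilled placeholder: {exc}")
```

A mismatched or unfilled placeholder is one of the most common reasons a custom prompt silently degrades extraction quality.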
For more details on the differences between `SimpleLLMPathExtractor` and `DynamicLLMPathExtractor`, you can refer to the documentation [1].
@ferdeleong you can customize the prompt like so (this is the default):
```python
extract_prompt = (
    "Extract up to {max_knowledge_triplets} knowledge triplets from the given text. "
    "Each triplet should be in the form of (head, relation, tail) with their respective types.\n"
    "---------------------\n"
    "INITIAL ONTOLOGY:\n"
    "Entity Types: {allowed_entity_types}\n"
    "Relation Types: {allowed_relation_types}\n"
    "\n"
    "Use these types as a starting point, but introduce new types if necessary based on the context.\n"
    "\n"
    "GUIDELINES:\n"
    "- Output in JSON format: [{{'head': '', 'head_type': '', 'relation': '', 'tail': '', 'tail_type': ''}}]\n"
    "- Use the most complete form for entities (e.g., 'United States of America' instead of 'USA')\n"
    "- Keep entities concise (3-5 words max)\n"
    "- Break down complex phrases into multiple triplets\n"
    "- Ensure the knowledge graph is coherent and easily understandable\n"
    "---------------------\n"
    "EXAMPLE:\n"
    "Text: Tim Cook, CEO of Apple Inc., announced the new Apple Watch that monitors heart health. "
    "UC Berkeley researchers studied the benefits of apples.\n"
    "Output:\n"
    "[{{'head': 'Tim Cook', 'head_type': 'PERSON', 'relation': 'CEO_OF', 'tail': 'Apple Inc.', 'tail_type': 'COMPANY'}},\n"
    " {{'head': 'Apple Inc.', 'head_type': 'COMPANY', 'relation': 'PRODUCES', 'tail': 'Apple Watch', 'tail_type': 'PRODUCT'}},\n"
    " {{'head': 'Apple Watch', 'head_type': 'PRODUCT', 'relation': 'MONITORS', 'tail': 'heart health', 'tail_type': 'HEALTH_METRIC'}},\n"
    " {{'head': 'UC Berkeley', 'head_type': 'UNIVERSITY', 'relation': 'STUDIES', 'tail': 'benefits of apples', 'tail_type': 'RESEARCH_TOPIC'}}]\n"
    "---------------------\n"
    "Text: {text}\n"
    "Output:\n"
)

extractor = DynamicLLMPathExtractor(..., extract_prompt=extract_prompt)
```
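One detail worth flagging when customizing this default: the extractor appears to fill `{max_knowledge_triplets}`, `{allowed_entity_types}`, `{allowed_relation_types}`, and `{text}` itself, and literal braces in the JSON examples are escaped as `{{ }}`. A quick check (a sketch using nothing beyond Python's `string.Formatter`, with a shortened hypothetical template) that a custom prompt declares exactly those field names:

```python
from string import Formatter

# Hypothetical shortened template, reusing the field names from the default prompt above.
custom_prompt = (
    "Extract up to {max_knowledge_triplets} triplets.\n"
    "Entity Types: {allowed_entity_types}\n"
    "Relation Types: {allowed_relation_types}\n"
    "Output as JSON: [{{'head': '', 'relation': '', 'tail': ''}}]\n"
    "Text: {text}\n"
)

# Collect the placeholder names the template declares ({{ }} escapes are skipped).
fields = sorted({name for _, name, _, _ in Formatter().parse(custom_prompt) if name})
print(fields)
# → ['allowed_entity_types', 'allowed_relation_types', 'max_knowledge_triplets', 'text']
```

If this check turns up an extra or misspelled field name, formatting will fail (or leave the placeholder unfilled) at extraction time.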
Thank you for answering. Here is everything I've tried:
```python
# Imports added for completeness (assuming the standard llama-index packages):
from llama_index.core import Document, PropertyGraphIndex
from llama_index.core.indices.property_graph import DynamicLLMPathExtractor
from llama_index.llms.openai import OpenAI


def extract_entities_relationships(text):
    llm = OpenAI(temperature=0, model="gpt-4o-mini")
    document = Document(text=text, metadata={"title": "Amenities Content"})

    kg_extractor = DynamicLLMPathExtractor(
        llm=llm,
        max_triplets_per_chunk=4,
        num_workers=16,
        allowed_entity_types=None,
        allowed_relation_types=None,
        allowed_relation_props=[],
        allowed_entity_props=[],
    )

    index = PropertyGraphIndex.from_documents(
        [document],
        llm=llm,
        embed_kg_nodes=False,
        kg_extractors=[kg_extractor],
        show_progress=True,
    )
```
And that works perfectly fine.

Then I introduce the following:
```python
extract_prompt = (
    "Extract up to {max_knowledge_triplets} knowledge triplets from the given text. "
    "Each triplet should be in the form of (head, relation, tail) with their respective types.\n"
    "---------------------\n"
    "INITIAL ONTOLOGY:\n"
    "Entity Types: {allowed_entity_types}\n"
    "Relation Types: {allowed_relation_types}\n"
    "\n"
    "Use these types as a starting point, but introduce new types if necessary based on the context.\n"
    "\n"
    "GUIDELINES:\n"
    "- Output in JSON format: [{{'head': '', 'head_type': '', 'relation': '', 'tail': '', 'tail_type': ''}}]\n"
    "- Use the most complete form for entities (e.g., 'United States of America' instead of 'USA')\n"
    "- Keep entities concise (3-5 words max)\n"
    "- Break down complex phrases into multiple triplets\n"
    "- Ensure the knowledge graph is coherent and easily understandable\n"
    "---------------------\n"
    "EXAMPLE:\n"
    "Text: Tim Cook, CEO of Apple Inc., announced the new Apple Watch that monitors heart health. "
    "UC Berkeley researchers studied the benefits of apples.\n"
    "Output:\n"
    "[{{'head': 'Tim Cook', 'head_type': 'PERSON', 'relation': 'CEO_OF', 'tail': 'Apple Inc.', 'tail_type': 'COMPANY'}},\n"
    " {{'head': 'Apple Inc.', 'head_type': 'COMPANY', 'relation': 'PRODUCES', 'tail': 'Apple Watch', 'tail_type': 'PRODUCT'}},\n"
    " {{'head': 'Apple Watch', 'head_type': 'PRODUCT', 'relation': 'MONITORS', 'tail': 'heart health', 'tail_type': 'HEALTH_METRIC'}},\n"
    " {{'head': 'UC Berkeley', 'head_type': 'UNIVERSITY', 'relation': 'STUDIES', 'tail': 'benefits of apples', 'tail_type': 'RESEARCH_TOPIC'}}]\n"
    "---------------------\n"
    "Text: {text}\n"
    "Output:\n"
)
```
```python
def extract_entities_relationships(text):
    llm = OpenAI(temperature=0, model="gpt-4o-mini")
    document = Document(text=text, metadata={"title": "Amenities Content"})

    kg_extractor = DynamicLLMPathExtractor(
        llm=llm,
        max_triplets_per_chunk=4,
        extract_prompt=extract_prompt,  # NEW LINE
        num_workers=16,
        allowed_entity_types=None,
        allowed_relation_types=None,
        allowed_relation_props=[],
        allowed_entity_props=[],
    )

    index = PropertyGraphIndex.from_documents(
        [document],
        llm=llm,
        embed_kg_nodes=False,
        kg_extractors=[kg_extractor],
        show_progress=True,
    )
```
And then it stops working.

Then I tried the same prompt, but passing explicit entity and relation types instead of None:
```python
def extract_entities_relationships(text):
    llm = OpenAI(temperature=0, model="gpt-4o-mini")
    document = Document(text=text, metadata={"title": "Amenities Content"})

    kg_extractor = DynamicLLMPathExtractor(
        llm=llm,
        max_triplets_per_chunk=4,
        extract_prompt=extract_prompt,
        num_workers=16,
        allowed_entity_types=["RESTAURANT"],  # NEW LINE
        allowed_relation_types=["SIMILAR_TO"],  # NEW LINE
        allowed_relation_props=[],
        allowed_entity_props=[],
    )

    index = PropertyGraphIndex.from_documents(
        [document],
        llm=llm,
        embed_kg_nodes=False,
        kg_extractors=[kg_extractor],
        show_progress=True,
    )
```
Then I tried what the bot suggested:
```python
extract_prompt = PromptTemplate(  # NEW LINE
    "Extract up to {max_knowledge_triplets} knowledge triplets from the given text. "
    "Each triplet should be in the form of (head, relation, tail) with their respective types.\n"
    "---------------------\n"
    "INITIAL ONTOLOGY:\n"
    "Entity Types: {allowed_entity_types}\n"
    "Relation Types: {allowed_relation_types}\n"
    "\n"
    "Use these types as a starting point, but introduce new types if necessary based on the context.\n"
    "\n"
    "GUIDELINES:\n"
    "- Output in JSON format: [{{'head': '', 'head_type': '', 'relation': '', 'tail': '', 'tail_type': ''}}]\n"
    "- Use the most complete form for entities (e.g., 'United States of America' instead of 'USA')\n"
    "- Keep entities concise (3-5 words max)\n"
    "- Break down complex phrases into multiple triplets\n"
    "- Ensure the knowledge graph is coherent and easily understandable\n"
    "---------------------\n"
    "EXAMPLE:\n"
    "Text: Tim Cook, CEO of Apple Inc., announced the new Apple Watch that monitors heart health. "
    "UC Berkeley researchers studied the benefits of apples.\n"
    "Output:\n"
    "[{{'head': 'Tim Cook', 'head_type': 'PERSON', 'relation': 'CEO_OF', 'tail': 'Apple Inc.', 'tail_type': 'COMPANY'}},\n"
    " {{'head': 'Apple Inc.', 'head_type': 'COMPANY', 'relation': 'PRODUCES', 'tail': 'Apple Watch', 'tail_type': 'PRODUCT'}},\n"
    " {{'head': 'Apple Watch', 'head_type': 'PRODUCT', 'relation': 'MONITORS', 'tail': 'heart health', 'tail_type': 'HEALTH_METRIC'}},\n"
    " {{'head': 'UC Berkeley', 'head_type': 'UNIVERSITY', 'relation': 'STUDIES', 'tail': 'benefits of apples', 'tail_type': 'RESEARCH_TOPIC'}}]\n"
    "---------------------\n"
    "Text: {text}\n"
    "Output:\n"
)
```
Also no success.
Finally, with the same prompt cast to a `PromptTemplate`, I tried:
```python
max_triplets_per_chunk = 20
formatted_prompt = template.format(max_triplets_per_chunk=max_triplets_per_chunk, text=text)

kg_extractor = DynamicLLMPathExtractor(..., extract_prompt=formatted_prompt)
```
But again, no success.
Any ideas on what may be happening?
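Not a definitive diagnosis, but one mechanical detail worth noting about the last attempt: calling `.format(...)` on a template consumes its placeholders, so a pre-formatted string no longer contains any fields for the extractor to fill at extraction time. A stdlib sketch of that effect, using a hypothetical two-field template standing in for the full prompt:

```python
from string import Formatter

# Hypothetical two-field template standing in for the full extraction prompt.
template = "Extract up to {max_knowledge_triplets} triplets.\nText: {text}\n"

# Pre-formatting the template consumes its placeholders...
pre_formatted = template.format(max_triplets_per_chunk := 20, text="some amenities text") if False else \
    template.format(max_knowledge_triplets=20, text="some amenities text")

# ...so nothing is left for a downstream caller to fill in.
remaining = [name for _, name, _, _ in Formatter().parse(pre_formatted) if name]
print(remaining)
# → []
```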
Question

Do you have any examples of how to use `extract_prompt` in `DynamicLLMPathExtractor`? I've only found an example of `extract_prompt` usage with `SimpleLLMPathExtractor` here: https://docs.llamaindex.ai/en/stable/module_guides/indexing/lpg_index_guide/

When I try to use the exact same prompt as shown in the link above with the dynamic extractor, I get this result:

If I comment out `extract_prompt`, my graph works again, but I do need to give the LLM more context to produce proper relationships. I also saw the knowledge-graph prompts in the `default_prompts.py` file from the library and customized the prompt so it also follows the allowed entities and relationships defined in `kg_extractor`, but still no success. I also tried formatting the prompt so `extract_prompt` receives a `PromptTemplate` instead of a `str`, but also no success.