Open · Get-Rect opened this issue 1 month ago
Hey there, @Get-Rect! I'm here to help you with any bugs, questions, or contributions you have. Let's squash those bugs together!
The issue you're encountering, where the `HuggingFaceInferenceAPI` LLM fails to extract entities and relations when used with the `SchemaLLMPathExtractor`, is likely due to the strict validation enforced by the `SchemaLLMPathExtractor`. This extractor requires entities and relations to conform to a predefined schema, which may not match the data being processed. In contrast, the `SimpleLLMPathExtractor` uses a simpler approach without strict validation, allowing it to work more flexibly [1][2].
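To illustrate why strict validation can yield an empty result, here is a minimal, library-free sketch of the filtering idea (the triplets and schema below are made up; this is not llama_index's actual implementation):

```python
# Hypothetical triplets an LLM might return for a chunk of text.
triplets = [
    ("Alice", "WORKS_AT", "Acme"),
    ("Acme", "LOCATED_IN", "Berlin"),
    ("Alice", "LIKES", "coffee"),
]

# A schema permitting only certain relation types, as a strict
# extractor might enforce.
allowed_relations = {"WORKS_AT", "FOUNDED"}

def validate(triplets, allowed_relations, strict=True):
    """Keep only schema-conforming triplets when strict; keep all otherwise."""
    if not strict:
        return list(triplets)
    return [t for t in triplets if t[1] in allowed_relations]

print(validate(triplets, allowed_relations, strict=True))
# → [('Alice', 'WORKS_AT', 'Acme')]
```

If none of the model's triplets happen to match the schema, the strict result is empty — which is exactly the symptom reported here, while a non-strict extractor would still return something.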
To address this issue, you can try the following steps:
**Adjust the Schema Validation**: Make the `SchemaLLMPathExtractor` less strict by setting the `strict` parameter to `False`. This relaxes the schema validation so extraction can proceed even when the LLM's output does not match the schema exactly.
```python
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor

extractor = SchemaLLMPathExtractor(
    llm=your_llm_instance,
    extract_prompt=your_custom_prompt,
    possible_entities=["Entity1", "Entity2"],
    possible_relations=["Relation1", "Relation2"],
    strict=False,  # Disable strict schema validation
    max_triplets_per_chunk=5,
    num_workers=2,
)
```
**Custom Prompts and Parsing Functions**: Customize the extraction prompt and the parsing function to better suit your data. A prompt tuned to your domain, paired with a parser matched to your model's output format, can substantially improve extraction quality.
```python
from llama_index.core.prompts import PromptTemplate
from llama_index.core.indices.property_graph import SimpleLLMPathExtractor

custom_prompt = PromptTemplate("Your custom prompt here")

def custom_parse_fn(response: str):
    """Parse the raw LLM response into (subject, relation, object) triplets."""
    # Implement parsing that matches the output format your prompt requests.
    ...

# Note: a custom parse_fn is accepted by SimpleLLMPathExtractor;
# SchemaLLMPathExtractor parses output via its schema instead.
extractor = SimpleLLMPathExtractor(
    llm=your_llm_instance,
    extract_prompt=custom_prompt,
    parse_fn=custom_parse_fn,
    max_triplets_per_chunk=5,
    num_workers=2,
)
```
**Logging and Debugging**: Enable detailed logging and set a global callback handler to trace the extraction process. The debug output includes the prompts sent to the LLM and its raw responses, which helps pinpoint where extraction is failing.
```python
import logging
import sys

import llama_index.core

# Send DEBUG-level logs to stdout. (Adding a second StreamHandler on top of
# basicConfig would duplicate every log line, so one call is enough.)
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)

# Print LLM inputs and outputs as they happen.
llama_index.core.set_global_handler("simple")
```
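Root-level DEBUG logging is very noisy. If that becomes a problem, here is a sketch of scoping the verbose output to llama_index alone (the logger name is assumed to follow the package name, which is the usual convention):

```python
import logging
import sys

# Keep the root logger quiet, but surface llama_index internals at DEBUG.
logging.basicConfig(stream=sys.stdout, level=logging.WARNING)
logging.getLogger("llama_index").setLevel(logging.DEBUG)
```

Child loggers such as `llama_index.core` inherit the DEBUG level, while unrelated libraries stay at WARNING.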
By making these adjustments, you should be able to improve extraction performance and potentially resolve the issue with the `HuggingFaceInferenceAPI` LLM and `SchemaLLMPathExtractor` [3][4].
@Get-Rect the model you are using is probably just not good enough to output consistent pydantic objects.
I recommend not using this module with open-source LLMs, unless it's something like llama3.1 70b.
Ollama recently added tool-calling support. Highly recommended.
@logan-markewich Thank you for the advice. I have tried it with much bigger models like llama3.1-405b and had the same empty results. I have even set `strict=False` as Dosu recommended. What's weird is that I'm only running llama3 7b locally with Ollama and getting much better results than any model I use through the inference API.
I will keep trying custom prompts and parsing functions and check the logs as Dosu recommended. Any other tips you can offer would be greatly appreciated.
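For the custom-parsing route, here is a minimal, self-contained sketch of a parse function that pulls `(subject, relation, object)` triplets out of raw LLM text. The output format it expects is an assumption — adjust the pattern to whatever your prompt asks the model to emit:

```python
import re
from typing import List, Tuple

# Matches triplets written as "(a, b, c)"; assumes no nested commas/parens.
TRIPLET_RE = re.compile(r"\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)")

def parse_triplets(response: str) -> List[Tuple[str, str, str]]:
    """Extract (subject, relation, object) triplets from an LLM response."""
    return [m.groups() for m in TRIPLET_RE.finditer(response)]

raw = """Here are the triplets:
(Alice, WORKS_AT, Acme)
(Acme, LOCATED_IN, Berlin)
Some trailing commentary the parser should ignore."""

print(parse_triplets(raw))
# → [('Alice', 'WORKS_AT', 'Acme'), ('Acme', 'LOCATED_IN', 'Berlin')]
```

A regex-based parser like this is deliberately forgiving: it silently skips anything that doesn't match, which is often more robust with smaller models than strict structured-output parsing.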
Bug Description
When using a HuggingFaceInferenceAPI LLM to build a property graph using the SchemaLLMPathExtractor, no entities or relations are extracted.
I have tried testing this with different models, extract prompts, and settings and have not been able to get it to work, despite the HuggingFaceInferenceAPI LLM working when using a SimpleLLMPathExtractor.
Version
0.10.52
Steps to Reproduce
Clone the example repo here and run it, https://github.com/Get-Rect/property_graph_schema_hugging_face
Or, create a property graph store using a HuggingFaceInferenceAPI LLM and a SchemaLLMPathExtractor. I have only tested this against a locally run Neo4j server, but the issue may also occur with other graph stores.
Relevant Logs/Tracebacks
No response