In the following code snippet, when processing a list of my own documents, there are usually orphan nodes (where each node is a chunk) that get created along with the other nodes. I have a well-defined schema derived from the original example, and the orphan nodes clearly do have entities and relationships that are compliant with the schema, but they are orphaned nonetheless.
I know this because I can process 10 articles, of which 1 will result in an orphaned node. Then if I process only that orphaned node, it will generate schema-based relationships and entities. So the behavior is inconsistent but frequent. In a set of 64 articles, I might get 10 orphaned nodes.
Has anyone else seen this problem, or does anyone have suggestions? BTW, I've tried with strict=True and with strict=False.
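from llama_index.core import PropertyGraphIndex
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor

# llm, embed_model, graph_store, documents, entities, relations and
# validation_schema are defined earlier (not shown).
kg_extractor = SchemaLLMPathExtractor(
    llm=llm,
    possible_entities=entities,
    possible_relations=relations,
    kg_validation_schema=validation_schema,
    # if false, allows for values outside of the schema
    strict=True,  # also tried strict=False
)

NUMBER_OF_ARTICLES = 250

index = PropertyGraphIndex.from_documents(
    documents[:NUMBER_OF_ARTICLES],
    kg_extractors=[kg_extractor],
    llm=llm,
    embed_model=embed_model,
    property_graph_store=graph_store,
    show_progress=True,
)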
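
To put a number on the orphans after a run, one option is to compare the chunk IDs against the endpoints of every relation in the graph. The following is only a minimal sketch: it assumes the chunk IDs and the (source, target) pairs have already been exported from the graph store by whatever means, and find_orphans plus the sample data are hypothetical, not part of LlamaIndex.

```python
from typing import Iterable, List, Set, Tuple


def find_orphans(
    node_ids: Iterable[str],
    edges: Iterable[Tuple[str, str]],
) -> List[str]:
    """Return the node IDs that never appear as an endpoint of any edge."""
    connected: Set[str] = set()
    for source_id, target_id in edges:
        connected.add(source_id)
        connected.add(target_id)
    # Preserve input order so the result lines up with the document list.
    return [node_id for node_id in node_ids if node_id not in connected]


# Hypothetical export: three chunk nodes, two entities, and the edges between them.
chunk_ids = ["chunk-1", "chunk-2", "chunk-3"]
edges = [
    ("entity-a", "chunk-1"),
    ("entity-b", "chunk-3"),
    ("entity-a", "entity-b"),
]

orphans = find_orphans(chunk_ids, edges)
print(orphans)
```

Running the same check after each batch would show whether the same articles are orphaned every time or whether the set changes between runs, which helps separate a content problem from a nondeterministic extraction problem.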