With the code snippet below, when I process a list of my own documents, orphan nodes (each node is a chunk) are usually created alongside the connected nodes. I have a well-defined schema derived from the original example, and the orphaned chunks clearly contain entities and relationships that comply with the schema, yet they end up orphaned anyway.

I know this because I can process 10 articles and 1 of them will produce an orphaned node. If I then process only the article behind that orphaned node, it generates schema-based entities and relationships. So the behavior is inconsistent but frequent: in a set of 64 articles, I might get 10 orphaned nodes.
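To make the orphan count reproducible between runs, a check along these lines could help. It is only a minimal sketch: `find_orphan_chunks`, the chunk ids, and the `MENTIONS` / `WORKS_AT` relation names are hypothetical placeholders, and it assumes the chunk ids and the (source, relation, target) edges have already been exported from the graph store by whatever means the store offers.

```python
from typing import Iterable, List, Tuple

Edge = Tuple[str, str, str]  # (source_id, relation, target_id)


def find_orphan_chunks(chunk_ids: Iterable[str], edges: Iterable[Edge]) -> List[str]:
    """Return the chunk ids that never appear as the source or target of any edge."""
    connected = set()
    for source_id, _relation, target_id in edges:
        connected.add(source_id)
        connected.add(target_id)
    # Preserve the input order so results are easy to compare across runs.
    return [chunk_id for chunk_id in chunk_ids if chunk_id not in connected]


# Hypothetical data: three chunks, of which only two are linked to entities.
chunks = ["chunk-1", "chunk-2", "chunk-3"]
edges = [
    ("chunk-1", "MENTIONS", "entity-a"),
    ("entity-a", "WORKS_AT", "entity-b"),
    ("chunk-3", "MENTIONS", "entity-b"),
]

orphans = find_orphan_chunks(chunks, edges)
print(orphans)  # ['chunk-2']
```

Running this after each indexing pass and comparing the returned ids would show whether the same articles are orphaned every time or whether the set changes from run to run.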

Has anyone else seen this problem, or does anyone have suggestions? BTW, I've tried both strict=True and strict=False.

    from llama_index.core import PropertyGraphIndex
    from llama_index.core.indices.property_graph import SchemaLLMPathExtractor

    # llm, embed_model, graph_store, documents, entities, relations and
    # validation_schema are defined earlier (not shown here).
    kg_extractor = SchemaLLMPathExtractor(
        llm=llm,
        possible_entities=entities,
        possible_relations=relations,
        kg_validation_schema=validation_schema,
        # if false, allows for values outside of the schema
        # useful for using the schema as a suggestion
        strict=True,
    )

    NUMBER_OF_ARTICLES = 250

    index = PropertyGraphIndex.from_documents(
        documents[:NUMBER_OF_ARTICLES],
        kg_extractors=[kg_extractor],
        llm=llm,
        embed_model=embed_model,
        property_graph_store=graph_store,
        show_progress=True,
    )

Inconsistent results when creating PropertyGraphIndex (orphaned nodes) #29