tomasonjo / blogs

Jupyter notebooks that support my graph data science blog posts at https://bratanic-tomaz.medium.com/

Inconsistent results when creating PropertyGraphIndex (orphaned nodes) #29

Closed: dkatz123 closed this issue 2 days ago

dkatz123 commented 5 days ago

When I run the code below on a list of my own documents, it usually creates some orphan nodes (each one a chunk) alongside the other nodes. I have a well-defined schema derived from the original example, and the text of the orphaned chunks clearly contains entities and relationships that comply with the schema, yet those chunks end up with no connections.

I know this because if I process 10 articles, 1 of them will result in an orphaned node. If I then process only the article behind that orphaned node, the extractor does generate schema-based entities and relationships. So the behavior is inconsistent, but frequent: in a set of 64 articles, I might get 10 orphaned nodes.

Has anyone else seen this problem, or does anyone have suggestions? BTW, I've tried both strict=True and strict=False.
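One way to count orphans consistently across runs is to compare the set of chunk nodes against the chunks that at least one extracted entity links back to. A minimal sketch in plain Python, assuming the chunk ids and the entity-to-chunk links have already been read out of the graph store; `find_orphan_chunks` and its input shapes are hypothetical and not part of LlamaIndex:

```python
from typing import Iterable, List, Tuple


def find_orphan_chunks(
    chunk_ids: Iterable[str],
    mentions: Iterable[Tuple[str, str]],
) -> List[str]:
    """Return the chunk ids that no extracted entity links back to.

    Hypothetical inputs:
      chunk_ids: ids of every chunk node written to the graph.
      mentions:  (entity_id, chunk_id) pairs, one per edge linking an
                 extracted entity to the chunk it came from.
    """
    linked = {chunk_id for _, chunk_id in mentions}
    # Keep the original chunk order so results are easy to compare between runs.
    return [c for c in chunk_ids if c not in linked]


if __name__ == "__main__":
    chunks = ["chunk-1", "chunk-2", "chunk-3"]
    mentions = [("Alice", "chunk-1"), ("Acme", "chunk-1"), ("Bob", "chunk-3")]
    orphans = find_orphan_chunks(chunks, mentions)
    print(orphans, f"{len(orphans)}/{len(chunks)} chunks orphaned")
```

Running the same check after each rebuild of the index makes it easy to see whether the same articles are orphaned every time or whether the affected set changes from run to run.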

from llama_index.core import PropertyGraphIndex
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor

# llm, entities, relations, validation_schema, documents, embed_model and
# graph_store are defined earlier in the notebook.
kg_extractor = SchemaLLMPathExtractor(
    llm=llm,
    possible_entities=entities,
    possible_relations=relations,
    kg_validation_schema=validation_schema,
    # if False, allows for values outside of the schema
    # useful for using the schema as a suggestion
    strict=True,
)

NUMBER_OF_ARTICLES = 250

index = PropertyGraphIndex.from_documents(
    documents[:NUMBER_OF_ARTICLES],
    kg_extractors=[kg_extractor],
    llm=llm,
    embed_model=embed_model,
    property_graph_store=graph_store,
    show_progress=True,
)

dkatz123 commented 2 days ago

I think this may be related to the LLM temperature setting. Setting a higher value seems to have resolved the issue.
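For context on what the setting changes: with temperature T, a model's next-token logits are divided by T before the softmax, so a low T concentrates probability on the single most likely token and a higher T spreads it across alternatives. A minimal, self-contained illustration with made-up logits, not tied to any particular LLM provider; it only shows what the parameter does, not why it affected extraction in this case:

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert raw logits into probabilities after scaling by a sampling temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Illustrative logits only; not taken from any real model.
    logits = [2.0, 1.0, 0.1]
    for t in (0.2, 1.0, 2.0):
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 3) for p in probs])
```

As T grows, the top token's share shrinks and the output becomes less deterministic from one run to the next.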