run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: Cannot persist and reload (neo4j) PropertyGraphIndex #16409

Open solarslurpi opened 1 month ago

solarslurpi commented 1 month ago

### Bug Description

I assume I should be able to reload a KnowledgeGraph (backed by neo4j), but I am not able to. I have tried two methods.

  1. PropertyGraphIndex.from_existing()

    loads the test database but does not fill it with the nodes, etc. Weird to call it from_existing...???

  2. use StorageContext, e.g.:

        from llama_index.core import StorageContext, load_index_from_storage

        # save and load
        index.storage_context.persist(persist_dir="./storage")
        index = load_index_from_storage(StorageContext.from_defaults(persist_dir="./storage"))

    This gives lots of UTF-8 and other errors. And perhaps it is weird to store neo4j locally.

### Version

0.11.11

### Steps to Reproduce

see calls above.

### Relevant Logs/Tracebacks

_No response_

dosubot[bot] commented 1 month ago

I found a similar discussion regarding the use of PropertyGraphIndex.from_existing() with a Neo4j backend. The expected behavior of this method is to connect to an existing graph database and allow further operations such as inserting more documents or retrieving information. It initializes the PropertyGraphIndex using the provided property_graph_store, which is connected to the Neo4j database [1].

Regarding the UTF-8 and other encoding errors when using load_index_from_storage, there isn't specific information available in the repository about these errors. However, it might be related to how data is being serialized and deserialized when persisting and reloading the index. It is indeed unusual to store Neo4j data locally, as Neo4j is typically used as a remote graph database.

For persisting and reloading a KnowledgeGraph using Neo4j, ensure that you have configured the Neo4jGraphStore correctly with the appropriate connection details and use it as the storage context for the KnowledgeGraphIndex [2]. If you continue to face issues, reviewing the configuration and ensuring compatibility with the Neo4j version might help.

logan-markewich commented 1 month ago

If you are using neo4j for both your graph and vector storage (which happens if you specify neo4j as your graph store and do not specify a vector store), then from_existing is all you need: all the data is in the graph store, and you can use retrieval and querying after running .from_existing(..). I'm kind of confused about what the exact issue is.
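A minimal sketch of the reload path described above, assuming the `llama-index-graph-stores-neo4j` package is installed and a Neo4j instance already holds the graph. The helper names `load_existing_index` and `retrieve` are hypothetical, and the connection details are placeholders. The sketch also assumes the same embedding model that was used to build the graph is configured before retrieval.

```python
def load_existing_index(url: str, username: str, password: str):
    """Attach a PropertyGraphIndex to a graph that already lives in Neo4j.

    Imports are done lazily so this sketch can be read without the
    packages installed; running it needs a reachable Neo4j server.
    """
    from llama_index.core import PropertyGraphIndex
    from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore

    graph_store = Neo4jPropertyGraphStore(
        username=username,
        password=password,
        url=url,
    )
    # No nodes are passed in: the data stays in the graph store and is
    # only read at retrieval/query time.
    return PropertyGraphIndex.from_existing(property_graph_store=graph_store)


def retrieve(index, question: str):
    """Run retrieval against the graph store behind the index."""
    retriever = index.as_retriever()
    return retriever.retrieve(question)
```

Usage would look like `index = load_existing_index("bolt://localhost:7687", "neo4j", "<password>")` followed by `retrieve(index, "some question")`. Under this reading, a way to check that the reload worked is to run a retrieval rather than to inspect the index for in-memory nodes.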

solarslurpi commented 1 month ago

Thank you for your comment. I appreciate it. I have spent too many hours in frustration so I might not be as alert as is respectful to you.

  1. I am setting up the store fine. I am able to see the objects/nodes/etc. in the neo4j UI. I am able to use them in a query if I create the nodes each time just by filling in PropertyGraphIndex(nodes=nodes).
  2. No matter what I do with the two methods shown in my initial report, I cannot get the node info loaded. from_existing() returns a connection to the store, but the index is empty.
  3. I don't need to use neo4j. I just want this to work, and it does not. I am concerned because I plan to add a significant number of nodes and I don't want to create the knowledge graph each time.

Are you saying from_existing() should load the nodes? Then why does the code return nodes = []?

btimothy-har commented 1 month ago

Following, I have the same issue here.

I've constructed the neo4j graph manually, using graph.upsert_nodes(), but the nodes don't get used when constructing a PropertyGraphIndex.

I've noticed that when using upsert_nodes the nodes don't get embedded, but when using index.insert the embeddings are generated by the index.
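One possible workaround for the observation above is to compute embeddings yourself before calling upsert_nodes. This is only a sketch: `attach_embeddings` and `upsert_with_embeddings` are hypothetical helpers, and it assumes the nodes expose a writable `embedding` attribute (as LlamaIndex's labelled nodes do) and that the graph store persists that attribute on upsert. Which text the index itself embeds for each node is not confirmed here, so `text_of` is left as a parameter.

```python
from typing import Callable, Iterable, List, Sequence


def attach_embeddings(
    nodes: Iterable,
    embed_batch: Callable[[List[str]], Sequence[Sequence[float]]],
    text_of: Callable[[object], str] = str,
) -> list:
    """Set `.embedding` on each node using a batch embedding function."""
    nodes = list(nodes)
    vectors = embed_batch([text_of(n) for n in nodes])
    for node, vector in zip(nodes, vectors):
        node.embedding = list(vector)
    return nodes


def upsert_with_embeddings(graph_store, nodes) -> None:
    """Embed nodes with the globally configured model, then upsert them.

    Assumes `graph_store` is a LlamaIndex property graph store; the
    import is lazy so the helper above stays usable on its own.
    """
    from llama_index.core import Settings

    embedded = attach_embeddings(
        nodes, Settings.embed_model.get_text_embedding_batch
    )
    graph_store.upsert_nodes(embedded)
```

With that in place, `upsert_with_embeddings(graph, entities)` would stand in for the plain `graph.upsert_nodes(entities)` call.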