Open hypernovas opened 2 days ago
The long initialization time for Neo4jPropertyGraphStore with larger graphs is likely due to the refresh_schema() method not handling large datasets efficiently and not using asynchronous operations. The refresh_schema() method performs several structured queries to gather schema information, which can be time-consuming for large datasets.
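A quick way to check whether the schema queries really dominate start-up is to time each one separately. The sketch below uses only the standard library. The step names and the `time.sleep` calls are hypothetical placeholders for the actual queries refresh_schema() issues, which are not listed in this thread.

```python
import time
from typing import Callable, Dict


def time_each(steps: Dict[str, Callable[[], object]]) -> Dict[str, float]:
    """Run each step in order and return the wall-clock seconds each one took."""
    timings: Dict[str, float] = {}
    for name, step in steps.items():
        start = time.perf_counter()
        step()
        timings[name] = time.perf_counter() - start
    return timings


# Hypothetical stand-ins for individual schema queries; swap in the real calls.
steps = {
    "node_properties": lambda: time.sleep(0.2),
    "rel_properties": lambda: time.sleep(0.02),
    "relationships": lambda: time.sleep(0.01),
}

timings = time_each(steps)
slowest = max(timings, key=timings.get)
print(f"slowest step: {slowest} ({timings[slowest]:.2f} s)")
```

If one query accounts for nearly all of the time, that query is the one to optimize. Running the others concurrently would then save little.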
The Neo4jPropertyGraphStore class does have an asynchronous driver (_async_driver) and several asynchronous methods that could be leveraged to improve the performance of refresh_schema(). Implementing asynchronous queries could potentially reduce the initialization time by allowing multiple queries to be executed concurrently [1][2].
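The idea of running the schema queries concurrently can be illustrated with `asyncio.gather`. This is a minimal, self-contained sketch. It does not use the real Neo4j driver. `run_query` is a hypothetical coroutine that simulates query latency with `asyncio.sleep`, and the query names and latencies are made up. With a real async driver, each concurrent query would likely need its own session, and the gain depends on the queries being independent.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, List, Tuple

# Hypothetical schema queries and their simulated latencies in seconds.
QUERIES: Dict[str, float] = {
    "node_properties": 0.1,
    "rel_properties": 0.1,
    "relationships": 0.1,
}


async def run_query(name: str, latency: float) -> Dict[str, object]:
    # Stand-in for one query sent through an async driver; asyncio.sleep
    # simulates database latency without blocking the event loop.
    await asyncio.sleep(latency)
    return {"query": name, "rows": []}


async def refresh_sequential() -> List[Dict[str, object]]:
    # Each query waits for the previous one to finish.
    return [await run_query(n, t) for n, t in QUERIES.items()]


async def refresh_concurrent() -> List[Dict[str, object]]:
    # gather() schedules all queries at once and returns results in input order.
    return list(await asyncio.gather(*(run_query(n, t) for n, t in QUERIES.items())))


def measure(
    fn: Callable[[], Awaitable[List[Dict[str, object]]]]
) -> Tuple[List[Dict[str, object]], float]:
    start = time.perf_counter()
    result = asyncio.run(fn())
    return result, time.perf_counter() - start


seq_result, seq_time = measure(refresh_sequential)
con_result, con_time = measure(refresh_concurrent)
print(f"sequential: {seq_time:.2f} s, concurrent: {con_time:.2f} s")
```

Sequential time is roughly the sum of the latencies, while concurrent time is bounded below by the slowest single query. So concurrency helps most when several queries have similar cost, and least when one expensive query dominates.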
Additionally, the NebulaPropertyGraphStore class from the LlamaIndex framework provides an example of handling schema refresh and initialization more efficiently, which could be adapted for Neo4jPropertyGraphStore [3].
Consider modifying the refresh_schema() method to use asynchronous queries to improve performance.
Bug Description
It takes about 14 minutes to initialize the graph store with 3558 entities. I suspect this is because refresh_schema() does not handle large graphs well, perhaps because it does not use async. I inserted logging into the library code to check this. This is far too slow for anything beyond experiments. Could you help with this?
Version
llama-index-graph-stores-neo4j==0.3.2, llama-index==0.11.11
Steps to Reproduce
Relevant Logs/Tracebacks
No response