[Bug]: Extremely long time initializing Neo4jPropertyGraphStore for larger graphs

run-llama / llama_index

LlamaIndex is a data framework for your LLM applications

MIT License

35.71k stars, 5.05k forks

2024-09-24 17:22:40.552042 Step 1: Sanitize query output
2024-09-24 17:22:40.552075 Step 2: Enhanced schema
2024-09-24 17:22:40.552079 Step 3: Create driver
2024-09-24 17:22:40.552180 Step 4: Create async driver
2024-09-24 17:22:40.552232 Step 5: Set database
2024-09-24 17:22:40.552236 Step 6: Create structured schema
2024-09-24 17:22:40.552238 Step 7: Create supports vector index
2024-09-24 17:36:02.882747 Step 8: Create index
2024-09-24 17:36:02.888786 Step 9: Verify version
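To see where the time goes, the gaps between consecutive log lines can be computed directly. The snippet below is a minimal sketch that copies the timestamps from the log above; `step_gaps` is a hypothetical helper, not part of LlamaIndex. It assumes each line is printed just before the named step runs, so a gap is attributed to the step that precedes it.

```python
from datetime import datetime

# Timestamps and labels copied from the log above.
LOG = [
    ("2024-09-24 17:22:40.552042", "Step 1: Sanitize query output"),
    ("2024-09-24 17:22:40.552075", "Step 2: Enhanced schema"),
    ("2024-09-24 17:22:40.552079", "Step 3: Create driver"),
    ("2024-09-24 17:22:40.552180", "Step 4: Create async driver"),
    ("2024-09-24 17:22:40.552232", "Step 5: Set database"),
    ("2024-09-24 17:22:40.552236", "Step 6: Create structured schema"),
    ("2024-09-24 17:22:40.552238", "Step 7: Create supports vector index"),
    ("2024-09-24 17:36:02.882747", "Step 8: Create index"),
    ("2024-09-24 17:36:02.888786", "Step 9: Verify version"),
]


def step_gaps(entries):
    """Return (label, seconds until the next log line) for every entry but the last."""
    times = [datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f") for ts, _ in entries]
    return [
        (label, (nxt - cur).total_seconds())
        for (_, label), cur, nxt in zip(entries, times, times[1:])
    ]


gaps = step_gaps(LOG)
slowest = max(gaps, key=lambda gap: gap[1])
print(f"{slowest[0]}: {slowest[1]:.1f} s")
```

Under that assumption, nearly all of the roughly 13 minutes falls between the Step 7 and Step 8 lines; every other gap is a few milliseconds at most.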

The long initialization time for Neo4jPropertyGraphStore on larger graphs is likely caused by the refresh_schema() method, which runs several structured queries to gather schema information and does so synchronously. On large datasets these queries can be slow.

The Neo4jPropertyGraphStore class already has an asynchronous driver (_async_driver) and several asynchronous methods that refresh_schema() could use. Running the schema queries asynchronously could reduce initialization time by letting multiple queries execute concurrently ^[1]^[2].

Additionally, the NebulaPropertyGraphStore class from the LlamaIndex framework provides an example of handling schema refresh and initialization more efficiently, which could be adapted for Neo4jPropertyGraphStore ^[3].

Consider modifying the refresh_schema() method to use asynchronous queries to improve performance.
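The concurrency idea can be sketched without a database. The example below is a minimal illustration, not the LlamaIndex implementation: `run_schema_query` is a hypothetical stand-in that simulates a slow query with `asyncio.sleep`, and the query names are illustrative. In a real change, the stub would be replaced by calls through the store's async driver.

```python
import asyncio
import time


async def run_schema_query(name: str, delay: float) -> dict:
    """Hypothetical stand-in for one schema query; sleeps instead of hitting Neo4j."""
    await asyncio.sleep(delay)
    return {"query": name, "rows": []}


async def refresh_schema_concurrently() -> dict:
    """Run independent schema queries at the same time and collect their results."""
    # Illustrative query names and simulated latencies (seconds).
    queries = {"node_properties": 0.2, "rel_properties": 0.2, "relationships": 0.2}
    results = await asyncio.gather(
        *(run_schema_query(name, delay) for name, delay in queries.items())
    )
    return {result["query"]: result["rows"] for result in results}


start = time.perf_counter()
schema = asyncio.run(refresh_schema_concurrently())
elapsed = time.perf_counter() - start
print(sorted(schema), f"{elapsed:.2f} s")
```

Here the three simulated queries finish in roughly the time of the slowest one rather than the sum of all three. Note that this only helps when the queries are independent; if a single query dominates the runtime, running the others alongside it will not shorten initialization much.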



[Bug]: Extremely long time initializing Neo4jPropertyGraphStore for larger graphs #16204

Bug Description

Version

Steps to Reproduce

Relevant Logs/Tracebacks