kuzudb / kuzu

Embeddable property graph database management system built for query speed and scalability. Implements Cypher.
https://kuzudb.com/
MIT License
1.39k stars 99 forks source link

Bug: Query Performance Discrepancy: KuzuDB CLI vs. Rust Code #4247

Open biplab5464 opened 1 month ago

biplab5464 commented 1 month ago

Kùzu version

rust - kuzu 0.6.0 kuzu_cli - Kuzu 0.6.0

What operating system are you using?

mac os somona (also tasted in winodws 11 23h2 with same result)

What happened?

We encountered a performance issue when testing a large query. Initially, the query took around 3.5 seconds to execute in our Rust code, which raised concerns about performance.

To investigate, we ran the same query in theKuzuDB CLI, where it surprisingly took only 252 ms on average. Suspecting a problem in our code, we simplified the test by writing a basic program that only runs the query and measures the time. In this case, the query took 3.46 seconds.

Screenshot 2024-09-12 at 4 46 42 PM Screenshot 2024-09-12 at 4 41 30 PM

Thinking the problem might be related to unoptimized code, we compiled the Rust code in release mode. While the execution time dropped significantly, but it was still 368 ms—more than twice the time taken by the CLI.

Screenshot 2024-09-12 at 4 59 39 PM

We are looking for insights into why there’s such a large performance gap between the CLI and the Rust implementation, and how we can optimize the Rust execution further.

Are there known steps to reproduce?

  1. Run a sample query using Rust in embedded mode and note the execution time.
  2. Run the same query in the KuzuDB CLI and compare the execution time.
benjaminwinger commented 1 month ago

I have not been able to reproduce any significant difference in runtime between the CLI and the Rust API (when both are built in release mode), however there does seem to be a noticeable discrepancy between what the query result reports as its compilation/ execution time and the total time spent running the query. I think the solution is probably to add more timers to cover the parts of which currently aren't included.

That said, I haven't tried this on very many queries, and I only noticed a significant difference on small queries, mostly because what I'm observing appears to be a small amount of overhead that's not really scaling with the query time (e.g. for a simple query where the compilation+execution time was reported as only ~3ms the total time was ~20ms, but on a comparably simple query with a compiliation+execution time of ~550ms the total time was ~560ms). To double-check that this is the case for your workload, could you try comparing the results of QueryResult::get_compiling_time and get_executing_time to the results from the CLI (or better, try comparing with the total query time using the C++ API)? I suspect that the issue is just that this particular query is spending some extra time doing work when the timers aren't running.

biplab5464 commented 1 month ago

Thank you letting us know about this functions. You are correct their is no discrepancy with compile time or execution time

here is the result

image [Time taken in rust sample code]

image [Time taken in kuzu cli]

Execution time + compile time (in kuzu cli) = 373.36

we can see this nearly 250ms time overhead (Total time - (Execution time + compile time)) leaving compile time and execution time every time we run the same query mentioned above

Can we get more insight on why their is overhead and anyway if wecan reduced it

biplab5464 commented 1 month ago

Please refer to the image below, where we attempted to find the same node using two different queries: a larger query (path finder) and one using the UUID (primary key).

image

In the second case (using UUID), the overhead is minimal, just around 4 ms, and both queries return the same node.

We would like to understand where the time is being spent in the first case (beyond execution and compilation time), as the difference is significant.

prrao87 commented 1 month ago

Hi @biplab5464, we may need some more information on the query you're running.

we can see this nearly 250ms time overhead (Total time - (Execution time + compile time)) leaving compile time and execution time every time we run the same query mentioned above Can we get more insight on why their is overhead and anyway if wecan reduced it

If you could paste the query itself, removing any sensitive parts and replacing with placeholders, that could help us understand better what the query is doing and suggest some ideas.

In the second case (using UUID), the overhead is minimal, just around 4 ms, and both queries return the same node.

for this one, shouldn't this be expected? We construct a hash index on the primary key column of every table, so query performance is bound to be faster than if you were trying to lookup properties on a node that didn't have a hash index. There would be multiple additional intermediate steps in the query plan generated in the latter case.

biplab5464 commented 1 month ago

Here is a replica query, so the db is like Tree data structure, we have 3 schemas Root, Node and Leaf and 1 Group relationships called 'CONTAINS' that are FROM Root To Node , FROM Node TO Node and FROM Node To Leaf

MATCH (n:Root)-[:CONTAINS]->({name:'Groups'})
        -[:CONTAINS]->(Node:{name:'Group 1'})
        -[:CONTAINS]->(:Node{name:'jack'})
        -[:CONTAINS]->(:Node{name:'smith'})
        -[:CONTAINS]->(:Node{name:'denath'})
        -[:CONTAINS]->(:Node{name:'child'})
        -[:CONTAINS]->(:Node{name:'child of child'})
        -[:CONTAINS]->(:Node{name:'grandchild'})
        -[:CONTAINS]->(:Node{name:'1'})
        -[:CONTAINS]->(:Node{name:'2'})
        -[:CONTAINS]->(:Node{name:'3'})
        -[:CONTAINS]->(:Node{name:'4'})
        -[:CONTAINS]->(:Node{name:'5'})
        -[:CONTAINS]->(:Node{name:'6'})
        -[:CONTAINS]->(:Node{name:'7'})
        -[:CONTAINS]->(:Node{name:'8'})
        -[:CONTAINS]->(x {name:'leaf'})

for this one, shouldn't this be expected? We construct a hash index on the primary key column of every table, so query performance is bound to be faster than if you were trying to lookup properties on a node that didn't have a hash index. There would be multiple additional intermediate steps in the query plan generated in the latter case.

Here we are comparing the overhead Time (Total time - (Execution time + compile time)). This logic set true for (Execution time + compile time) which happens during compile and execution of the query but we are talking the extra time that is happening when we get the Same Node but from different query, that is path finder : 257.712 ms and from uuid : 3.917 sec ( Please refer to picture in my previous comment ). Please let me know me if i am wrong.

ajaykumargdr commented 1 month ago

Just for curiosity we executed the same query using python bindings, which took literally same amount of time as the kuzu cli takes. {E590FC5E-F9CA-4ECB-A68A-13EB0A450D0A} [we executed the same query as @biplab5464 ]

{2792C9F4-1134-417E-9DE0-8442420020F8} [same query executed in kuzu cli]

prrao87 commented 1 month ago

Hi @biplab5464 amd @ajaykumargdr, just wanted to confirm, are you both using Windows (for Ajay's case it's clear based on the terminal screenshots). That most recent screenshot showing the Python timing might point to something specific to the Rust API, but we'll need to investigate a bit further. Thanks!

biplab5464 commented 1 month ago

just wanted to confirm, are you both using Windows (for Ajay's case it's clear based on the terminal screenshots).

We have both Windows and macOS, and the issue is the same on both platforms In a single comment, we are either using Windows or macOS, not mixing both platforms.