kuzudb / kuzu

Embeddable property graph database management system built for query speed and scalability. Implements Cypher.
https://kuzudb.com/
MIT License

Performance issue with LDBC SNB Interactive Queries #3865

Open BingqingLyu opened 2 months ago

BingqingLyu commented 2 months ago


Description:

Hi, I followed the documentation to build the LDBC dataset and ran the SNB Interactive queries. However, query performance is slower than what is reported in this paper. I would like to know whether this behavior is expected or whether I might be doing something wrong.

Steps to Reproduce:

  1. Install Kuzu

I installed Kuzu with pip install kuzu; the installed version is kuzu-0.4.2.

  2. Download the official source data:

I downloaded the source data from here.

I used the LDBC dataset at scale factor 10 (SF10), which consists of about 30 million vertices and 200 million edges.

  3. Set up the schema and copy data:

I created ldbc_schema.txt and ldbc_copy.txt to set up the schema and import the data.
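The contents of ldbc_schema.txt and ldbc_copy.txt are not shown in the issue. As a rough sketch only, assuming pipe-delimited LDBC CSV files with a header row and hypothetical file names (person_0_0.csv and so on) under one directory, the COPY statements for the tables used by IC-6 could be generated as below. The table names follow the query in step 4; the file layout and the directory are assumptions. Node tables come first because rel rows refer to node primary keys, and each string could then be passed to Connection.execute in the Kuzu Python API.

```python
"""Sketch: generate Kuzu COPY FROM statements for LDBC SNB CSV files.

Assumptions (hypothetical, not from the issue): pipe-delimited CSVs with a
header row, one file per table, named as in TABLE_FILES below.
"""
from pathlib import PurePosixPath
from typing import List

# Hypothetical mapping from Kuzu table name to LDBC CSV file name.
TABLE_FILES = {
    "PERSON": "person_0_0.csv",
    "TAG": "tag_0_0.csv",
    "POST": "post_0_0.csv",
    "PERSON_KNOWS_PERSON": "person_knows_person_0_0.csv",
    "POST_HASCREATOR_PERSON": "post_hasCreator_person_0_0.csv",
    "POST_HASTAG_TAG": "post_hasTag_tag_0_0.csv",
}

# Node tables are loaded before the rel tables that reference them.
NODE_TABLES = ["PERSON", "TAG", "POST"]
REL_TABLES = ["PERSON_KNOWS_PERSON", "POST_HASCREATOR_PERSON", "POST_HASTAG_TAG"]


def copy_statements(data_dir: str, delim: str = "|") -> List[str]:
    """Return one COPY statement per table, node tables first."""
    stmts = []
    for table in NODE_TABLES + REL_TABLES:
        path = PurePosixPath(data_dir) / TABLE_FILES[table]
        stmts.append(f"COPY {table} FROM '{path}' (HEADER=true, DELIM='{delim}');")
    return stmts


if __name__ == "__main__":
    for stmt in copy_statements("/data/ldbc/sf10"):
        print(stmt)
```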

  4. Execute an IC query (I executed it single-threaded):

Take a simplified IC-6 query as an example:

    MATCH (p1:PERSON {id: 6597069812321})-[:PERSON_KNOWS_PERSON]-(p2:PERSON)
          <-[:POST_HASCREATOR_PERSON]-(m:POST)-[:POST_HASTAG_TAG]->(t1:TAG {name: 'William_Wordsworth'}),
          (m)-[:POST_HASTAG_TAG]->(t2:TAG)
    WHERE t2.name <> 'William_Wordsworth'
    RETURN t2.name;

It takes 3-4 seconds.
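A 3 to 4 second figure is easier to compare with published numbers when it is measured the same way each time. The following is a minimal timing harness, assuming the query is wrapped in a zero-argument callable: it performs untimed warm-up runs and reports the median of several timed runs. The Kuzu usage in the trailing comments (kuzu.Connection with num_threads=1, execute, has_next, get_next) is an assumption about the 0.4.x Python API, and IC6_QUERY is a hypothetical variable holding the query text above.

```python
"""Sketch: median wall-clock time of a query over repeated runs."""
import statistics
import time
from typing import Callable


def median_runtime(run_query: Callable[[], object], repeats: int = 5, warmup: int = 1) -> float:
    """Return the median runtime in seconds over `repeats` timed calls."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    for _ in range(warmup):
        run_query()  # untimed runs, e.g. to warm caches / the buffer pool
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Possible usage with Kuzu (assumed API; needs `pip install kuzu` and a loaded DB):
# import kuzu
# db = kuzu.Database("./ldbc_sf10")
# conn = kuzu.Connection(db, num_threads=1)  # single-threaded, as in the issue
# def ic6():
#     result = conn.execute(IC6_QUERY)  # IC6_QUERY: hypothetical, the query text
#     while result.has_next():
#         result.get_next()  # drain rows so result retrieval is also timed
# print(f"{median_runtime(ic6):.3f} s")
```

Using the median rather than the mean keeps a single slow cold run from dominating the reported number.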

Questions:

  1. Is this performance expected?
  2. Could I be missing some optimizations or best practices (e.g., did I miss building necessary indexes)?
  3. Are there any specific configurations or settings that might improve query performance?

Environment: a Linux server with 52 cores and 371 GB of RAM.

Thank you for your assistance.

BingqingLyu commented 2 months ago

In addition, I profiled this query. From the query plan:

  1. It appears that some graph indexes may be missing: after scanning the REL table, the plan performs an additional join with the vertex table. Is this expected?
  2. It appears that all edges can only be traversed in the outbound direction. Should I create edges in both directions for better performance?

The plan can be found in ic6_plan.txt. Thanks a lot.
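On the direction question, here is a generic sketch of why keeping one adjacency index per direction makes both directions cheap. It is not Kuzu's storage format and makes no claim about what Kuzu 0.4.2 does internally: a single edge list yields a forward and a backward map, so outgoing, incoming, and undirected neighbor lookups are all direct lookups rather than joins, without loading the edges twice. The names build_adjacency and neighbors are hypothetical.

```python
"""Sketch: forward and backward adjacency maps built from one edge list.

Generic illustration only (hypothetical helpers, not a Kuzu API).
"""
from collections import defaultdict


def build_adjacency(edges):
    """Return (forward, backward) maps from node id to sorted neighbor ids."""
    fwd, bwd = defaultdict(list), defaultdict(list)
    for src, dst in edges:
        fwd[src].append(dst)  # outgoing: src -> dst
        bwd[dst].append(src)  # incoming: dst <- src
    return ({k: sorted(v) for k, v in fwd.items()},
            {k: sorted(v) for k, v in bwd.items()})


def neighbors(fwd, bwd, node, direction="both"):
    """Neighbors for 'out' or 'in'; for 'both', distinct neighbors either way."""
    out_n = fwd.get(node, [])
    in_n = bwd.get(node, [])
    if direction == "out":
        return list(out_n)
    if direction == "in":
        return list(in_n)
    if direction == "both":
        return sorted(set(out_n) | set(in_n))
    raise ValueError(f"unknown direction: {direction!r}")


if __name__ == "__main__":
    knows = [(1, 2), (3, 1), (2, 3)]  # toy PERSON_KNOWS_PERSON edges
    fwd, bwd = build_adjacency(knows)
    print(neighbors(fwd, bwd, 1, "out"))   # [2]
    print(neighbors(fwd, bwd, 1, "in"))    # [3]
    print(neighbors(fwd, bwd, 1, "both"))  # [2, 3]
```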