kuzudb / kuzu

Embeddable property graph database management system built for query speed and scalability. Implements Cypher.
https://kuzudb.com/
MIT License

Bug: Unexpected serialized tinysnb database size #3529

Open acquamarin opened 3 months ago

acquamarin commented 3 months ago

Kùzu version

master

What happened?

The serialized tinysnb database exceeds 40 MB, which is even larger than the serialized ldbc-sf01 database.

tinysnb:

➜  tinysnb git:(https-improvement) ✗ ll
total 81840
-rw-r--r--  1 z473chen  staff   3.4K 20 May 19:49 catalog.kz
-rw-r--r--  1 z473chen  staff    25M 20 May 19:49 data.kz
-rw-r--r--  1 z473chen  staff   5.6M 20 May 19:49 metadata.kz
-rw-r--r--  1 z473chen  staff   3.1M 20 May 19:49 n-0.hindex
-rw-r--r--  1 z473chen  staff   3.0M 20 May 19:49 n-1.hindex
-rw-r--r--  1 z473chen  staff   3.0M 20 May 19:49 n-2.hindex
-rw-r--r--  1 z473chen  staff    12K 20 May 19:49 n-2.hindex.ovf
-rw-r--r--  1 z473chen  staff   1.8K 20 May 19:49 nodes.statistics_and_deleted.ids
-rw-r--r--  1 z473chen  staff   3.3K 20 May 19:49 rels.statistics
-rw-r--r--  1 z473chen  staff     8B 20 May 19:49 version.txt

ldbc-01:

➜  ldbc01 git:(https-improvement) ✗ ll
total 49568
-rw-r--r--  1 z473chen  staff   332B 22 May 12:56 catalog.kz
-rw-r--r--  1 z473chen  staff    14M 22 May 12:56 data.kz
-rw-r--r--  1 z473chen  staff   216K 22 May 12:56 metadata.kz
-rw-r--r--  1 z473chen  staff    10M 22 May 12:56 n-0.hindex
-rw-r--r--  1 z473chen  staff   279B 22 May 12:56 nodes.statistics_and_deleted.ids
-rw-r--r--  1 z473chen  staff     8B 22 May 12:56 rels.statistics

Are there known steps to reproduce?

The serialized database is already included in our repo under dataset/databases/tinysnb.
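For anyone who wants to compare the two directories without reading `ll` output by hand, here is a minimal sketch in Python that lists per-file sizes and a total for a database directory. The helper names (`dir_sizes`, `human`, `report`) are made up for illustration and are not part of Kùzu. It only reads file sizes from disk and assumes nothing about the file formats.

```python
from pathlib import Path


def dir_sizes(db_dir):
    """Return {file name: size in bytes} for regular files directly in db_dir."""
    return {p.name: p.stat().st_size for p in Path(db_dir).iterdir() if p.is_file()}


def human(n):
    """Format a byte count with binary units (1K = 1024 bytes)."""
    size = float(n)
    for unit in ("B", "K", "M", "G"):
        if size < 1024 or unit == "G":
            return f"{size:.1f}{unit}"
        size /= 1024


def report(db_dir):
    """Print files from largest to smallest, then the total; return the total in bytes."""
    sizes = dir_sizes(db_dir)
    for name, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
        print(f"{human(size):>8}  {name}")
    total = sum(sizes.values())
    print(f"{human(total):>8}  total")
    return total
```

Calling `report("dataset/databases/tinysnb")` from the repo root, and again on a locally serialized LDBC SF0.1 directory (wherever that lives on your machine), should show which files dominate each total.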

benjaminwinger commented 3 months ago

I think this is mostly coming from the uncompressed types (INT128, UUID, DOUBLE, FLOAT, INTERVAL), none of which are present in the LDBC-SF01 dataset.