Buffer manager exception: Failed to claim a frame: Python API: Bulk Load kuzu==0.0.12 v kuzu==0.2.0

saschamcdonald commented 8 months ago

Python API:

Executed on a local Mac client: 2.6 GHz 6-core Intel Core i7, AMD Radeon Pro 5300M 4 GB, Intel UHD Graphics 630 1536 MB, 32 GB 2667 MHz DDR4 memory.

Scenario: bulk load a relationship table from a Snappy-compressed Parquet file on disk (file size: 1.97 GB).

Version of Kuzu: kuzu==0.0.12. Result: loads successfully.

Version of Kuzu: kuzu==0.2.0. Result: ERROR: Buffer manager exception: Failed to claim a frame.
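For anyone trying to reproduce this, the load described above can be sketched with the Python API roughly as follows. This is a minimal sketch, not the reporter's script: the table name, file path and the `buffer_pool_size` value are assumptions, and it assumes the node and rel tables already exist. `kuzu` is imported lazily so the string and timing helpers can be exercised without it installed.

```python
import time


def copy_statement(table: str, parquet_path: str) -> str:
    """Build a Kuzu COPY FROM statement for a single Parquet file."""
    # Assumption: the path contains no single quote, so no escaping is attempted.
    if "'" in parquet_path:
        raise ValueError("paths containing single quotes are not handled here")
    return f"COPY {table} FROM '{parquet_path}'"


def timed_copy(conn, table: str, parquet_path: str) -> float:
    """Run one bulk load on any object with an execute(query) method.

    Returns the elapsed wall-clock time in seconds.
    """
    start = time.perf_counter()
    conn.execute(copy_statement(table, parquet_path))
    return time.perf_counter() - start


def open_connection(db_path: str, buffer_pool_size: int = 0):
    """Open a kuzu database and return a connection to it."""
    import kuzu  # imported lazily so the helpers above work without kuzu

    # buffer_pool_size=0 leaves the pool size to kuzu's default; pass an
    # explicit byte count to experiment with a larger or smaller pool.
    db = kuzu.Database(db_path, buffer_pool_size=buffer_pool_size)
    return kuzu.Connection(db)
```

Usage would look like `timed_copy(open_connection("./test_kuzu_db"), "WorksAt", "works_at.parquet")`, where both paths are hypothetical. Running the same script under kuzu==0.0.12 and kuzu==0.2.0 in separate environments isolates the version as the only variable.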

ray6080 commented 8 months ago

Hi @saschamcdonald, thanks for reporting this! We've made significant changes to the rel storage since v0.1.0, which is likely why the behaviour differs between these two versions, though the exception itself is not expected. We'd love to look into the exception and fix it. Is it possible for you to share the parquet file with us?

saschamcdonald commented 8 months ago

Re sharing the parquet file: unfortunately I have a GDPR issue, as the data could be personally identifiable. From reviewing the changes between KuzuDB versions referenced in this issue, it could be that, with its default settings, kuzu v0.2.0 does not chunk the ingest of a parquet file that is large relative to the client's available memory (i.e. spill memory to disk) as effectively as kuzu v0.0.12. I'll try to create test data over the next couple of weeks to offer a repeatable test data set for the team. I think memory-to-disk spill is potentially the issue. In the interim, is there a debug level I can set to capture more info for the team? brb.

ray6080 commented 8 months ago

Hi @saschamcdonald, thanks for the info. Is it possible to share the rel table schema and some statistics about the dataset, such as the number of nodes and rels and some degree distribution info? We can try to reproduce this locally on our side.

> I'll try and create test data over the next couple of weeks to offer a repeatable test data set for the team.

That would be much appreciated!

> In the interim is there a debug level I can set to capture more info for the team?

Unfortunately, we currently don't have a way to collect more debugging info without compiling from source and running inside a debugger.

saschamcdonald commented 8 months ago

@ray6080 Here is a small repo containing my code, which generates test data and provides a repeatable environment for the issue; hopefully it is useful for the team: https://github.com/saschamcdonald/ch_06_kuzudb_tests

ray6080 commented 8 months ago

Hi @saschamcdonald, thanks for sharing this. Will take a look into it.

saschamcdonald commented 8 months ago

Testing update:

I tested loading the 45,000,000-relationship parquet table (Snappy compressed) using the Kuzu CLI, version: /v0.2.1/kuzu_cli-osx-universal.tar.gz.

Result: Error: Buffer manager exception: Failed to claim a frame.

Conclusion: it seems the issue is not specific to the Python API, as it also manifests when using the CLI.

Note: if you need assistance, or if I can help in any way, please let me know. If it's of value, I could extend the test generator and loading repo to meet a requirement.

semihsalihoglu-uw commented 8 months ago

@saschamcdonald: This error means that the buffer manager is running out of memory. A couple of questions:

~~How much RAM do you have on the machine?~~ (I saw you have 32GB). You might need to try it on a larger machine.
How many nodes do your node tables have?
If you can share the database somehow, we can test how much memory we now require for this. If the test database was generated with your generator, let us know how we can generate a similar database.

saschamcdonald commented 8 months ago

My machine is big enough to load this data with kuzu 0.0.11. However, version 0.2.0 and above fail to load the relationship table. Ideally, loading should be optimised rather than requiring a hardware solution; that keeps Kuzu efficient.
Two node tables: Company and Person
One relationship table WorksAt
All data in tables are strings
person_id is the primary key for Person nodes; Person has 5 properties
company_id is the primary key for Company nodes; Company has 5 properties
The relationship table is FROM person_id TO company_id and has 5 properties
The generator repo provided builds the data, loads a kuzu database and contains instructions; let me know if you need help running it. I could offer a virtual meeting and walk through it if you like?
In the interim, please could you send an email to sascha@datacue.ai and I'll respond with a link to the test data and the database.
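The schema above can be expressed as Kuzu DDL. The sketch below generates it, assuming every column is STRING as stated. Only `person_id`, `company_id` and the table names come from the thread; any other property names you pass in are placeholders, and the thread does not say whether "5 properties" includes the key. Note that in Kuzu's DDL a rel table references the node tables (`FROM Person TO Company`) rather than key columns; as far as I understand, the first two columns of the rel Parquet file then supply the source and destination primary keys.

```python
from typing import Sequence


def node_table_ddl(name: str, primary_key: str, other_props: Sequence[str]) -> str:
    """CREATE NODE TABLE statement with every column typed as STRING."""
    columns = ", ".join(f"{col} STRING" for col in [primary_key, *other_props])
    return f"CREATE NODE TABLE {name}({columns}, PRIMARY KEY ({primary_key}))"


def rel_table_ddl(name: str, src: str, dst: str, props: Sequence[str]) -> str:
    """CREATE REL TABLE statement; endpoints are node tables, not key columns."""
    columns = "".join(f", {col} STRING" for col in props)
    return f"CREATE REL TABLE {name}(FROM {src} TO {dst}{columns})"
```

For example, `node_table_ddl("Person", "person_id", ["name"])` yields `CREATE NODE TABLE Person(person_id STRING, name STRING, PRIMARY KEY (person_id))`, with `name` standing in for whatever the real properties are. Each statement would be passed to `conn.execute(...)` before the COPY commands.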

prrao87 commented 8 months ago

Hi @saschamcdonald we'll take a look at the artificial dataset repo that you created, it shouldn't be hard to reproduce. Will get back to you on this.

saschamcdonald commented 8 months ago

@prrao87 The test data and kuzu loading are in this repo, which I've just updated. It now contains a third option that creates an environment and a kuzu database based on the latest version of kuzu using the Python API, and dynamically names the databases based on version. I've also added splits, which default to 1: splits offer the ability to create and load multiple parquet files for a given table. I'll add that as a config setting in the .env later. Let me know if you need assistance, and thanks for looking at this issue, as it's a showstopper for us at present.
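The two ideas in this update, naming a database after the kuzu version and splitting one table into several Parquet files, can be sketched as follows. This is an illustration, not the code in the linked repo: `db_name_for_version` and `split_ranges` are hypothetical helpers, and the name format simply mirrors the `test_kuzu_db_v0_0_12` name that appears in the terminal output later in this thread.

```python
from typing import List, Tuple


def db_name_for_version(version: str, prefix: str = "test_kuzu_db") -> str:
    """Derive a database name from a kuzu version string."""
    return f"{prefix}_v{version.replace('.', '_')}"


def split_ranges(total_rows: int, splits: int = 1) -> List[Tuple[int, int]]:
    """Half-open [start, stop) row ranges, one per Parquet file.

    Chunk sizes differ by at most one row; splits=1 keeps a single file.
    """
    if splits < 1:
        raise ValueError("splits must be at least 1")
    base, extra = divmod(total_rows, splits)
    ranges, start = [], 0
    for i in range(splits):
        # The first `extra` chunks absorb the remainder, one row each.
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges
```

Keeping one database per version makes side-by-side comparisons (0.0.12 vs 0.2.x) possible without the runs overwriting each other.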

prrao87 commented 8 months ago

Got it, will keep you posted.

prrao87 commented 8 months ago

@saschamcdonald We've narrowed it down to the partitioning changes introduced in 0.1.0, explaining why it worked up to Kùzu 0.0.12. We'll need to run some profiling to further isolate the issue.

For now, since this is a showstopper, could you run your workflow with the latest release that works (0.0.12)? I was able to successfully load it up until that version. We'll look into this more and update you when we have a fix. Thanks!
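Until a fix lands, the workaround is to pin the working release (e.g. `pip install kuzu==0.0.12`). A small guard at the top of a load script can fail fast if a different release is installed, instead of failing deep into a long bulk load. This is a minimal sketch using only the standard library; `require_version` and `KNOWN_GOOD` are hypothetical names.

```python
from importlib import metadata
from typing import Optional

KNOWN_GOOD = "0.0.12"  # last release reported to load this dataset in the thread


def installed_kuzu_version() -> Optional[str]:
    """Return the installed kuzu distribution version, or None if absent."""
    try:
        return metadata.version("kuzu")
    except metadata.PackageNotFoundError:
        return None


def require_version(installed: Optional[str], expected: str = KNOWN_GOOD) -> None:
    """Raise before starting a bulk load if the wrong kuzu release is present."""
    if installed != expected:
        raise RuntimeError(f"expected kuzu=={expected}, found {installed or 'no kuzu'}")
```

Typical use is `require_version(installed_kuzu_version())` as the first line of the load script.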

saschamcdonald commented 8 months ago

@prrao87 :

Question: could you double-check the version, please?

Version tested: 0.0.12

Result: PASSED.

Details (terminal output):


2024-02-20 18:18:07,549 - INFO - Kuzu loaded time counts:
+-------------------------------------+
| Load Times for test_kuzu_db_v0_0_12 |
+-------------+-----------------------+
|  Table Name |  Load Time (Seconds)  |
+-------------+-----------------------+
|    Person   |         10.08         |
|   Company   |          2.24         |
|   WorksAt   |         181.52        |
+-------------+-----------------------+
2024-02-20 18:18:15,584 - INFO - Kuzu loaded query counts:
+-------------------------------------------+
| Database Summary for test_kuzu_db_v0_0_12 |
+---------+------------+--------------------+
|  Entity | Node Count | Relationship Count |
+---------+------------+--------------------+
| Company | 5,000,000  |         -          |
|  Person | 20,000,000 |         -          |
|    -    |     -      |    100,000,000     |
+---------+------------+--------------------+
Processing completed.
2024-02-20 18:18:19,963 - INFO - Kuzu test data processing completed.
2024-02-20 18:18:19,963 - INFO - Game Over...
Test for KuzuDB version 0.0.12 completed.

Cleaning up...
All tests completed.

prrao87 commented 8 months ago

Yup, we saw the same on our end too. As I mentioned, @ray6080 will get back to you on our next steps for the latest version of Kùzu, but you at least have a version that works.

saschamcdonald commented 8 months ago

@prrao87 Thanks so very much for your help today. I'll update the synthetic repo thingo I built with a test for that version.

ray6080 commented 8 months ago

Hi @saschamcdonald, we've found the root cause inside our partitioner. Briefly, the problem is that we unnecessarily allocated large memory blocks for strings, which makes the buffer manager (BM) run out of memory when there is a large number of rel tuples with a few string properties. ~~Will open an issue on this with a solution~~ (edit: see https://github.com/kuzudb/kuzu/issues/2957) and will let you know once we've fixed it.
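To see why "a few string properties" matter at this scale, a back-of-envelope estimate helps: memory grows as rels × string properties × bytes reserved per value, so any fixed over-allocation per string is multiplied by hundreds of millions of values. The per-value sizes used here are hypothetical illustrations, not Kuzu's actual allocation sizes; the 100,000,000 rels and 5 properties come from the test dataset in this thread.

```python
GIB = 1024 ** 3


def string_bytes(num_rels: int, num_string_props: int, bytes_per_value: int) -> int:
    """Memory needed if every string value costs a fixed number of bytes."""
    return num_rels * num_string_props * bytes_per_value


def report(num_rels: int, num_string_props: int, bytes_per_value: int) -> str:
    """Human-readable footprint for one hypothetical per-value cost."""
    total = string_bytes(num_rels, num_string_props, bytes_per_value)
    return f"{bytes_per_value} B/value -> {total / GIB:.2f} GiB"


for per_value in (16, 64, 256):  # hypothetical per-string reservations
    print(report(100_000_000, 5, per_value))
```

At 16 bytes per value the strings alone need about 7.45 GiB; at 256 bytes per value they would need about 119.21 GiB, far beyond the 32 GB machine in this report. That is why trimming unnecessary per-string allocations, rather than adding RAM, is the natural fix.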

kuzudb/kuzu issue #2863