kuzudb / kuzu

Embeddable property graph database management system built for query speed and scalability. Implements Cypher.
https://kuzudb.com/
MIT License
1.36k stars 96 forks source link

Transaction API issues #2082

Closed benjaminwinger closed 8 months ago

benjaminwinger commented 1 year ago

When testing the integer bitpacking, I added a test which inserts many nodes at once (NodeInsertionDeletionTests.InsertManyNodesTest). This is rather slow (inserting NODE_GROUP_SIZE values took 75 seconds), so I tried to speed it up by doing it all as one transaction rather than one automatic transaction for each insert, but that resulted in a hang, or at least it was taking much longer than 75 seconds, even for smaller sizes like the 4096 values I ended up using (which still takes a few seconds).

Note that this was reproducible on the master branch prior to my bitpacking changes in #2004.

In addition, inserting values in small chunks would cause a Buffer manager exception: Failed to claim a frame error. This would occur even with just one value (I also tested on chunks of 2, 5 and 10) inserted between calling Connection::beginWriteTransaction and Connection::commit (after many inserts at least, the other test in NodeInsertionDeletionTests proves that it at least works the first few times).

Tests that reproduce both of these issues can be found in this commit.

benjaminwinger commented 1 year ago

It seems like the issue was that beginWriteTransaction was called during initialization so when it's called the second time it silently fails, and then hangs when the inserts are done.

Regarding the silent failure of Connection::beginWriteTransaction, partly it's just that Connection::beginWriteTransaction doesn't actually return a query result, so you can't check that it's successful or see the error message (without querying BEGIN WRITE TRANSACTION manually), but I think it's also part of a broader issue with the C++ API and how it handles results in general. Using [[nodiscard]] would probably help, as well as maybe a QueryResult::assertSuccess method that would turn error messages into exceptions, and in this case I think we would want beginWriteTransaction to just directly call assertSuccess rather than producing a result.

The hang on the other hand is contrary to what the error message says here.

With the double call fixed, doing the inserts all as one transaction works identically to how it's currently working (where I guess it's doing it all as one transaction and auto comitting/discarding when the connection goes out of scope).

Doing 4096 explicit (or implicit) transactions is still failing with the buffer manager exception, but it does work if broken up into chunks so that fewer transactions are done (I tested with chunks of 10, so 410 transactions).

semihsalihoglu-uw commented 9 months ago

I'm linking this issue to this other issue that relates to returning error statuses in the C API: https://github.com/kuzudb/kuzu/issues/2528. These look important to me and should be prioritized as they relate to the usability of our APIs.

ray6080 commented 9 months ago

Just to document our discussion here. The problem is because we didn't propagate query result to transaction related APIs. For example, in the following API, we discarded the return result from query, thus beginReadOnlyTransaction is not able to report errors to users.

void Connection::beginReadOnlyTransaction() {
    query("BEGIN TRANSACTION READ ONLY");
}

The solution is that we should remove following interfaces, beginWriteTransaction , beginReadOnlyTransaction, commit, rollback from cpp APIs, and let users go through equivalent Cypher statements directly.