replikativ / datahike

A fast, immutable, distributed & compositional Datalog engine for everyone.
https://datahike.io
Eclipse Public License 1.0

feat: improve writer latency #618

Closed: whilo closed this pull request 9 months ago

whilo commented 1 year ago

Fixes #617. This pull request changes the write process to flush and sync the dirty indices in two stages, instead of waiting on each operation as it executes:

  1. Flush all index trees and collect the asynchronous write operations on dirty nodes without waiting on them individually. Then wait on the collected operations until all index data is written. (This ensures that no pointer used by the DB record in step 2 will be dangling.)
  2. Write the DB record into the commit log and into the branch value in parallel.

In the best case this approach reduces transaction latency to two round trips to the underlying store. That is optimal if distributed snapshot consistency needs to be preserved: if the DB record were written before the index data had been fully written, other processes could read DB records that point to tree fragments that are not yet written.
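The two stages can be sketched roughly as follows. This is an illustrative Python/asyncio model only, not datahike or konserve code: `InMemoryStore`, `commit`, and the key layout are hypothetical names invented for the example, and the store latency is simulated with a sleep.

```python
import asyncio


class InMemoryStore:
    """Hypothetical async key-value store; stands in for the underlying store."""

    def __init__(self, latency=0.01):
        self.data = {}
        self.latency = latency
        self.log = []  # keys in the order their writes completed

    async def put(self, key, value):
        await asyncio.sleep(self.latency)  # simulated round trip
        self.data[key] = value
        self.log.append(key)


async def commit(store, dirty_nodes, db_record, commit_id, branch="db"):
    # Stage 1: create one write per dirty index node, run them all
    # concurrently, and wait once for the whole batch. After this line
    # every node the DB record points to is durably written.
    pending = [store.put(key, node) for key, node in dirty_nodes.items()]
    await asyncio.gather(*pending)

    # Stage 2: write the DB record to the commit log and to the branch
    # value in parallel. Readers can only see the record after stage 1.
    await asyncio.gather(
        store.put(("commit", commit_id), db_record),
        store.put(("branch", branch), db_record),
    )


if __name__ == "__main__":
    store = InMemoryStore()
    nodes = {("node", i): f"payload-{i}" for i in range(3)}
    asyncio.run(commit(store, nodes, {"roots": list(nodes)}, commit_id="c1"))
    print(store.log)
```

Because each stage waits on a batch rather than on individual writes, the sequential cost in this model is two round trips regardless of how many nodes are dirty.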

whilo commented 9 months ago

@TimoKramer This PR is finally ready now. A necessary flush statement was missing until the last commit; its absence caused errors on machines with slow filesystems, which then made the async IO hang. All tests now pass consistently on all machines I have access to. The problem was not clearly visible because assertion errors are not propagated and kaocha swallows all log output by default. The following konserve PR makes such read errors visible and no longer uses assertions: https://github.com/replikativ/konserve/pull/115.