mitdbg / treeline

An update-in-place key-value store for modern storage.
MIT License

Page grouping write path fixes and improvements #42

Closed: geoffxy closed this pull request 2 years ago

geoffxy commented 2 years ago

This PR fixes the bugs that caused the scan count discrepancy I mentioned. There were two problems:

  1. The incremental insert workload was accidentally run before we shuffled the pages. Those inserts create overflow pages, and we cannot shuffle an overflow page's location without also updating the page "pointer" to it in the main segments.
  2. I was setting the page boundaries within a segment from the records present during the initial bulk load. That is only valid for segment boundaries. Page boundaries must instead be computed from the boundaries induced by the model: the smallest key the model can assign to a page is not necessarily the smallest bulk-loaded key that happens to land on that page.
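To illustrate the difference between model-induced and bulk-load page boundaries, here is a minimal sketch. It assumes a hypothetical linear per-segment model, `page = floor(slope * (key - base_key))` over integer keys with a positive slope; `LinearPageModel`, `page_lower_bounds`, and `bulk_load_lower_bounds` are illustrative names, not TreeLine's actual classes or model.

```python
import math
from fractions import Fraction
from typing import Dict, List


class LinearPageModel:
    """Hypothetical per-segment model mapping an integer key to a page index:
    page = floor(slope * (key - base_key)), clamped to [0, num_pages - 1]."""

    def __init__(self, base_key: int, slope: Fraction, num_pages: int):
        assert slope > 0, "boundary formula below assumes a positive slope"
        self.base_key = base_key
        self.slope = slope          # exact rational avoids float rounding at boundaries
        self.num_pages = num_pages

    def page_for(self, key: int) -> int:
        raw = math.floor(self.slope * (key - self.base_key))
        return max(0, min(self.num_pages - 1, raw))

    def page_lower_bounds(self) -> List[int]:
        # Smallest integer key k with floor(slope * (k - base_key)) >= i.
        # Since slope > 0, this is k = base_key + ceil(i / slope).
        return [self.base_key + math.ceil(Fraction(i) / self.slope)
                for i in range(self.num_pages)]


def bulk_load_lower_bounds(model: LinearPageModel, keys: List[int]) -> List[int]:
    # The approach that is only valid for segment boundaries: use the
    # smallest bulk-loaded key that happens to land on each page.
    first_key: Dict[int, int] = {}
    for k in sorted(keys):
        first_key.setdefault(model.page_for(k), k)
    return [first_key[p] for p in sorted(first_key)]


model = LinearPageModel(base_key=100, slope=Fraction(1, 8), num_pages=4)
bulk_keys = [100, 103, 107, 112, 118, 125]
print(model.page_lower_bounds())                  # [100, 108, 116, 124]
print(bulk_load_lower_bounds(model, bulk_keys))   # [100, 112, 118, 125]
# A later insert of key 110 goes to page 1 under the model, but the
# bulk-load boundaries (page 1 starts at 112) would place it on page 0.
print(model.page_for(110))                        # 1
```

The two boundary lists agree only on page 0. Any key that falls between a model-induced lower bound and the first bulk-loaded key on that page (here 108 to 111) is routed to one page by the model but appears to belong to the previous page under the bulk-load boundaries, which is the kind of mismatch that can make scans miss or double-count records.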

This PR also adds pg_check, an "fsck-like" tool that checks the physical consistency of the DB files (e.g., that page boundaries are correct and that there are no dangling overflow pages). I also made some minor improvements to the code: correcting some comments and setting the upper boundaries of the on-disk pages more accurately.
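The kinds of checks described for pg_check can be sketched over a simplified in-memory view of one segment. This is a minimal sketch under stated assumptions: `Page`, `check_segment`, and the `overflow_id` "pointer" are hypothetical stand-ins, not pg_check's real interface or TreeLine's on-disk format. It verifies that adjacent pages tile the key space, that every key (including keys on a page's overflow) lies within its page's boundaries, and that every overflow page is referenced by exactly one main page.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Page:
    """Hypothetical in-memory view of one main-segment page."""
    page_id: int
    lower: int                          # inclusive lower key boundary
    upper: int                          # exclusive upper key boundary
    keys: List[int] = field(default_factory=list)
    overflow_id: Optional[int] = None   # "pointer" to an overflow page, if any


def check_segment(pages: List[Page], overflows: Dict[int, List[int]]) -> List[str]:
    """Return consistency errors for one segment (an empty list means OK).

    `pages` must be ordered by key range; `overflows` maps overflow page IDs
    to the keys stored on them.
    """
    errors: List[str] = []
    referenced = set()

    # Adjacent pages must tile the key space with no gaps or overlaps.
    for prev, cur in zip(pages, pages[1:]):
        if prev.upper != cur.lower:
            errors.append(f"boundary mismatch between pages {prev.page_id} and {cur.page_id}")

    for page in pages:
        if page.lower >= page.upper:
            errors.append(f"page {page.page_id}: empty or inverted range")
        key_lists = [page.keys]
        if page.overflow_id is not None:
            if page.overflow_id not in overflows:
                errors.append(f"page {page.page_id}: points to missing overflow {page.overflow_id}")
            elif page.overflow_id in referenced:
                errors.append(f"overflow {page.overflow_id}: referenced by more than one page")
            else:
                referenced.add(page.overflow_id)
                key_lists.append(overflows[page.overflow_id])
        # Every key (including those on the page's overflow) must fall in range.
        for keys in key_lists:
            for k in keys:
                if not page.lower <= k < page.upper:
                    errors.append(f"page {page.page_id}: key {k} outside [{page.lower}, {page.upper})")

    # Overflow pages that no main page points to are dangling.
    for oid in sorted(set(overflows) - referenced):
        errors.append(f"overflow {oid}: dangling (no page points to it)")
    return errors


# Example: an overflow page was moved (ID 7 -> 9) without updating its pointer.
pages = [
    Page(0, 100, 108, [100, 103, 107]),
    Page(1, 108, 116, [112], overflow_id=7),
    Page(2, 116, 124, [118]),
    Page(3, 124, 132, [125]),
]
print(check_segment(pages, {9: [110, 113]}))
# ['page 1: points to missing overflow 7', 'overflow 9: dangling (no page points to it)']
print(check_segment(pages, {7: [110, 113]}))  # []
```

The first call mimics moving an overflow page without updating the main page's pointer: the checker reports both the broken pointer and the now-dangling overflow page. Collecting all errors rather than stopping at the first one mirrors how fsck-style tools usually report problems.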

cc @andreaskipf @mmarkakis