mitdbg / treeline

An update-in-place key-value store for modern storage.
MIT License

Fix the DB flattening logic #94

Closed geoffxy closed 2 years ago

geoffxy commented 2 years ago

This PR fixes the DB flattening logic (and also the reorg triggering logic). The primary root cause was a subtle problem with operator precedence (see the change in segment_info.h on line 47). There was also a minor issue in segment_index.cc that would cause the flattening logic to miss the first segment.

The good news is that this bug doesn't affect the validity of any on-disk structures, so it should not have caused any correctness problems. But we should definitely re-run any insert-heavy experiments.

I also updated our experiment scripts to run pg_check after each workload. One downside is that this may lengthen the time it takes to run the experiments, especially the YCSB workloads. What do you think? One option is to run it only (i) in preload.sh and (ii) after any insert-heavy experiments.

Longer explanation

The segment index stores whether or not the segment contains an overflow; we use the MSB in the segment ID for this purpose. This is a performance optimization so that we can avoid going to disk when it is time to decide which segments to reorganize. We rely on this information for DB flattening as well.

Because of the bug in SegmentInfo::HasOverflow(), the index would erroneously declare that a segment has an overflow when it actually does not, and vice versa. Essentially, HasOverflow() would be false when the segment's page offset is even, and true when the page offset is odd.
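For illustration, here is a minimal sketch of one precedence mistake that would produce exactly this even/odd behavior. It is a hypothetical reconstruction, not the actual code in segment_info.h: the names kOverflowMask, HasOverflowAsParsed, and HasOverflowFixed, the 64-bit ID width, and the assumption that the page offset occupies the low bits of the ID are all made up for the example. In C++, `!=` binds more tightly than `&`, so an unparenthesized `raw_id & kOverflowMask != 0` is parsed as `raw_id & (kOverflowMask != 0)`, which tests the lowest bit instead of the MSB.

```cpp
#include <cstdint>

// Assumed layout: the most significant bit of a 64-bit segment ID is the
// overflow flag, and the remaining low bits hold the page offset.
constexpr std::uint64_t kOverflowMask = 1ULL << 63;

// What an unparenthesized `raw_id & kOverflowMask != 0` means to the
// compiler. The comparison is evaluated first and yields 1, so the result
// depends only on the lowest bit of raw_id (odd vs. even page offset).
inline bool HasOverflowAsParsed(std::uint64_t raw_id) {
  return (raw_id & static_cast<std::uint64_t>(kOverflowMask != 0)) != 0;
}

// Intended check: mask out the MSB first, then compare against zero.
inline bool HasOverflowFixed(std::uint64_t raw_id) {
  return (raw_id & kOverflowMask) != 0;
}
```

Under these assumptions, an ID with the flag set and an even offset (for example `kOverflowMask | 4`) is reported as having no overflow by the mis-parsed version, while an ID with the flag clear and an odd offset (for example `5`) is reported as having one. GCC and Clang can flag this pattern through `-Wparentheses`, which is part of `-Wall`.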

geoffxy commented 2 years ago

I also ran the single-threaded perfect allocation experiments for taxi, and they seem to run to completion now.