This PR fixes the DB flattening logic (and also the reorg triggering logic). The primary root cause was a subtle problem with operator precedence (see the change in segment_info.h on line 47). There was also a minor issue in segment_index.cc that would cause the flattening logic to miss the first segment.
The good news is that this bug doesn't affect the validity of any on-disk structures, so it should not have caused any incorrect results. But we should definitely re-run any insert-heavy experiments.
I also updated our experiment scripts to run pg_check after each workload. One downside is that this may lengthen the time it takes to run the experiments, especially the YCSB workloads. What do you think? Perhaps one option is to run it only (i) in preload.sh, and (ii) after any insert-heavy experiments.
Longer explanation
The segment index stores whether or not the segment contains an overflow; we use the MSB in the segment ID for this purpose. This is a performance optimization so that we can avoid going to disk when it is time to decide which segments to reorganize. We rely on this information for DB flattening as well.
Because of the bug in SegmentInfo::HasOverflow(), the index would erroneously declare that a segment has an overflow when it actually does not, and vice versa. Essentially, HasOverflow() would be false when the segment's page offset is even, and true when the page offset is odd.
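For reviewers who want to see how a precedence slip can produce exactly this even/odd behavior, here is a minimal sketch. It is not the code from segment_info.h: the names kOverflowMask, MakeId, HasOverflowBuggy, and HasOverflowFixed are made up for illustration, and it assumes a 64-bit ID whose top bit is the overflow flag and whose remaining bits hold the page offset. The likely shape of the mistake is an expression like `id & kOverflowMask != 0`, where `!=` binds tighter than `&`.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout (hypothetical): the MSB of a 64-bit ID is the overflow flag,
// the lower bits hold the segment's page offset.
constexpr std::uint64_t kOverflowMask = std::uint64_t{1} << 63;

// Hypothetical helper that packs a page offset and the overflow flag.
constexpr std::uint64_t MakeId(std::uint64_t page_offset, bool has_overflow) {
  return page_offset | (has_overflow ? kOverflowMask : 0);
}

// What `id & kOverflowMask != 0` actually computes. Because `!=` has higher
// precedence than `&`, the mask is compared with zero first (giving 1), and
// that 1 is then ANDed with the ID. The parentheses below spell out the parse,
// so the result is just the lowest bit of the ID.
constexpr bool HasOverflowBuggy(std::uint64_t id) {
  return (id & static_cast<std::uint64_t>(kOverflowMask != 0)) != 0;
}

// Intended check: apply the mask first, then compare with zero.
constexpr bool HasOverflowFixed(std::uint64_t id) {
  return (id & kOverflowMask) != 0;
}

// Small self-check showing the symptom and the fix side by side.
inline void CheckParityBehavior() {
  // Odd offset, no overflow: the buggy version still reports an overflow.
  assert(HasOverflowBuggy(MakeId(3, false)));
  assert(!HasOverflowFixed(MakeId(3, false)));
  // Even offset, real overflow: the buggy version misses it.
  assert(!HasOverflowBuggy(MakeId(4, true)));
  assert(HasOverflowFixed(MakeId(4, true)));
}
```

Under these assumptions the buggy version ignores the flag bit entirely and reports only the parity of the page offset, which matches the symptom described above.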