powersync-ja / powersync-service

Fix checksum cache with compacting #92

Closed rkistner closed 1 month ago

rkistner commented 1 month ago

The checksum cache works not just by caching specific checksum queries, but also by re-using earlier checksums and just adding the partial checksum of any newer operations on the bucket, which can be significantly faster than recalculating the entire checksum from scratch.
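As a rough illustration of the re-use described above, the sketch below extends a cached checksum by adding only the operations newer than the cached position. The names `Op`, `CachedChecksum`, and `extendChecksum` are hypothetical and not taken from the service code. Plain addition is used because the worked example in this PR uses it. The sketch ignores operation types entirely, which is the assumption that the edge case described in this PR violates.

```typescript
// Hypothetical minimal shapes, for illustration only.
interface Op {
  op_id: number;
  checksum: number;
}

interface CachedChecksum {
  upToOpId: number; // last op_id covered by `checksum`
  checksum: number;
}

// Extend a cached checksum to `upToOpId` by adding only the newer operations,
// instead of re-summing the whole bucket.
function extendChecksum(cached: CachedChecksum, ops: Op[], upToOpId: number): CachedChecksum {
  let checksum = cached.checksum;
  for (const op of ops) {
    if (op.op_id > cached.upToOpId && op.op_id <= upToOpId) {
      checksum += op.checksum;
    }
  }
  return { upToOpId, checksum };
}
```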

Compacting a bucket guarantees that the final checksum for the bucket stays the same. However, there is one edge case:

// 1. Start with:
{ op_id: 1, op: 'PUT', row_id: 'A', checksum: 1 },
// 2. Lookup and cache the checksum (1).
// 3. Insert rows:
{ op_id: 2, op: 'PUT', row_id: 'A', checksum: 2 },
{ op_id: 3, op: 'PUT', row_id: 'A', checksum: 4 }
// The checksum is now 7 (but we don't cache in this example)

// 4. Compact to:
{ op_id: 2, op: 'CLEAR', checksum: 3 },
{ op_id: 3, op: 'PUT', row_id: 'A', checksum: 4 }
// Lookup the checksum again.
// The partial checksum is 7.
// If we add to the previous checksum (1), we get a checksum of 8, which is wrong.

The core issue is that a CLEAR operation indicates that the checksum must be reset, rather than added to the checksum of previous operations. The client handles this correctly, but the checksum cache did not account for it.

This only happened when compacting produced a CLEAR operation later than the last cached checksum. That could happen when, for example, a table was re-created from scratch (or every row was updated) and the bucket was then compacted, without any checksum being looked up and cached in between. The issue was therefore rare, but had a major impact when it did occur.

The fix is to detect CLEAR operations when computing a partial checksum, and to reset the checksum in that case.
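A minimal sketch of that idea follows. It is not the code from this PR: `BucketOp`, `PartialChecksum`, `partialChecksum`, and `combineChecksum` are hypothetical names, only the `PUT` and `CLEAR` operation types from the example are modeled, and plain addition is used to match the example. The partial result carries a flag saying whether a CLEAR was seen, so the caller knows whether to add the cached checksum or discard it.

```typescript
// Hypothetical shapes, for illustration only.
interface BucketOp {
  op_id: number;
  op: 'PUT' | 'CLEAR';
  row_id?: string;
  checksum: number;
}

interface PartialChecksum {
  checksum: number;
  // True if a CLEAR was seen: `checksum` then covers the whole bucket.
  isFull: boolean;
}

// Checksum of operations with afterOpId < op_id <= upToOpId, processed in op_id order.
function partialChecksum(ops: BucketOp[], afterOpId: number, upToOpId: number): PartialChecksum {
  let checksum = 0;
  let isFull = false;
  const sorted = [...ops].sort((a, b) => a.op_id - b.op_id);
  for (const op of sorted) {
    if (op.op_id <= afterOpId || op.op_id > upToOpId) continue;
    if (op.op === 'CLEAR') {
      // CLEAR replaces everything before it: restart from its checksum.
      checksum = op.checksum;
      isFull = true;
    } else {
      checksum += op.checksum;
    }
  }
  return { checksum, isFull };
}

// Combine with a previously cached checksum, unless the partial one is already complete.
function combineChecksum(cachedChecksum: number, partial: PartialChecksum): number {
  return partial.isFull ? partial.checksum : cachedChecksum + partial.checksum;
}
```

With the compacted bucket from the example and a cached checksum of 1 at `op_id` 1, the partial result is marked as full and the combined checksum is 7 rather than 8. For the uncompacted operations, the partial checksum of 6 is added to the cached 1, which also gives 7.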


This also adds a 1-hour TTL to cache entries. The TTL is not needed to keep the cache small (it is already size-limited), but it helps to eventually refresh bucket counts after a compact. If the checksum cache for a bucket is updated within the TTL period, the TTL has no effect, since the new entries get a new expiration time. We could eventually add a mechanism to explicitly flush the cache after a compact operation.
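The TTL behaviour can be sketched as follows. This is an illustration only, not the cache used by the service: `TtlCache` and `ONE_HOUR_MS` are hypothetical names, the size limit is omitted, and the clock is injected so that expiry can be exercised without waiting. Writing an entry again gives it a fresh expiration time, which is why updates within the TTL period keep a bucket's entry alive.

```typescript
const ONE_HOUR_MS = 60 * 60 * 1000;

// Hypothetical TTL-only cache; a real cache would also enforce a size limit.
class TtlCache<K, V> {
  private readonly entries = new Map<K, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Store a value; each write gets a new expiration time.
  set(key: K, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Return the value if present and not expired; drop expired entries.
  get(key: K): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }
}
```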

changeset-bot[bot] commented 1 month ago

🦋 Changeset detected

Latest commit: 1fd50a52510033acbf75e100edbe7ace9a3f067e

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages

| Name                     | Type  |
| ------------------------ | ----- |
| @powersync/service-core  | Patch |
| @powersync/service-image | Patch |
| test-client              | Patch |
