ssbc / jitdb

A database on top of a log with automatic index generation and maintenance
50 stars 7 forks source link

fix off-by-one error when saving index count #235

Closed staltz closed 2 years ago

staltz commented 2 years ago

Context

https://github.com/ssbc/jitdb/issues/234

Problem

We have an off-by-one mistake in the usage of the save(count, offset, done) function in updateIndexes().

The first argument is count but we passed seq to it, while the intention was to pass seq + 1, because seq is zero-based, it's used to pick items from the tarr array.

When the log stream ends, we call save() correctly, but when we call save() every 1min during the log stream, we passed the wrong value. In Manyverse, this meant that while indexing was in-progress, the intermediate saved states were not correct, they only got correct when the log stream ended. So if the app crashed in the middle, it would start from the wrong point on the next app startup.

Before this PR, the getSeqFromOffsets would give the wrong value, but it was doing a binary search from positions 0 to count - 1 in the tarr, while the correct value was actually at position count. So this meant that getSeqFromOffsets returned seq 0, and this would then create a disalignment between "seqs" and "offsets" on other indexes.

I have a strong suspicion that this is THE indexing bug that we've been trying to hunt down for months/years now, because this bug has been here since jitdb@2.3.3, which is the version of jitdb we used in Manyverse the first time we put ssb-db2 in.

Solution

Make sure that save()'s first argument is always semantically the "count".

arj03 commented 2 years ago

Oh my 😱 It has been hiding in plain sight. Extra sneaky because of this:

const count = seq // incremented at the end of write()
          save(count,....
github-actions[bot] commented 2 years ago

Benchmark results

Part Speed Heap Change Samples
Count 1 big index (3rd run) 0.46ms ± 0.24ms 10.94 kB ± 30.1 kB 47
Create an index twice concurrently 711.42ms ± 4.55ms 11.86 kB ± 27.74 kB 75
Load core indexes 1.65ms ± 0.03ms 90.38 B ± 410.46 B 5382
Load two indexes concurrently 555.25ms ± 3.78ms 117.33 kB ± 233.06 kB 20
Paginate 10 results 22.5ms ± 0.97ms 8.69 kB ± 75.13 kB 24
Paginate 20000 msgs with pageSize=5 6831.19ms ± 219.93ms 2.59 MB ± 3.75 MB 5
Paginate 20000 msgs with pageSize=500 645.46ms ± 8.67ms 32.31 kB ± 940.93 kB 18
Query 1 big index (1st run) 806.64ms ± 4.76ms 53.3 kB ± 71.45 kB 67
Query 1 big index (2nd run) 323.84ms ± 1.52ms 10.59 kB ± 38.84 kB 45
Query 3 indexes (1st run) 809.17ms ± 3.51ms 40.08 kB ± 61.2 kB 67
Query 3 indexes (2nd run) 308.8ms ± 7.05ms -16.43 kB ± 37.65 kB 39
Query a prefix map (1st run) 337.61ms ± 2.13ms 253.71 kB ± 428.17 kB 20
Query a prefix map (2nd run) 14.46ms ± 0.98ms 12.86 kB ± 282.76 kB 18
staltz commented 2 years ago

Yeah this one was very sneaky. It's sad that the solution to this bug is literally two keyboard strokes. :smiling_face_with_tear:

github-actions[bot] commented 2 years ago

Benchmark results

Part Speed Heap Change Samples
Count 1 big index (3rd run) 0.64ms ± 0.44ms -12.27 kB ± 42.99 kB 37
Create an index twice concurrently 813.49ms ± 5.5ms -5.89 kB ± 35.12 kB 66
Load core indexes 1.73ms ± 0.03ms 94.2 B ± 469.71 B 5050
Load two indexes concurrently 707.38ms ± 16.67ms 248.49 kB ± 302.46 kB 16
Paginate 10 results 20.2ms ± 0.18ms 52.4 kB ± 67.7 kB 28
Paginate 20000 msgs with pageSize=5 8054.54ms ± 176.29ms -145.32 kB ± 1419.16 kB 5
Paginate 20000 msgs with pageSize=500 556.7ms ± 12.84ms 259.1 kB ± 555.41 kB 20
Query 1 big index (1st run) 752.23ms ± 5.21ms -5.01 kB ± 68.58 kB 72
Query 1 big index (2nd run) 319.28ms ± 2.24ms -4.92 kB ± 36.04 kB 45
Query 3 indexes (1st run) 867.16ms ± 5.5ms 22.55 kB ± 77.74 kB 62
Query 3 indexes (2nd run) 332.94ms ± 3.95ms -10.46 kB ± 39.8 kB 37
Query a prefix map (1st run) 310.24ms ± 4.59ms 115.64 kB ± 119.92 kB 21
Query a prefix map (2nd run) 12.77ms ± 1.65ms 75.81 kB ± 219.88 kB 20