use binary search to get seq from offset

staltz commented 2 years ago

Context

I'm preparing for #199 and one thing I realized is that I'll need a quick way of getting the seq from any given offset, e.g. during compaction there is going to be unshiftedOffset and I'd like to discover unshiftedSeq.

Problem

We don't have a helper function for that and currently we're just doing a linear for-loop to look for the offset.

Another problem is that the "end" of the for-loop is tarr.length, which I think is wrong because the tarr grows much more than the count does (like it doubles in size every time it "grows" but most of it is still unused). Instead we need to use index.count.

Solution

Introduce getSeqFromOffset that does a binary search and uses indexes['seq'].count as the end marker.

This should also have a performance improvement for those cases when the jitdb indexes are many and large but we just need to update the tip of the index. This way we can avoid scanning the whole index in O(n) and can do O(log n). I'm not sure we have any benchmark for this but should be a pretty common thing to happen in production.

github-actions[bot] commented 2 years ago

Benchmark results

Part	Speed	Heap Change	Samples
Count 1 big index (3rd run)	0.31ms ± 0.04ms	-5.65 kB ± 15.37 kB	52
Create an index twice concurrently	579.51ms ± 2.25ms	31.17 kB ± 61.11 kB	95
Load core indexes	0.83ms ± 0.01ms	93.75 B ± 183.36 B	9274
Load two indexes concurrently	720.66ms ± 13.28ms	-90.18 kB ± 139.83 kB	17
Paginate 10 results	25.18ms ± 1.26ms	9.95 kB ± 18.95 kB	28
Paginate 20000 msgs with pageSize=5	6177.78ms ± 77.45ms	-123.13 kB ± 4086.27 kB	5
Paginate 20000 msgs with pageSize=500	594.5ms ± 23.94ms	-1.52 kB ± 712.31 kB	20
Query 1 big index (1st run)	811.31ms ± 11.76ms	1.67 kB ± 94.22 kB	67
Query 1 big index (2nd run)	281.26ms ± 5.11ms	-30.71 kB ± 29.98 kB	47
Query 3 indexes (1st run)	1137.74ms ± 16.27ms	24.41 kB ± 149 kB	47
Query 3 indexes (2nd run)	216.3ms ± 3.1ms	-96.49 kB ± 122.63 kB	54
Query a prefix map (1st run)	308.65ms ± 4.08ms	49.13 kB ± 513.65 kB	20
Query a prefix map (2nd run)	14.32ms ± 0.42ms	-3.9 kB ± 16.56 kB	24

staltz commented 2 years ago

Here's previous benchmarks to compare with: https://github.com/ssb-ngi-pointer/jitdb/pull/197

But there's no noticeable difference. But do we have a benchmark to measure this case? (A big index receives a bit of updates)

arj03 commented 2 years ago

This looks like a great optimization. I'm wondering if we can't we use bsb that is already included?

staltz commented 2 years ago

Good point! I switched to using bsb.eq in getSeqFromOffset, and I took a look at the source code, it's basically the same, so that's good. Now the PR is a net negative in lines of code changed. :sunglasses:

I also added another commit use seq index.count instead of index.tarr.length.

github-actions[bot] commented 2 years ago

Benchmark results

Part	Speed	Heap Change	Samples
Count 1 big index (3rd run)	0.41ms ± 0.04ms	-3.24 kB ± 15.44 kB	44
Create an index twice concurrently	757.32ms ± 7ms	46.4 kB ± 73.81 kB	72
Load core indexes	1.46ms ± 0.02ms	94.27 B ± 248.4 B	6504
Load two indexes concurrently	709.12ms ± 16.92ms	57.29 kB ± 142.61 kB	17
Paginate 10 results	24.43ms ± 1.05ms	972 B ± 18893.48 B	28
Paginate 20000 msgs with pageSize=5	5856.57ms ± 250.37ms	-441.06 kB ± 4406.04 kB	5
Paginate 20000 msgs with pageSize=500	455.57ms ± 8.72ms	8.86 kB ± 669.42 kB	22
Query 1 big index (1st run)	822.93ms ± 8.8ms	-5.34 kB ± 84.71 kB	66
Query 1 big index (2nd run)	264.16ms ± 1.51ms	15.44 kB ± 31.24 kB	52
Query 3 indexes (1st run)	854.47ms ± 9.14ms	-7.25 kB ± 108.83 kB	64
Query 3 indexes (2nd run)	282.62ms ± 1.95ms	-27.61 kB ± 172.07 kB	41
Query a prefix map (1st run)	272.8ms ± 5.3ms	10.09 kB ± 179.72 kB	22
Query a prefix map (2nd run)	16.26ms ± 0.45ms	1.2 kB ± 18.66 kB	21

github-actions[bot] commented 2 years ago

Benchmark results

Part	Speed	Heap Change	Samples
Count 1 big index (3rd run)	0.35ms ± 0.03ms	-5.7 kB ± 18.5 kB	51
Create an index twice concurrently	585.93ms ± 16.1ms	1.02 kB ± 52.52 kB	94
Load core indexes	1.21ms ± 0.01ms	98.8 B ± 208.69 B	7478
Load two indexes concurrently	700.31ms ± 15.45ms	90.45 kB ± 100.59 kB	18
Paginate 10 results	37.91ms ± 2.34ms	47.24 kB ± 42.95 kB	22
Paginate 20000 msgs with pageSize=5	6090.69ms ± 60.36ms	2.98 MB ± 2.67 MB	5
Paginate 20000 msgs with pageSize=500	540.77ms ± 27.94ms	565.28 kB ± 582.67 kB	20
Query 1 big index (1st run)	819.2ms ± 7.73ms	-33.3 kB ± 75.36 kB	66
Query 1 big index (2nd run)	313.45ms ± 2.57ms	-6.5 kB ± 33.52 kB	43
Query 3 indexes (1st run)	1026.27ms ± 9.03ms	37.59 kB ± 137.64 kB	53
Query 3 indexes (2nd run)	226.9ms ± 2.62ms	-30 kB ± 201.89 kB	52
Query a prefix map (1st run)	294.02ms ± 4.54ms	28.23 kB ± 44.66 kB	21
Query a prefix map (2nd run)	13.75ms ± 0.55ms	-7.06 kB ± 14.91 kB	24

ssbc / jitdb