Closed: staltz closed this 2 years ago
Part | Speed | Heap Change | Samples |
---|---|---|---|
Count 1 big index (3rd run) | 0.31ms ± 0.04ms | -5.65 kB ± 15.37 kB | 52 |
Create an index twice concurrently | 579.51ms ± 2.25ms | 31.17 kB ± 61.11 kB | 95 |
Load core indexes | 0.83ms ± 0.01ms | 93.75 B ± 183.36 B | 9274 |
Load two indexes concurrently | 720.66ms ± 13.28ms | -90.18 kB ± 139.83 kB | 17 |
Paginate 10 results | 25.18ms ± 1.26ms | 9.95 kB ± 18.95 kB | 28 |
Paginate 20000 msgs with pageSize=5 | 6177.78ms ± 77.45ms | -123.13 kB ± 4086.27 kB | 5 |
Paginate 20000 msgs with pageSize=500 | 594.5ms ± 23.94ms | -1.52 kB ± 712.31 kB | 20 |
Query 1 big index (1st run) | 811.31ms ± 11.76ms | 1.67 kB ± 94.22 kB | 67 |
Query 1 big index (2nd run) | 281.26ms ± 5.11ms | -30.71 kB ± 29.98 kB | 47 |
Query 3 indexes (1st run) | 1137.74ms ± 16.27ms | 24.41 kB ± 149 kB | 47 |
Query 3 indexes (2nd run) | 216.3ms ± 3.1ms | -96.49 kB ± 122.63 kB | 54 |
Query a prefix map (1st run) | 308.65ms ± 4.08ms | 49.13 kB ± 513.65 kB | 20 |
Query a prefix map (2nd run) | 14.32ms ± 0.42ms | -3.9 kB ± 16.56 kB | 24 |
Here are the previous benchmarks to compare with: https://github.com/ssb-ngi-pointer/jitdb/pull/197

There's no noticeable difference, though. Do we have a benchmark that measures this case (a big index receiving a small batch of updates)?
This looks like a great optimization. I'm wondering if we can use `bsb`, which is already included?
Good point! I switched to using `bsb.eq` in `getSeqFromOffset`. I took a look at its source code and it's basically the same as what I wrote, so that's good. Now the PR is a net negative in lines of code changed. :sunglasses:

I also added another commit to use the seq index's `index.count` instead of `index.tarr.length`.
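To illustrate why `index.tarr.length` is the wrong bound, here's a hypothetical sketch (the `index` shape and `grow` helper are assumptions for illustration, not actual jitdb code): a typed-array index that doubles its backing buffer on growth ends up with trailing unused slots, so the buffer's length overshoots the number of valid entries tracked by `count`.

```javascript
// Hypothetical sketch: a growable typed-array index doubles its backing
// buffer when full, so tarr.length exceeds the count of valid entries.
function grow(index) {
  const bigger = new Float64Array(index.tarr.length * 2);
  bigger.set(index.tarr);
  index.tarr = bigger;
}

const index = { tarr: new Float64Array(4), count: 0 };
for (let i = 0; i < 5; i++) {
  if (index.count >= index.tarr.length) grow(index);
  index.tarr[index.count] = i * 100; // pretend these are log offsets
  index.count += 1;
}

console.log(index.tarr.length); // 8 slots allocated
console.log(index.count);       // only 5 valid entries
// Slots 5..7 are zero-filled padding; a scan bounded by tarr.length
// would mistake them for real offsets.
```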
Part | Speed | Heap Change | Samples |
---|---|---|---|
Count 1 big index (3rd run) | 0.41ms ± 0.04ms | -3.24 kB ± 15.44 kB | 44 |
Create an index twice concurrently | 757.32ms ± 7ms | 46.4 kB ± 73.81 kB | 72 |
Load core indexes | 1.46ms ± 0.02ms | 94.27 B ± 248.4 B | 6504 |
Load two indexes concurrently | 709.12ms ± 16.92ms | 57.29 kB ± 142.61 kB | 17 |
Paginate 10 results | 24.43ms ± 1.05ms | 972 B ± 18893.48 B | 28 |
Paginate 20000 msgs with pageSize=5 | 5856.57ms ± 250.37ms | -441.06 kB ± 4406.04 kB | 5 |
Paginate 20000 msgs with pageSize=500 | 455.57ms ± 8.72ms | 8.86 kB ± 669.42 kB | 22 |
Query 1 big index (1st run) | 822.93ms ± 8.8ms | -5.34 kB ± 84.71 kB | 66 |
Query 1 big index (2nd run) | 264.16ms ± 1.51ms | 15.44 kB ± 31.24 kB | 52 |
Query 3 indexes (1st run) | 854.47ms ± 9.14ms | -7.25 kB ± 108.83 kB | 64 |
Query 3 indexes (2nd run) | 282.62ms ± 1.95ms | -27.61 kB ± 172.07 kB | 41 |
Query a prefix map (1st run) | 272.8ms ± 5.3ms | 10.09 kB ± 179.72 kB | 22 |
Query a prefix map (2nd run) | 16.26ms ± 0.45ms | 1.2 kB ± 18.66 kB | 21 |
Part | Speed | Heap Change | Samples |
---|---|---|---|
Count 1 big index (3rd run) | 0.35ms ± 0.03ms | -5.7 kB ± 18.5 kB | 51 |
Create an index twice concurrently | 585.93ms ± 16.1ms | 1.02 kB ± 52.52 kB | 94 |
Load core indexes | 1.21ms ± 0.01ms | 98.8 B ± 208.69 B | 7478 |
Load two indexes concurrently | 700.31ms ± 15.45ms | 90.45 kB ± 100.59 kB | 18 |
Paginate 10 results | 37.91ms ± 2.34ms | 47.24 kB ± 42.95 kB | 22 |
Paginate 20000 msgs with pageSize=5 | 6090.69ms ± 60.36ms | 2.98 MB ± 2.67 MB | 5 |
Paginate 20000 msgs with pageSize=500 | 540.77ms ± 27.94ms | 565.28 kB ± 582.67 kB | 20 |
Query 1 big index (1st run) | 819.2ms ± 7.73ms | -33.3 kB ± 75.36 kB | 66 |
Query 1 big index (2nd run) | 313.45ms ± 2.57ms | -6.5 kB ± 33.52 kB | 43 |
Query 3 indexes (1st run) | 1026.27ms ± 9.03ms | 37.59 kB ± 137.64 kB | 53 |
Query 3 indexes (2nd run) | 226.9ms ± 2.62ms | -30 kB ± 201.89 kB | 52 |
Query a prefix map (1st run) | 294.02ms ± 4.54ms | 28.23 kB ± 44.66 kB | 21 |
Query a prefix map (2nd run) | 13.75ms ± 0.55ms | -7.06 kB ± 14.91 kB | 24 |
### Context

I'm preparing for #199, and one thing I realized is that I'll need a quick way of getting the seq for any given offset. E.g. during compaction there is going to be an `unshiftedOffset`, and I'd like to discover the corresponding `unshiftedSeq`.

### Problem

We don't have a helper function for that, and currently we're just doing a linear for-loop to look for the offset.

Another problem is that the "end" of the for-loop is `tarr.length`, which I think is wrong, because the tarr grows much more than `count` does (it doubles in size every time it "grows", but most of it is still unused). Instead we need to use `index.count`.

### Solution

Introduce `getSeqFromOffset`, which does a binary search and uses `indexes['seq'].count` as the end marker.

This should also be a performance improvement for those cases where the jitdb indexes are many and large but we just need to update the tip of the index: instead of scanning the whole index in O(n), we can do O(log n). I'm not sure we have any benchmark for this, but it should be a pretty common situation in production.