ssbc / jitdb

A database on top of a log with automatic index generation and maintenance
50 stars 7 forks source link

fix pagination nextSeq when there are holes #225

Closed staltz closed 2 years ago

staltz commented 2 years ago

Fixes (somewhat) issue #223.

jitdb.paginate's "nextSeq" calculation is improved now, by looking at all the seqs, not at the non-deleted records (this would have cause the same "page" to be scanned a few times, resulting in duplicates).

And now toPullStream() can emit pages that are smaller than the requested page size, but it will never emit pages of size zero, we pull.filter those out.

arj03 commented 2 years ago

Converting the code into how I see it, to see if I got the changes right:

So this makes paginate behave like limit, in that it can return fewer results if things are deleted?

There is still the compaction problem, but I do think this is a good step forward in that we have one less problem to worry about.

github-actions[bot] commented 2 years ago

Benchmark results

Part Speed Heap Change Samples
Count 1 big index (3rd run) 0.62ms ± 0.4ms -9.76 kB ± 37.27 kB 48
Create an index twice concurrently 752.04ms ± 7.52ms -2.88 kB ± 32.22 kB 72
Load core indexes 1.07ms ± 0.02ms 146.59 B ± 285.89 B 7269
Load two indexes concurrently 559.73ms ± 5.58ms -213.17 kB ± 301.84 kB 20
Paginate 10 results 27.05ms ± 0.62ms 8.44 kB ± 34.16 kB 24
Paginate 20000 msgs with pageSize=5 6216.03ms ± 317.62ms 4.03 MB ± 3.2 MB 5
Paginate 20000 msgs with pageSize=500 638.83ms ± 10.77ms 263.95 kB ± 800.71 kB 17
Query 1 big index (1st run) 849.31ms ± 5.05ms 7.67 kB ± 76.95 kB 63
Query 1 big index (2nd run) 351.04ms ± 2.76ms -27.21 kB ± 31.42 kB 39
Query 3 indexes (1st run) 870.02ms ± 3.21ms -23.94 kB ± 68.66 kB 62
Query 3 indexes (2nd run) 346.4ms ± 5.81ms 13.12 kB ± 251.02 kB 36
Query a prefix map (1st run) 272.4ms ± 2.05ms 24.17 kB ± 718.27 kB 23
Query a prefix map (2nd run) 10.3ms ± 0.65ms -34.13 kB ± 181.96 kB 23
staltz commented 2 years ago

So this makes paginate behave like limit, in that it can return fewer results if things are deleted?

Yes

There is still the compaction problem, but I do think this is a good step forward in that we have one less problem to worry about.

What do you mean with "the compaction problem" (I've been solving so many problems, I get lost which one we're talking about)?

arj03 commented 2 years ago

What do you mean with "the compaction problem" (I've been solving so many problems, I get lost which one we're talking about)?

:grin: this one: https://github.com/ssbc/jitdb/pull/224#discussion_r921961951

staltz commented 2 years ago

Ah, right. I call that the rewinding problem. :sweat_smile:

Yes, I'll revise that part.

github-actions[bot] commented 2 years ago

Benchmark results

Part Speed Heap Change Samples
Count 1 big index (3rd run) 0.59ms ± 0.34ms 9.86 kB ± 22.05 kB 36
Create an index twice concurrently 575.52ms ± 3.2ms 9.81 kB ± 22.61 kB 95
Load core indexes 1.24ms ± 0.02ms 111.01 B ± 301.32 B 6883
Load two indexes concurrently 641.46ms ± 5.72ms -79.66 kB ± 342.89 kB 18
Paginate 10 results 20.81ms ± 0.24ms -37.79 kB ± 51.99 kB 27
Paginate 20000 msgs with pageSize=5 6838.09ms ± 243.39ms -1.04 MB ± 5.39 MB 5
Paginate 20000 msgs with pageSize=500 569.22ms ± 4.7ms 255.21 kB ± 780.03 kB 20
Query 1 big index (1st run) 844.8ms ± 3.06ms -32.51 kB ± 91.02 kB 64
Query 1 big index (2nd run) 294.29ms ± 1.48ms 1.97 kB ± 25.26 kB 50
Query 3 indexes (1st run) 844.39ms ± 8.08ms 55.43 kB ± 60.19 kB 64
Query 3 indexes (2nd run) 320.94ms ± 4.99ms -85.19 kB ± 44.12 kB 38
Query a prefix map (1st run) 376.55ms ± 6.72ms 12.78 kB ± 255 kB 18
Query a prefix map (2nd run) 12.03ms ± 1.51ms 241.22 kB ± 239.72 kB 24