fix pagination nextSeq when there are holes

ssbc / jitdb

A database on top of a log with automatic index generation and maintenance

50 stars 7 forks source link

fix pagination nextSeq when there are holes #225

Closed staltz closed 2 years ago

staltz commented 2 years ago

Fixes (somewhat) issue #223.

jitdb.paginate's "nextSeq" calculation is improved now, by looking at all the seqs, not at the non-deleted records (this would have cause the same "page" to be scanned a few times, resulting in duplicates).

And now toPullStream() can emit pages that are smaller than the requested page size, but it will never emit pages of size zero, we pull.filter those out.

arj03 commented 2 years ago

Converting the code into how I see it, to see if I got the changes right:

So this makes paginate behave like limit, in that it can return fewer results if things are deleted?

There is still the compaction problem, but I do think this is a good step forward in that we have one less problem to worry about.

github-actions[bot] commented 2 years ago

Benchmark results

Part	Speed	Heap Change	Samples
Count 1 big index (3rd run)	0.62ms ± 0.4ms	-9.76 kB ± 37.27 kB	48
Create an index twice concurrently	752.04ms ± 7.52ms	-2.88 kB ± 32.22 kB	72
Load core indexes	1.07ms ± 0.02ms	146.59 B ± 285.89 B	7269
Load two indexes concurrently	559.73ms ± 5.58ms	-213.17 kB ± 301.84 kB	20
Paginate 10 results	27.05ms ± 0.62ms	8.44 kB ± 34.16 kB	24
Paginate 20000 msgs with pageSize=5	6216.03ms ± 317.62ms	4.03 MB ± 3.2 MB	5
Paginate 20000 msgs with pageSize=500	638.83ms ± 10.77ms	263.95 kB ± 800.71 kB	17
Query 1 big index (1st run)	849.31ms ± 5.05ms	7.67 kB ± 76.95 kB	63
Query 1 big index (2nd run)	351.04ms ± 2.76ms	-27.21 kB ± 31.42 kB	39
Query 3 indexes (1st run)	870.02ms ± 3.21ms	-23.94 kB ± 68.66 kB	62
Query 3 indexes (2nd run)	346.4ms ± 5.81ms	13.12 kB ± 251.02 kB	36
Query a prefix map (1st run)	272.4ms ± 2.05ms	24.17 kB ± 718.27 kB	23
Query a prefix map (2nd run)	10.3ms ± 0.65ms	-34.13 kB ± 181.96 kB	23

staltz commented 2 years ago

So this makes paginate behave like limit, in that it can return fewer results if things are deleted?

Yes

There is still the compaction problem, but I do think this is a good step forward in that we have one less problem to worry about.

What do you mean with "the compaction problem" (I've been solving so many problems, I get lost which one we're talking about)?

arj03 commented 2 years ago

What do you mean with "the compaction problem" (I've been solving so many problems, I get lost which one we're talking about)?

:grin: this one: https://github.com/ssbc/jitdb/pull/224#discussion_r921961951

staltz commented 2 years ago

Ah, right. I call that the rewinding problem. :sweat_smile:

Yes, I'll revise that part.

github-actions[bot] commented 2 years ago

Benchmark results

Part	Speed	Heap Change	Samples
Count 1 big index (3rd run)	0.59ms ± 0.34ms	9.86 kB ± 22.05 kB	36
Create an index twice concurrently	575.52ms ± 3.2ms	9.81 kB ± 22.61 kB	95
Load core indexes	1.24ms ± 0.02ms	111.01 B ± 301.32 B	6883
Load two indexes concurrently	641.46ms ± 5.72ms	-79.66 kB ± 342.89 kB	18
Paginate 10 results	20.81ms ± 0.24ms	-37.79 kB ± 51.99 kB	27
Paginate 20000 msgs with pageSize=5	6838.09ms ± 243.39ms	-1.04 MB ± 5.39 MB	5
Paginate 20000 msgs with pageSize=500	569.22ms ± 4.7ms	255.21 kB ± 780.03 kB	20
Query 1 big index (1st run)	844.8ms ± 3.06ms	-32.51 kB ± 91.02 kB	64
Query 1 big index (2nd run)	294.29ms ± 1.48ms	1.97 kB ± 25.26 kB	50
Query 3 indexes (1st run)	844.39ms ± 8.08ms	55.43 kB ± 60.19 kB	64
Query 3 indexes (2nd run)	320.94ms ± 4.99ms	-85.19 kB ± 44.12 kB	38
Query a prefix map (1st run)	376.55ms ± 6.72ms	12.78 kB ± 255 kB	18
Query a prefix map (2nd run)	12.03ms ± 1.51ms	241.22 kB ± 239.72 kB	24