ssbc / jitdb

A database on top of a log with automatic index generation and maintenance
50 stars 7 forks source link

bitset operations are aware of deletes #224

Closed staltz closed 2 years ago

staltz commented 2 years ago

Draft PR for issue #223

@arj03 Check the approach I'm using here, where I listen to an obz from AAOL. See also the AAOL branch https://github.com/ssbc/async-append-only-log/pull/82

Before proceeding with polishing this idea, I want to see what you think and I want to check the performance.

staltz commented 2 years ago

Tests pass in node.js 12, but I'm debugging now why they fail in 16.x

arj03 commented 2 years ago

I guess we can try this approach, we still need to handle that there might be deletes we did not get because this way of signalling deletes can fail (see my comment here: https://github.com/ssbc/jitdb/issues/223). I'm also interested in the seeing the performance aspect of this bitset approach, that is at least something (as you write) to test now :)

staltz commented 2 years ago

this way of signalling deletes can fail

How? (I read the comment but I didn't understand how it could fail)

Deletes are detected in two ways: in log.stream when building indexes (thus this covers the "old" case) and immediately when calling log.del() (thus covering the "live" part).

github-actions[bot] commented 2 years ago

Benchmark results

Part Speed Heap Change Samples
Count 1 big index (3rd run) 0.65ms ± 0.07ms 14.46 kB ± 22.9 kB 50
Create an index twice concurrently 702.56ms ± 7.2ms -7.05 kB ± 32.9 kB 77
Load core indexes 1.23ms ± 0.02ms 109.05 B ± 356.43 B 6526
Load two indexes concurrently 576.84ms ± 4.89ms 86.79 kB ± 305.3 kB 20
Paginate 10 results 21.63ms ± 0.3ms 64.94 kB ± 64.7 kB 27
Paginate 20000 msgs with pageSize=5 17007.6ms ± 250.82ms 3.57 MB ± 7.73 MB 5
Paginate 20000 msgs with pageSize=500 737.66ms ± 7.85ms 790.32 kB ± 4175.34 kB 18
Query 1 big index (1st run) 1121.11ms ± 17.98ms 21.95 kB ± 93.56 kB 47
Query 1 big index (2nd run) 236.8ms ± 0.97ms -16.17 kB ± 62.69 kB 58
Query 3 indexes (1st run) 822.93ms ± 13.98ms -15.18 kB ± 52.18 kB 66
Query 3 indexes (2nd run) 248.27ms ± 1.2ms 17.4 kB ± 35.04 kB 52
Query a prefix map (1st run) 277.71ms ± 4.66ms 362.43 kB ± 752.67 kB 23
Query a prefix map (2nd run) 15.46ms ± 0.44ms -9.01 kB ± 82.59 kB 23
staltz commented 2 years ago

Notice the benchmark:

Before:

Paginate 20000 msgs with pageSize=5 8093.26ms ± 509.73ms

After:

Paginate 20000 msgs with pageSize=5 17007.6ms ± 250.82ms

All the other numbers seem to be okayish.

arj03 commented 2 years ago

Notice the benchmark:

Before:

Paginate 20000 msgs with pageSize=5 8093.26ms ± 509.73ms

After:

Paginate 20000 msgs with pageSize=5 17007.6ms ± 250.82ms

All the other numbers seem to be okayish.

I put in a timing on the intersectWithPresent and its around 0.03 - 0.3 ms. The problem with this paginate is that you run this soo many times (4000).

staltz commented 2 years ago

I'm bothered about the deeper issues in this issue, and I thought "do we have to solve this?".

There are two problems going on:

Problem 1 is a big deal. Problem 2 is a small deal. So maybe we don't need to solve number 2, we just need to make sure that number 1 is solved. And I think that might not be too hard, just gotta track the seqs better. I'll try building that soon.