ssbc / ssb-db2

A new database for secure-scuttlebutt
47 stars 8 forks source link

Test performance of a large number of group keys #364

Closed arj03 closed 2 years ago

arj03 commented 2 years ago

This PR came out of a discussion about how box2 behaves when you are in a large number of groups. We decided to test this and to use this inform us about how we should go about decrypting group messages. The current strategy is to just try all keys in whatever order they were added. If you are in 1000 groups, then unboxing 1000 box2 messages takes around 30s. This is with 30% of the messages matching a group you are a part of (in other words, for 70% of the messages you try all 1000 keys in vain). The scaling is roughly linear, so 100 groups is ~3s.

Mix had an idea to use membership information to only try group keys that we know the author feed is a part of. This should provide a nice performance boost as you try a lot less keys, but complicates reindexing because you need a bit more state.

Another optimization we could do, that can also be combined with the above, is to sort the keys based on when they last decrypted a message the last time. If you are in a 1000 groups, then a lot of these are probably not super active. As soon as you decrypt the message you are done, so the order really matters.

mixmix commented 2 years ago

I like your suggestion about re-ordering the keys by usage. I think that could cut the time in half (for groupNumber > 20 say).

I think the followup questions for me are:

  1. how fast is this relative to the speed that messages are validated and added to the DB
    • if it's in the same order / range, then that feels fine. If it's much slower, then this becomes a major bottleneck
  2. how targetted can re-indexing be with meta-feeds
    • e.g. can we say "re-index all subfeeds at path /groups/f3/batts for each peer" ?
    • hmm... need to check when we have to re-index
      • is it just for groups we got added to ? Ah, if we get a group key we have to re-index all '/groups/invite' subfeeds, to discover if there were people added to the subgroup before us.... we also need to re-index all '/group/f3' feeds to discover people announcing their '/group/f3/batts' subfeed id.
mixmix commented 2 years ago

Yeah linear scaling doesn't seem good.... but as you said we can start slow and know we can improve

arj03 commented 2 years ago

I changed the number of groups to 1 to try and get a baseline so that we can compare:

# add a bunch of messages
ok 6 delete db2 folder to start clean
ok 7 add 1000 elements: 582.37ms
# box1
ok 8 add 1000 box1 msgs: 1183.48ms
ok 9 unbox 1000 box1 msgs first run: 189.43ms
ok 10 unbox 1000 box1 msgs second run: 93.52ms
# private box1 no decrypt
ok 11 add 1000 box1 msgs: 1086.53ms
ok 12 query 1000 msgs first run: 72.05ms
ok 13 query 1000 msgs second run: 42.97ms
# private box2
ok 14 add 1000 box2 msgs: 1285.14ms
ok 15 unbox 1000 box2 msgs first run: 177.93ms
ok 16 unbox 1000 box2 msgs second run: 89.78ms
# private box2 group keys
ok 17 Group keys: 1, percentage for me: 30%
ok 18 add 1000 box2 group msgs: 362.55ms
ok 19 unbox 1000 box2 group msgs first run: 309.22ms
ok 20 unbox 1000 box2 group msgs second run: 118.47ms

I would have expected box2 group msgs to be at least as fast as box2 dm messages. The add 1000 elements includes validation, so in that regard the numbers are maybe not so bad for a small number of groups :)

mixmix commented 2 years ago

Are the DMs you're unboxing also 30% for us?

Are you only checking the first slot with group keys?

What order are you checking keys? DM first, then group keys, or vice versa?

arj03 commented 2 years ago

With latest commit, the base case now looks like this:

# box1
ok 8 add 1000 box1 msgs: 1214.88ms
ok 9 unbox 1000 box1 msgs first run: 184.49ms
ok 10 unbox 1000 box1 msgs second run: 98.61ms
# private box1 no decrypt
ok 11 add 1000 box1 msgs: 1077.80ms
ok 12 query 1000 msgs first run: 71.05ms
ok 13 query 1000 msgs second run: 51.28ms
# private box2
ok 14 add 1000 box2 msgs: 439.75ms
ok 15 unbox 1000 box2 msgs first run: 301.22ms
ok 16 unbox 1000 box2 msgs second run: 228.42ms
# private box2 group keys
ok 17 Group keys: 1, percentage for me: 30%
ok 18 add 1000 box2 group msgs: 323.71ms
ok 19 unbox 1000 box2 group msgs first run: 228.74ms
ok 20 unbox 1000 box2 group msgs second run: 106.67ms

The numbers are a lot higher than last post because this is on battery instead of plugged in. The important thing is numbers in relation.