helium / helium-packet-router

Apache License 2.0

change how we store SKF for performance gains #256

Closed. michaeldjeffrey closed this 1 year ago

michaeldjeffrey commented 1 year ago

SKFs are stored with the timestamp of the last time they were used, to make it more likely that we hit the session key for a packet early. An ets ordered_set table was used to achieve this, but it follows total ordering, which means the timestamp needs to be the first element of the key if we want it to take precedence.

Part of removing, or updating, an SKF involved ets:select_delete/2, which requires a full table scan because we only had the SessionKey at the time of deletion. When many removes or updates came in from the config service, each update was spawned into its own process, and they all contended for the SKF table at the same time. For sufficiently large ets tables (10k+ entries) and sufficiently large batches of updates (1k), this caused all schedulers to have trouble doing anything while the table scans were happening. This pausing meant the grpc connection could not talk back to the config-service and negotiate a slower rate of updates, so the updates got out of order and grpcbox killed the connection. Then it would all start again.

The structure of the table has been flattened, so we can tell ets the key position is the SessionKey in the second slot. It will still respect total ordering with the timestamp in the first position, but we can now do deletes in constant time with only the SessionKey. :dancer:
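For illustration only, here is a minimal sketch of the two delete paths described above. The tuple shapes, table names, and values are assumptions made for this example, not the actual HPR records: the "before" table is assumed to key on `{Timestamp, SessionKey}`, and the "after" table is a flat tuple with `{keypos, 2}`.

```erlang
-module(skf_delete_sketch).
-export([run/0]).

%% Hypothetical layouts; the real HPR record shapes may differ.
run() ->
    %% Assumed "before" layout: the key is {Timestamp, SessionKey}, so a
    %% delete that only knows the SessionKey cannot do a key lookup.
    Old = ets:new(skf_old, [ordered_set]),
    true = ets:insert(Old, [{{100, <<"key_a">>}, dev1},
                            {{200, <<"key_b">>}, dev2}]),
    %% The leading key element is unbound ('_'), so ets has to traverse
    %% the table to find matching rows.
    MatchSpec = [{{{'_', <<"key_a">>}, '_'}, [], [true]}],
    1 = ets:select_delete(Old, MatchSpec),

    %% Assumed "after" layout: flat {Timestamp, SessionKey, Value} tuples
    %% with keypos 2, so the SessionKey alone identifies the row.
    New = ets:new(skf_new, [ordered_set, {keypos, 2}]),
    true = ets:insert(New, [{100, <<"key_a">>, dev1},
                            {200, <<"key_b">>, dev2}]),
    true = ets:delete(New, <<"key_a">>),

    %% Capture what is left, then drop both tables.
    Result = {ets:tab2list(Old), ets:tab2list(New)},
    true = ets:delete(Old),
    true = ets:delete(New),
    Result.
```

Calling `skf_delete_sketch:run()` leaves only the `key_b` row in each table. The keyed `ets:delete/2` avoids the scan; for an `ordered_set` the lookup cost grows logarithmically with table size rather than being strictly constant.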

michaeldjeffrey commented 1 year ago

The test data I was using misled me. It also sorts on keypos. :sigh:
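A small sketch of that behaviour, using made-up rows (the table name and values are assumptions for this example): with `{keypos, 2}`, an `ordered_set` is ordered by the element at the key position, so the first tuple element has no effect on traversal order.

```erlang
-module(skf_order_sketch).
-export([run/0]).

%% Hypothetical rows: {Timestamp, SessionKey}, keyed on the second element.
run() ->
    Tab = ets:new(skf_order, [ordered_set, {keypos, 2}]),
    %% Timestamps ascend while session keys descend, so the two possible
    %% orderings are easy to tell apart.
    true = ets:insert(Tab, [{1, <<"c">>}, {2, <<"b">>}, {3, <<"a">>}]),
    %% first/last walk the table in term order of the key.
    FirstKey = ets:first(Tab),
    LastKey = ets:last(Tab),
    FirstRow = ets:lookup(Tab, FirstKey),
    true = ets:delete(Tab),
    {FirstKey, LastKey, FirstRow}.
```

Here `ets:first/1` returns `<<"a">>`, the smallest session key, even though that row carries the largest timestamp. Test data in which timestamps and session keys happen to increase together would hide this difference.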