sid-project / sid

Storage Instantiation Daemon
https://sid-project.github.io
12 stars 5 forks source link

Kv store stats #93

Closed bmarzins closed 3 years ago

bmarzins commented 3 years ago

Here is the sidctl stats command, and other minor cleanups. If you want different stats reported, just let met know.

prajnoha commented 3 years ago

Looks good, thanks! Just those two things I've noticed and noted above...

bmarzins commented 3 years ago

Looks good, thanks! Just those two things I've noticed and noted above...

At the risk of showing my github ignorance, I don't see where you noted two things above. There don't appear to be any comments in the code. Am I missing something?

prajnoha commented 3 years ago

Looks good, thanks! Just those two things I've noticed and noted above...

At the risk of showing my github ignorance, I don't see where you noted two things above. There don't appear to be any comments in the code. Am I missing something?

Sorry, my fault :) There's the "submit review" button (that I feel is a bit hidden) that I always forget to push...

prajnoha commented 3 years ago

OK, thanks, will merge this!

Though, now I recall that I wanted the stats for #87 at first, but making the stats possible on demand for the whole DB through sidctl is surely also of great help for us to see the overall memory consumption impact and to see the differences when we do certain changes in how we store records in SID.

As for the optimization mentioned in #87 - the point there was to avoid iterating over all records to be able to tell how much shared memory to allocate for us to send the part of the database - in that exact case, the diffs for syncing with main db. Instead, we'd now instantaneously how much memory we're going to use for syncing. To do that, we'd need to track the memory consumption directly in the low-level kv-store internal structure and the accompanying wrapper in ubridge (additional flags and stuff around actual kv-store record).

But this won't be probably that straightforward, because we'd need to be able to filter just the records which have cetrain combination of flags/properties set. These flags/properties are the ones in the "wrapping" layer, not the "low-level" layer, but still we need to be able to tell the overall memory consumption, including both the "low-level" and "wrapper". So I think this is more about indexing then - indexing all records which have XYZ flag/property set and then being able to get the subset quickly. And then using the iterator on top of the results, counting the memory use - simply, the goal would be to get certain subset of records based on defined filter quickly, not iterating over all existing records in the whole db.

That filter might be defined even in advance, like in the case with sending db diffs from worker process to main process (_export_kv_store in src/resource/ubridge.c) where we know we're going to select the records which have KV_PERSISTENT flag set... and then differentiating the records further based on namespace: the KV_NS_UDEV ones sending to result buffer so they're exported to udev and the rest of namespace sending over to main process for syncing - and for that syncing, would be great to know how much memory to allocate for shared memory region that we're going to share with the main proces). But this is all a different topic - it's about db indexing (or using specifically constructed keys so it's easy to select subset of the db containing certain key prefixes).