oxidecomputer / crucible

A storage service.
Mozilla Public License 2.0

How do we allocate the cpu we need? #727

Open leftwo opened 1 year ago

leftwo commented 1 year ago

Some questions I'm interested in:

How do we allocate downstairs?

How do we allocate upstairs?

related: https://github.com/oxidecomputer/crucible/issues/604

faithanalog commented 1 year ago

Per upstairs:

Do we have any tools besides number of worker threads?

I'm not sure we even have number of worker threads at the moment, because most of the time I've spent observing upstairs with prstat, only one thread is active at a time, and which thread is currently active just bounces around. So there isn't really a whole lot of parallelism up there to clamp down on, as far as I can tell.

Per downstairs:

The tradeoffs you mention are things I've been thinking about too. I think grouping all downstairs on a sled into one or more common CPU buckets seems like it would be ideal for general performance, but ideally an operator should be able to do one of

All of these require region allocation logic that we can't really do right now, so if we want to prioritize consistency then I think we should allocate a fixed amount of CPU per region.

Also on the topic, it's worth mentioning that the region allocation logic right now, or at least the part of it that I've been touching while working on https://github.com/oxidecomputer/omicron/issues/3416, does not limit the number of regions on a dataset. The only limit is whether the dataset has room.

So if we allocate a fixed amount of CPU per region, we could still end up overloading a box by having way too many regions on it. Initially, though, the work I'm doing to randomly distribute datasets across the rack should help make that problem less immediate.
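
To make the overload concern concrete, here is a rough sketch of the arithmetic, not anything taken from Crucible or Omicron. The `Sled` type, the millicore unit, and the 250-millicore per-region figure are all made-up assumptions for illustration. It shows that a fixed CPU amount per region only stays within a sled's budget if something also caps the region count.

```rust
/// Hypothetical model of a sled, for illustration only. None of these
/// names come from Crucible or Omicron.
#[derive(Debug)]
struct Sled {
    /// CPU set aside for downstairs on this sled, in millicores (assumed unit).
    cpu_budget_millis: u32,
    /// Number of regions currently placed on this sled's datasets.
    region_count: u32,
}

/// Assumed fixed CPU reservation per region, in millicores.
const CPU_PER_REGION_MILLIS: u32 = 250;

impl Sled {
    /// Total CPU reserved if every region gets the fixed amount.
    fn reserved_millis(&self) -> u32 {
        self.region_count * CPU_PER_REGION_MILLIS
    }

    /// True when the fixed reservations add up to more than the budget.
    fn is_oversubscribed(&self) -> bool {
        self.reserved_millis() > self.cpu_budget_millis
    }

    /// Largest region count that still fits within the CPU budget.
    fn max_regions(&self) -> u32 {
        self.cpu_budget_millis / CPU_PER_REGION_MILLIS
    }
}

fn main() {
    // With no limit on regions per dataset, nothing stops a sled from
    // ending up with more regions than its CPU budget can cover.
    let sled = Sled { cpu_budget_millis: 8_000, region_count: 40 };
    println!("reserved: {} millicores", sled.reserved_millis());
    println!("oversubscribed: {}", sled.is_oversubscribed());
    println!("max regions that fit: {}", sled.max_regions());
}
```

In this toy model, an allocator that wanted to avoid the problem would have to check something like `max_regions()` in addition to whether the dataset has room.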

askfongjojo commented 1 year ago

Depends on having the cpu/memory limits framework

faithanalog commented 5 months ago

https://github.com/oxidecomputer/artemiscellaneous/tree/main/cru-perf/2024-05-10-omicron-fcf7980-crucible-62cc2cf-propolis-27e2789#executive-summary

Some 4k IOP results, with CPU usage and throughput/IOPS summarized. This is current as of https://github.com/oxidecomputer/crucible/commit/62cc2cfe64ca09c6876be7633355026fa65c8545