Some numbers from local testing of {`std::vector`, `frag_vector`} x {`std::sort`, `std::nth_element`} combinations (look at the median column):
```
single run iterations:  0
single run duration:    1.000s
number of runs:         1
number of cores:        12
random seed:            866896499

test                                    iterations     median      mad        min        max  allocs  tasks   inst
containers.std_vec_sort                   23600000   36.695ns  0.000ns   36.695ns   36.695ns   0.000  0.000   90.5
containers.frag_vec_sort                   7600000  125.520ns  0.000ns  125.520ns  125.520ns   0.000  0.000  999.1
containers.std_vec_nth_mid               118000000    4.199ns  0.000ns    4.199ns    4.199ns   0.000  0.000    7.1
containers.frag_vec_nth_mid               68800000   10.005ns  0.000ns   10.005ns   10.005ns   0.000  0.000   63.7
containers.std_vec_nth_end                84400000    7.299ns  0.000ns    7.299ns    7.299ns   0.000  0.000   13.6
containers.frag_vec_nth_end               40800000   18.965ns  0.000ns   18.965ns   18.965ns   0.000  0.000  125.6
containers.frag_vec_nth_end_unchecked     55600000   13.256ns  0.000ns   13.256ns   13.256ns   0.000  0.000   67.6
```
Basically, `frag_vector` is ~3.5x slower to sort than `std::vector` because of the cost of element access, and `nth_element` is 7x-17x faster than `std::sort` for the `frag_vector` type, depending on whether the target element is near the middle or the end. Switching to an unchecked iterator (since we can rely on the `std::` algorithms to do the right thing) gives another ~1.4x boost.
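For context, here's a minimal standalone sketch (my illustration, not the benchmark harness above; `frag_vector` is Redpanda-specific, so only the `std::vector` half is shown) of why `nth_element` wins: it places a single element at its sorted position instead of totally ordering the range.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

int main() {
    std::mt19937 rng{866896499}; // same seed as the run above, for flavor
    std::vector<int> v(100'000);
    for (auto& x : v) x = static_cast<int>(rng());

    auto a = v;
    std::sort(a.begin(), a.end()); // O(n log n): totally orders everything

    auto b = v;
    auto mid = b.begin() + b.size() / 2;
    std::nth_element(b.begin(), mid, b.end()); // O(n) average: places only *mid

    // nth_element yields the same median value a full sort would, without
    // paying for the total order, which is where the gap in the table comes from.
    assert(*mid == a[a.size() / 2]);
}
```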
🤯 YES! haha this is epic.
This is a great analysis. Hopefully the latency spike will be lowered now, as we improved the resource cleanup in `rm_stm`: https://github.com/redpanda-data/redpanda/pull/11597
There are also plans to move the cleanup completely out of the critical path. We will benefit from the changes here in other places as well.
This issue hasn't seen activity in 3 months. If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in two weeks.
@mmaslankaprv was this closed out? If not, do we think it's still relevant?
This can be closed, as the cleanup is now done outside of the critical path.
Version & Environment
Redpanda version: 23.1.11
What went wrong?
A latency spike is observed in concert with the number of active producer IDs surpassing 1000 on a broker:
https://vectorizedio.grafana.net/d/redpanda-prod-v2/redpanda-clusters-v2?orgId=1&var-datasource=metrics-prod-cloudv2&var-redpanda_id=chiba8rosj7gvl84bk3g&var-node=All&var-node_shard=All&var-aggr_criteria=pod&from=1687770918810&to=1687773056033
Producer IDs go over 1000:
Pending tasks spike:
Reactor utilization hits 100%:
The same effect is observed on several nodes, but not all.
Completed tasks drop by about half (e.g., on shard 0 from 240K/s to 122K/s), which, together with the increase in reactor utilization to 100%, indicates that the average task length has more than doubled.
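A quick back-of-the-envelope check on that claim (assuming reactor utilization ≈ completed-task rate × mean task length, with pre-spike utilization $u_{\text{before}}$ below 100%):

$$
\frac{\ell_{\text{after}}}{\ell_{\text{before}}}
= \frac{u_{\text{after}}}{u_{\text{before}}} \cdot \frac{r_{\text{before}}}{r_{\text{after}}}
= \frac{100\%}{u_{\text{before}}} \cdot \frac{240\,\text{K/s}}{122\,\text{K/s}}
\approx \frac{1.97}{u_{\text{before}}} > 2
$$

since $u_{\text{before}}$ was under ~98% before the spike.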
There were several reactor stalls during the spike, all pointing to `cluster::rm_stm::compact_snapshot()`, usually in the `std::sort` of the timestamps. Here's an example:
It seems like when we go over 1000 PIDs, the shortcut condition no longer applies, so we sort the timestamp vector every time. The mode we get into seems to be that new producer IDs keep arriving, growing the `seq_table` until it reaches, say, 1001 elements; we then sort all 1001 elements and kick out the oldest one, and this repeats every time we add a new PID: sorting 1001 elements just to remove 1 element (the older PIDs may be expired, but while the count was under 1000 they were protected from removal by the `_seq_table_min_size` limit).

Part of the problem is that `fragmented_vector` is 4x slower than `std::vector` when used in `std::sort`. Another thing is that we don't need `std::sort` at all: we can use `nth_element` to find the pivot, which is much faster (around 10x, depending on the details). Plausibly there are other approaches that avoid a sort-like step entirely, but they would require a larger change; a sketch of the `nth_element` idea follows.
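For illustration, here's a rough sketch of the `nth_element` approach (the names and shape are my assumptions for the example, not the actual `rm_stm` code): to evict the k oldest entries we only need the k-th smallest timestamp as a cutoff, not a fully sorted vector.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using timestamp = int64_t; // hypothetical stand-in for the real timestamp type

// Find the cutoff separating the n_evict oldest timestamps from the rest
// (assumes 0 < n_evict < ts.size()). Takes the vector by value so the
// caller's data is left untouched. std::nth_element partially orders the
// copy so that everything before `pivot` is <= *pivot: O(n) on average,
// versus O(n log n) for a full std::sort.
timestamp eviction_cutoff(std::vector<timestamp> ts, std::size_t n_evict) {
    auto pivot = ts.begin() + static_cast<std::ptrdiff_t>(n_evict);
    std::nth_element(ts.begin(), pivot, ts.end());
    return *pivot; // entries with timestamps below this cutoff can be evicted
}
```

In the steady state described above (evicting one PID at a time), the cutoff is just the minimum, so even `std::min_element` would do; `nth_element` handles the general case of evicting a batch.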
What should have happened instead?
Lower CPU use.