m3db / m3

M3 monorepo - Distributed TSDB, Aggregator and Query Engine, Prometheus Sidecar, Graphite Compatible, Metrics Platform
https://m3db.io/
Apache License 2.0
4.7k stars 451 forks source link

Fixing a regression in pre-allocation for writeBatchPooledReq #4253

Closed arnav251035 closed 9 months ago

arnav251035 commented 9 months ago

What this PR does / why we need it:

Fixes: This PR fixes the bug of never pre-allocating the pool ids. Due to this bug, we observed write latencies to have spiked up, especially latencies onwards p90. The latencies spiked due to the increased calls to the newID method. The flame graphs below show the comparison between before and after this fix. Also, the capped size of the pool has been increased from 128 to 256 for heavy-load operations.

Before Fix:

image

Post Fix:

image

P99 Write Latency improvements:

image

The below image shows write P99 latencies before we were on the latest version, then what happened when we deployed the latest master and then how the latency dropped again after the patch:

image

CLAassistant commented 9 months ago

CLA assistant check
All committers have signed the CLA.

linasm commented 9 months ago

@arnav251035 stamped; good job noticing this and figuring out the root cause.

justinjc commented 9 months ago

Only doc tests are failing the CI, which itself is broken. Fix is here: https://github.com/m3db/m3/pull/4254

Meanwhile, merging this.