Closed: viniarck closed this issue 2 years ago
The log entry can make the 95th percentile latency about 4 times slower. Check out this simplified scenario with only two REST endpoints on topology:
```python
@rest('v4')
def get(self) -> None:
    return jsonify()

@rest('v4/log')
def get_with_log(self) -> None:
    log.info("v4/log")
    return jsonify()
```
```
❯ jq -ncM '{method: "GET", url: "http://localhost:8181/api/kytos/topology/v4/"}' | vegeta attack -format=json -rate 20/1s -duration=120s | tee results.bin | vegeta report
Requests      [total, rate, throughput]         2400, 20.01, 20.01
Duration      [total, attack, wait]             2m0s, 2m0s, 5.783ms
Latencies     [min, mean, 50, 90, 95, 99, max]  1.117ms, 5.731ms, 5.86ms, 6.524ms, 9.506ms, 25.687ms, 60.534ms
Bytes In      [total, mean]                     7200, 3.00
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:2400
Error Set:
```
```
❯ jq -ncM '{method: "GET", url: "http://localhost:8181/api/kytos/topology/v4/log"}' | vegeta attack -format=json -rate 20/1s -duration=120s | tee results.bin | vegeta report
Requests      [total, rate, throughput]         2400, 20.01, 20.01
Duration      [total, attack, wait]             2m0s, 2m0s, 8.831ms
Latencies     [min, mean, 50, 90, 95, 99, max]  1.661ms, 10.812ms, 8.688ms, 9.792ms, 38.989ms, 90.974ms, 166.761ms
Bytes In      [total, mean]                     7200, 3.00
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:2400
Error Set:
```
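From the two reports above, the p95 slowdown can be computed directly (a quick check, using the latencies vegeta printed):

```python
# 95th-percentile latencies taken from the two vegeta reports above (in ms)
p95_without_log = 9.506
p95_with_log = 38.989

ratio = p95_with_log / p95_without_log
print(f"p95 with log.info is {ratio:.1f}x the p95 without it")  # ~4.1x
```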
Closing this since PR https://github.com/kytos-ng/kytos/issues/250 has landed; with QueueHandler and QueueListener, logging is more efficient and stable now. Overall, though, the same log guidelines should still be followed: on hot paths, only add logs when you really need them.
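For reference, the stdlib QueueHandler/QueueListener mechanism mentioned above works roughly like this (a minimal sketch with illustrative names, not kytos internals):

```python
import io
import logging
import logging.handlers
import queue

# The hot path only pays for a queue put; slow I/O happens elsewhere.
log_queue = queue.Queue(-1)  # unbounded queue

logger = logging.getLogger("hot_path")
logger.setLevel(logging.INFO)
logger.addHandler(logging.handlers.QueueHandler(log_queue))

# The listener runs in its own thread, draining the queue and forwarding
# records to the real (potentially slow) handlers; a StringIO sink here
# just for demonstration.
sink = io.StringIO()
listener = logging.handlers.QueueListener(log_queue, logging.StreamHandler(sink))
listener.start()

logger.info("handled request")  # returns as soon as the record is enqueued
listener.stop()  # joins the listener thread and flushes pending records

assert "handled request" in sink.getvalue()
```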
It turns out that the current loggers (from kytos.core.log) can contribute to slowness if they are in hot paths, i.e., paths where a significant number of requests or events are handled; they can add dozens of milliseconds to the execution. That wouldn't be critical on its own, but it can add up significantly on certain hot paths. In the future, when we can afford to (or need to) optimize this further, we should consider a logger that works asynchronously, or use a queue to hand the actual call off to the underlying existing loggers.

This became evident when I was profiling some flow_manager endpoints that mef_eline uses. I was stress testing them at 20 requests/sec, which is a realistic expected workload in production:
Example: a log.info call taking 30+ ms. On this POST http://localhost:8181/api/kytos/flow_manager/v2/flows/ endpoint, it's evident that the second biggest offender in terms of latency is the log call, and the first is the _store_changed_flows method (which is expected, since it uses a deep copy, and the more flows there are, the worse it gets, since all flows need to be inserted again on storehouse; MongoDB will improve this):

Here's the same execution with both the log entry and the _store_changed_flows call removed (very lightweight and fast):

What can we do?