Closed: viniarck closed this issue 2 years ago
The log entry can make the 95th percentile latency about 4 times slower. Check out this simplified scenario with only two REST endpoints on topology:
```python
@rest('v4')
def get(self) -> None:
    return jsonify()

@rest('v4/log')
def get_with_log(self) -> None:
    log.info("v4/log")
    return jsonify()
```
```
❯ jq -ncM '{method: "GET", url: "http://localhost:8181/api/kytos/topology/v4/"}' | vegeta attack -format=json -rate 20/1s -duration=120s | tee results.bin | vegeta report
Requests      [total, rate, throughput]         2400, 20.01, 20.01
Duration      [total, attack, wait]             2m0s, 2m0s, 5.783ms
Latencies     [min, mean, 50, 90, 95, 99, max]  1.117ms, 5.731ms, 5.86ms, 6.524ms, 9.506ms, 25.687ms, 60.534ms
Bytes In      [total, mean]                     7200, 3.00
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:2400
Error Set:
```
```
❯ jq -ncM '{method: "GET", url: "http://localhost:8181/api/kytos/topology/v4/log"}' | vegeta attack -format=json -rate 20/1s -duration=120s | tee results.bin | vegeta report
Requests      [total, rate, throughput]         2400, 20.01, 20.01
Duration      [total, attack, wait]             2m0s, 2m0s, 8.831ms
Latencies     [min, mean, 50, 90, 95, 99, max]  1.661ms, 10.812ms, 8.688ms, 9.792ms, 38.989ms, 90.974ms, 166.761ms
Bytes In      [total, mean]                     7200, 3.00
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:2400
Error Set:
```
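From the two reports above, the p95 slowdown can be computed directly (a quick check, using the latencies vegeta printed):

```python
# 95th-percentile latencies taken from the two vegeta reports above (in ms)
p95_without_log = 9.506
p95_with_log = 38.989

ratio = p95_with_log / p95_without_log
print(f"p95 with log.info is {ratio:.1f}x the p95 without it")  # ~4.1x
```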
Closing this since PR https://github.com/kytos-ng/kytos/issues/250 has landed; with QueueHandler and QueueListener, logging is more efficient and stable now. Overall, though, the same log guidelines should still be followed: on hot paths, only add logs when you really need them.
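For reference, the stdlib QueueHandler/QueueListener mechanism mentioned above works roughly like this (a minimal sketch with illustrative names, not kytos internals):

```python
import io
import logging
import logging.handlers
import queue

# The hot path only pays for a queue put; slow I/O happens elsewhere.
log_queue = queue.Queue(-1)  # unbounded queue

logger = logging.getLogger("hot_path")
logger.setLevel(logging.INFO)
logger.addHandler(logging.handlers.QueueHandler(log_queue))

# The listener runs in its own thread, draining the queue and forwarding
# records to the real (potentially slow) handlers; a StringIO sink here
# just for demonstration.
sink = io.StringIO()
listener = logging.handlers.QueueListener(log_queue, logging.StreamHandler(sink))
listener.start()

logger.info("handled request")  # returns as soon as the record is enqueued
listener.stop()  # joins the listener thread and flushes pending records

assert "handled request" in sink.getvalue()
```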
It turns out that the current loggers (from kytos.core.log) can contribute to slowness if they are in hot paths, i.e., paths where a significant number of requests or events are handled; they can add dozens of milliseconds to the execution. That wouldn't be critical on its own, but it can add up significantly on certain hot paths. In the future, when we can afford to (or need to) optimize this further, we should consider a logger that works asynchronously, or use a queue to hand the actual call off to the underlying existing loggers.

This became evident when I was profiling some flow_manager endpoints that mef_eline uses. I was stress testing them at 20 requests/sec, which is a realistic expected workload in production:
Example: a log.info call taking 30+ ms. On this POST http://localhost:8181/api/kytos/flow_manager/v2/flows/ endpoint, it's evident that the second biggest offender in terms of latency is the log call, and the first is the _store_changed_flows method (which is expected, since it uses a deep copy, and the more flows there are, the worse it gets, since all flows need to be inserted again on storehouse; MongoDB will improve this):

Here's the same execution with both the log entry and the _store_changed_flows call removed (very lightweight and fast):

What can we do?