This issue is a follow-up to https://github.com/ooni/probe/issues/2413. Today we noticed excessive load on oohelperd, caused by spikes of synchronized requests hitting our oohelperd deployments.
We investigated the cause of the overload, which is summarized by the following flame graph:
In short, oohelperd spends most of its time on the cryptographic operations required by TLS handshakes. The impact on performance looks roughly like the following for pretty much any metric, including the time to service a request:
That is, resource consumption, request service time, CPU usage, and similar metrics all rise in roughly the same way.
We want to protect oohelperd when there are too many clients by returning 504 to the excess requests. Additionally, we want extra metrics to understand the time spent in each micro-operation (DNS, TCP, TLS, and HTTP).