m-lab / traceroute-caller

A sidecar service which runs traceroute after a connection closes
Apache License 2.0

Proposal: Add admission controller to traceroute-caller #109

Open gfr10598 opened 3 years ago

gfr10598 commented 3 years ago

It looks like traceroute-caller, at least with scamper-daemon, has limited capacity and, even with -p 1000, seems to get slower and slower when too many requests come in.

If we limit the number of concurrent traces, it will improve latency, and likely have little or no effect on throughput.

These dashboard panels for gru01 show that, with the current deployment, things work OK up to about 60 to 70 concurrent traces, then rapidly get much worse. This happens at around 15 traces per minute, which is much too slow for our busier sites.

[Image: Screen Shot 2021-06-26 at 10.55.18 AM]
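As a rough sanity check on the latency claim, Little's law (average concurrency = throughput × average time in system) can be applied to the numbers above. This is only a back-of-the-envelope estimate: it assumes the 15 traces per minute figure is the steady-state completion rate and that this rate stays about the same when concurrency is capped.

```latex
% Little's law: L = \lambda W, so W = L / \lambda
% L = concurrent traces, \lambda = traces completed per minute, W = minutes per trace
W_{\text{now}} \approx \frac{60 \text{ to } 70}{15\ \text{min}^{-1}} \approx 4 \text{ to } 4.7\ \text{min}
\qquad
W_{\text{capped}} \approx \frac{30 \text{ to } 40}{15\ \text{min}^{-1}} \approx 2 \text{ to } 2.7\ \text{min}
```

Under those assumptions, halving the number of in-flight traces roughly halves the time each admitted trace takes, while the number of traces completed per minute is unchanged.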

So, perhaps we should limit the number of concurrent traces we allow to start. We should evaluate the practical throughput with the pending deployment, and set a corresponding threshold for rejecting new traceroute requests. It appears that the limit can be fairly conservative, perhaps 30 or 40, as the throughput is quite insensitive to the concurrency.
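To make the proposal concrete, here is a minimal sketch of what such an admission controller could look like in Go, using a buffered channel as a counting semaphore. All names here (`Admitter`, `NewAdmitter`, `TryAcquire`, `Release`, `Run`, `ErrTooBusy`) are hypothetical and not part of traceroute-caller; the limit of 2 in `main` is only for demonstration, and the real threshold would come from the evaluation described above.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrTooBusy is a hypothetical error returned when the limit is reached.
var ErrTooBusy = errors.New("too many concurrent traces")

// Admitter is a hypothetical admission controller that caps the number
// of traces in flight. It is a sketch, not traceroute-caller's API.
type Admitter struct {
	slots chan struct{} // buffered channel used as a counting semaphore
}

// NewAdmitter returns an Admitter that allows at most limit concurrent traces.
func NewAdmitter(limit int) *Admitter {
	return &Admitter{slots: make(chan struct{}, limit)}
}

// TryAcquire claims a slot without blocking. It returns false when
// the limit has been reached, so the caller can reject the request.
func (a *Admitter) TryAcquire() bool {
	select {
	case a.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// Release frees a slot previously claimed with TryAcquire.
func (a *Admitter) Release() { <-a.slots }

// Run executes trace if a slot is free and returns ErrTooBusy otherwise.
func (a *Admitter) Run(trace func()) error {
	if !a.TryAcquire() {
		return ErrTooBusy
	}
	defer a.Release()
	trace()
	return nil
}

func main() {
	adm := NewAdmitter(2) // demo limit; the proposal suggests perhaps 30 or 40
	fmt.Println(adm.TryAcquire(), adm.TryAcquire(), adm.TryAcquire()) // true true false
	adm.Release()
	fmt.Println(adm.TryAcquire()) // true
}
```

Rejecting rather than queueing keeps the latency of admitted traces bounded; a rejected request is simply dropped, which matches the idea of a threshold for rejecting new traceroute requests. Whether rejected requests should be counted in a metric is left open here.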