scoutapp / scout_apm_elixir

ScoutAPM Elixir Agent. Supports Phoenix and other frameworks.
https://scoutapm.com

Small Agent Manager Performance Tweaks #121

Closed jeregrine closed 1 year ago

jeregrine commented 3 years ago

A customer reported that memory in the AgentManager process grew steadily during periods of high load, and identified the rapidly growing AgentManager mailbox as the culprit.

The Core Agent is single-threaded, so opening more agent connections will not help. We need a way to buffer messages or block the AgentManager for less time so it can chew through its mailbox faster.

Ideally these changes will take pressure off the AgentManager process by sending messages only one tenth as often as before and doing less work per message in aggregate. There is a risk that we lose buffered messages if the process closes before we send them.
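
As a rough illustration of the buffering idea above, here is a minimal sketch of a GenServer that collects messages and hands them to a sender in batches of ten. It is not the code from this PR: the module name `BatchingSender`, the `send_fun` callback (a stand-in for whatever writes to the Core Agent socket), the batch size, and the flush interval are all assumptions made for the example.

```elixir
defmodule BatchingSender do
  @moduledoc "Hypothetical sketch: buffers messages and flushes them in batches."
  use GenServer

  # Assumed values, chosen only for illustration.
  @batch_size 10
  @flush_interval_ms 1_000

  # send_fun is a placeholder for the function that actually talks to the agent.
  def start_link(send_fun) when is_function(send_fun, 1) do
    GenServer.start_link(__MODULE__, send_fun)
  end

  def enqueue(pid, message), do: GenServer.cast(pid, {:enqueue, message})

  @impl true
  def init(send_fun) do
    # Trap exits so terminate/2 also runs when the parent shuts us down,
    # which narrows (but does not remove) the window for losing messages.
    Process.flag(:trap_exit, true)
    schedule_flush()
    {:ok, %{send_fun: send_fun, buffer: [], count: 0}}
  end

  @impl true
  def handle_cast({:enqueue, message}, state) do
    state = %{state | buffer: [message | state.buffer], count: state.count + 1}
    if state.count >= @batch_size, do: {:noreply, flush(state)}, else: {:noreply, state}
  end

  @impl true
  def handle_info(:flush, state) do
    # Periodic flush so a partially filled buffer does not sit forever.
    schedule_flush()
    {:noreply, flush(state)}
  end

  def handle_info(_other, state), do: {:noreply, state}

  @impl true
  def terminate(_reason, state) do
    # Best-effort flush on graceful shutdown; a brutal kill still loses the buffer.
    flush(state)
    :ok
  end

  defp flush(%{count: 0} = state), do: state

  defp flush(state) do
    # The buffer is built by prepending, so reverse it to restore arrival order.
    state.send_fun.(Enum.reverse(state.buffer))
    %{state | buffer: [], count: 0}
  end

  defp schedule_flush, do: Process.send_after(self(), :flush, @flush_interval_ms)
end
```

The flush in `terminate/2` only helps on a graceful shutdown, so the risk of losing buffered messages mentioned above still applies if the process is killed.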

Possible Future Changes

dlanderson commented 3 years ago

@jeregrine Unless TCP adds significant overhead compared with a local socket, let's keep TCP as the default. We ran into a lot of issues with the unix socket (permissions, mounting/path issues, etc.) that we don't have to deal with when using TCP. These days, the TCP stacks on modern OS distros are optimized enough that we shouldn't be seeing a dramatic difference in overhead. See also: https://github.com/scoutapp/scout_apm_elixir/issues/115 (should be closed/resolved but somehow it's still marked as open :)

dlanderson commented 2 years ago

@jeregrine Any update on this? We had another customer hit issues with a very large message queue.

jeregrine commented 2 years ago

To clarify: this PR has things that speed up the Elixir API reporter, but we're not bottlenecked there. We're bottlenecked waiting for the Agent to respond. We can't open multiple connections to it or send data faster (at least that was the case when this PR was opened).

So we could implement a periodic task that drops the queue based on some heuristics, at the cost of users losing data. I guess that is better than the situation now, where the agent crashes because it's overloaded.

Let me know how you'd like me to proceed, but at the moment we're kinda stuck between a rock and a hard place.
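
To make the load-shedding trade-off above concrete, here is a minimal sketch of one possible heuristic. It swaps the periodic queue-dropping task for a simpler caller-side check: before casting to the process, look at its mailbox length and drop the message if the queue is already too long. The module name `LoadShedding`, the function `cast_or_drop/3`, and the threshold are hypothetical and not part of scout_apm_elixir; `Process.info/2` with `:message_queue_len` is standard Elixir and only works for local pids.

```elixir
defmodule LoadShedding do
  @moduledoc "Hypothetical sketch: drop messages when the target mailbox is too long."

  # Assumed threshold, chosen only for illustration.
  @max_queue_len 5_000

  # Returns :ok if the message was cast, :dropped if it was shed.
  def cast_or_drop(pid, message, max \\ @max_queue_len) when is_pid(pid) do
    case Process.info(pid, :message_queue_len) do
      {:message_queue_len, len} when len < max ->
        GenServer.cast(pid, message)
        :ok

      {:message_queue_len, _len} ->
        # Mailbox is at or over the limit: lose this message rather than grow memory.
        :dropped

      nil ->
        # Process.info/2 returns nil when the process is no longer alive.
        :dropped
    end
  end
end
```

The check is racy by nature (the queue length can change between the lookup and the cast), so it bounds mailbox growth only approximately, and dropped messages mean lost data for the user.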