open-telemetry / opentelemetry-erlang

OpenTelemetry Erlang SDK
https://opentelemetry.io
Apache License 2.0
326 stars 102 forks source link

Processor name atom leak #645

Open tsloughter opened 10 months ago

tsloughter commented 10 months ago

h/t to @starbelly for being the first to encounter this :) and pointing out it slows down persistent_term.

Issue is we need a unique name for each processor and the way it is currently done means leaking atoms and terms because we generate it not unique to each configured processor but to each one that runs -- so a crashing processor will keep adding new keys to persistent term.

I think we can resolve this by simply generating a name at configuration time or init_processors.

starbelly commented 10 months ago

@tsloughter I wonder if there's two issues here that are related. One that I noticed was creating many entries in persistent_term on boot, but devoid of crashes. I think what you're saying can excacerbate the issue ofc.

I wonder if persistent_term in the end is a good fit here, without more understanding of opentelemetry it's hard for me to say, but I think regardless, it may be advantageous to consider not using persistent_term. All the configuration I see in it right now may be fine, but only if it can all be stuffed into one big term and what's more never updated.

If that's not the case then we may want to consider using ets instead, the reads would be slower, but the trade off would be worth it. Specifically, we could make an ets table directly in a supervisor that is public, such that it has most of the same properties of persistent_term, but not effecting the systems use of persistent_term as a whole.

The only rub there would be if telemetry app crashes and/or the supervisor bounces.

tsloughter commented 10 months ago

There should only be like 4 terms at boot. But then there is 1 term per-Tracer that gets used. A Tracer by default is per-Application. So 1 term per Application that has tracing code in it. So technically the day a project has 100s of instrumented Applications that may be a problem. But that'll be a good day :).

I'd need to see what you seeing to know if maybe its another bug that is creating more terms than expected. If so we can add a test to in the future make sure that on startup nothing new starts getting added.

tsloughter commented 10 months ago

And term is used because tracing needs to have as close to 0 overhead as possible. OTP logger does the same.

starbelly commented 10 months ago

There should only be like 4 terms at boot. But then there is 1 term per-Tracer that gets used. A Tracer by default is per-Application. So 1 term per Application that has tracing code in it. So technically the day a project has 100s of instrumented Applications that may be a problem. But that'll be a good day :).

I'd need to see what you seeing to know if maybe its another bug that is creating more terms than expected. If so we can add a test to in the future make sure that on startup nothing new starts getting added.

I think you stated it perfectly, yet yeah, I might be seeing something abnormal too. In a nutshell I see keys for lots of modules, not just the application names.

As an example and locally in dev mode, I'll see {:opentelemetry, Credo.Check.Refactor.WithClauses}, without debugging myself, but by looking at the code this is a result of opentelemetry:registry_application_tracer/1, which is being called by opentelemetry_app on boot (i.e., it will enumerate all loaded applications and call register_application_tracer/1 for all apps). This is configurable, so that's good.

I'm all for using persistent_term :) Just have to figure out how how not create so many terms.

tsloughter commented 10 months ago

Oooh, you are on an old version aren't you?

starbelly commented 10 months ago

Oooh, you are on an old version aren't you?

Indeed. So, what I described, the second part, seems like a non-issue now.

Edit:

Ok, brought myself up to date. For posterity, otel no longer does what I described as a default, instead, if the sdk is enabled (config), loaded applications are inserted into persistent_term, but not an entry for each module. Adding modules is now done lazily in a with_span. Now what you are saying about how it does this like logger makes sense 😄

I wonder if there's a better way to do this still, but it is off topic.