open-telemetry / opentelemetry-js

OpenTelemetry JavaScript Client
https://opentelemetry.io
Apache License 2.0
2.64k stars 763 forks source link

Harden AsyncHooksScopeManager regarding faulty use of AsyncResources #591

Closed Flarna closed 3 years ago

Flarna commented 4 years ago

Is your feature request related to a problem? Please describe. Some libraries may implement misbehaving AsyncResources for which the destroy callback won't be ever called. This would result in a memory leak in AsyncHooksScopeManager as the corresponding entry is never deleted from _scopes. Even the problem/bug is actually in the faulty use of AsyncResources the leak may only appear if OTel is used in such an application.

inspired by https://github.com/nodejs/node/pull/26540#discussion_r349906661

Describe the solution you'd like Add some counter measures to detect such leaks and do a cleanup of old/outdated entries.

Describe alternatives you've considered Only document that memory leaks like this could happen but root cause is somewhere else and needs to be handled somewhere else.

Additional context I think this is more a discussion then a feature request but there exists on discussion type for issues and my feeling is that feature request fits better then bug.

mayurkale22 commented 4 years ago

/cc @vmarchaud

vmarchaud commented 4 years ago

Indeed thats problematic, i believe we could use weakmaps to never old reference but i'm not sure. I don't know if it's that much important for alpha/beta but we need to have something in place for GA for sure

michaelgoin commented 4 years ago

Assuming this is still wanted/looking to be handled, I'm willing to take a stab at it likely along with @lykkin.

dyladan commented 4 years ago

You're welcome to take a shot :)

michaelgoin commented 4 years ago

After getting reacquainted with the state of things... In Node 14, and back ported to v12.17, executionAsyncResource was introduced to async-hooks. This allows storing of context data on the resource representing the current execution. A big benefit being it enables storing of context data without needing to maintain a map locally that relies on seeing destroy to cleanup. By avoiding this external collection, a class of memory leak is no-longer of concern. This is leveraged by the new async-storage API that landed in these versions. It also has the potential to remove the need for additional hooks, providing a potential perf boost.

It appears, looking at our AsyncHooksContextManager, that executionAsyncResource should work. There might be some awkwardness with the current stack design, but will have to see. Before digging too deep, I wanted to get thoughts on this potential approach and Node version support. It seems, for a community project like this, sticking to current or upcoming paved paths may be more long-term beneficial than trying to force in a solution. Particularly for something like this where something has already gone bad with end-user code.

Would we want to look into having a version of the context manager with this solved just for Node ^12.17 and >= 14? Would we prefer to defer this work until Node 10 is dropped and it is more likely to benefit more users? If defer, do we still want to do some prototyping or should I move into other work that may be more high priority for GA?

michaelgoin commented 4 years ago

@dyladan just pinging you on the above in case you didn't get a notification (not pressuring, know you are probably busy - just for visibility).

vmarchaud commented 4 years ago

For me, we should let the current AsyncHooksScopeManager as-is and focus on making a new one based on AsyncLocalStorage since its way more performant than the implementation i have wrote. See https://github.com/open-telemetry/opentelemetry-js/pull/1210

michaelgoin commented 4 years ago

I think that makes sense. Some of the benefit comes from what I was mentioning above but beyond that... using what is hoped as the future of such context tracking for Node would potentially have mutual benefit for this project and Node.

dyladan commented 4 years ago

I agree with @vmarchaud that the much safer way to handle this is to introduce it as a new context manager. After we have some use and have worked out the bugs, we can see about making it the default later.

michaelgoin commented 4 years ago

OK, sounds good. I'm guessing that's where y'all will steer #1210? If so, I'm happy to concede this issue if that makes sense and grab another item (feel free to suggest).

vmarchaud commented 4 years ago

/cc @johanneswuerbach since you opened the PR, do you plan into working on a AsyncLocalStorage-based ContextManager ?

johanneswuerbach commented 4 years ago

Yes, I actually wanted to work on that today 😂

vmarchaud commented 4 years ago

I think we can close this issue, maybe re-open a new one for the AsyncLocalStorageContextManager, WDYT ?

@michaelgoin I would suggest to try to tackle https://github.com/open-telemetry/opentelemetry-js/issues/1040

dyladan commented 4 years ago

I really want #1040 to be done. I started work on it once before but got busy doing other things. If you want it @michaelgoin comment on the issue and i'll assign you.

johanneswuerbach commented 4 years ago

As https://github.com/open-telemetry/opentelemetry-js/pull/1344 and https://github.com/open-telemetry/opentelemetry-js/issues/1040 are merged now I guess this can be closed?

vmarchaud commented 4 years ago

Well i guess. One question remains open i believe: do we want to enable the AsyncStorageContextManager by default with newer version of node 14 ?