open-telemetry / opentelemetry-js

OpenTelemetry JavaScript Client
https://opentelemetry.io
Apache License 2.0
Question: Plans for Node.js ESM Module Support? #1946

Closed astorm closed 1 year ago

astorm commented 3 years ago

Opening this issue based on a conversation with @dyladan and @obecny in order to start a wider discussion.

Has there been any talk about how the Node side of the project is going to address the coming ESM/ES6 module migration that (parts of) the Node community is planning once Node 10 reaches end of life?

Specifically -- Node.js has added native support (no bundler needed) for the import statement and ESM/ES6 modules. These modules can't be monkey-patched in the same way that CommonJS modules can via require-in-the-middle. This means a lot of auto-instrumentation is going to stop working as packages begin to adopt ESM modules and end users begin to use them.
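For readers less familiar with the technique, here is a minimal sketch of the CommonJS-style monkey-patching being described. It patches `os.platform` purely as a stand-in for a library method; the `wrap` helper is hypothetical and is not the API of require-in-the-middle or OpenTelemetry.

```javascript
'use strict';
const os = require('node:os');

// Hypothetical helper: replace target[name] with a wrapper that reports each
// call, and return a function that restores the original.
function wrap(target, name, onCall) {
  const original = target[name];
  target[name] = function wrapped(...args) {
    onCall(name, args);
    return original.apply(this, args);
  };
  return () => { target[name] = original; };
}

const calls = [];
const unwrap = wrap(os, 'platform', (name) => calls.push(name));

const patchedResult = os.platform();   // goes through the wrapper
unwrap();
const unpatchedResult = os.platform(); // original function again
```

This works because `require` hands back an ordinary mutable object. With `import * as ns from '...'`, the bindings on the namespace object cannot be reassigned, so the same assignment throws a TypeError.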

Node's ESM modules have a loader API which might offer a way to monkey-patch modules in the same way, but I don't know if anyone's done any public research on that yet. (waves in the direction of @michaelgoin for reasons)

In addition to the question of instrumentation, there's the question of what sort of modules this project will publish to NPM (CommonJS, ESM, both) and whether the project's individual modules will ship with the type attribute set or not.

Finally, FWIW, these questions come out of a discussion with @delvedor and @mcollina about fastify's plans for ESM support, and @mcollina pulled together a list of the relevant Node.js core working groups and issues for folks interested in getting involved on that side of things.

astorm commented 2 years ago

It's been around 8 months since this issue was originally opened and just under a month since someone from the OpenTelemetry project chimed in. My read on that is ESM/import support isn't a priority for the project and won't be forthcoming anytime soon -- which isn't the answer I wanted but is an answer I can work with.

I'm going to close this issue out -- if folks are interested in making this happen I'd recommend opening a new issue and reengaging with the stakeholders. Showing up with an implementation in hand would also be motivating :)

bengl commented 2 years ago

Just a heads up for the folks in this thread who care about such things: The latest version of import-in-the-middle supports the latest version of loader hooks (and the previous one as well).

The way we use it in dd-trace-js is just putting it alongside require-in-the-middle. Any hook is (mostly) compatible, so we just add them to both; that way it doesn't matter which way a module is loaded, it'll still be intercepted and patched.

I'm happy to make a PR doing the same thing here, but I'm quite unfamiliar with the codebase. If someone can point me at the point at which require-in-the-middle (or a similar tool) is used, and related tests, I ought to be able to make an attempt at it. The usual loaders caveat of a CLI arg being required doesn't change here. The details of that can be discussed in the PR.

lizthegrey commented 2 years ago

The pointy end is here. Thank you so very much for the offer to help @bengl. https://github.com/open-telemetry/opentelemetry-js/blob/fdab6422ae238b04beb2e92ac38153e35e5800ac/experimental/packages/opentelemetry-instrumentation/src/platform/node/instrumentation.ts#L139

bengl commented 2 years ago

Here's a start. https://github.com/open-telemetry/opentelemetry-js/pull/2640

vmarchaud commented 2 years ago

I'll reopen the issue while we are discussing the PR.

djMax commented 2 years ago

Did this ever happen?

frank-dspeed commented 2 years ago

I identified the problems as bigger than expected; I need to open a new issue:

gabrielemontano commented 2 years ago

any news about https://github.com/open-telemetry/opentelemetry-js/pull/2640 ??

frank-dspeed commented 2 years ago

@gabrielemontano Nope, but I would simply wait a few more months. As soon as the Node.js vm modules reach a more stable state, they can rebase on that, and then no unstable loader is needed. NodeJS gets a Userland Module system via the VM module.

Flarna commented 2 years ago

NodeJS gets a Userland Module system via the VM module

Do you have some links to these activities?

frank-dspeed commented 2 years ago

@Flarna

Many functions have also gained additional options now:

importModuleDynamically (Function): Called during evaluation of this module when import() is called. If this option is not specified, calls to import() will reject with ERR_VM_DYNAMIC_IMPORT_CALLBACK_MISSING. This option is part of the experimental modules API. We do not recommend using it in a production environment.

  specifier (string): specifier passed to import()
  script (vm.Script)
  importAssertions (Object): The "assert" value passed to the optionsExpression optional parameter, or an empty object if no value was provided.
  Returns: Module Namespace Object | vm.Module. Returning a vm.Module is recommended in order to take advantage of error tracking, and to avoid issues with namespaces that contain then function exports.

Internally, that API uses the V8 context API: it creates a context for the module and reads the secret keys, and if you link a context they exchange that security key to access each other. That method is safe; I only mention it for engineers who wonder how it works.

mcollina commented 2 years ago

I might have missed it.. but what's the connection between the change in the vm module and ESM support in opentracing? Do you use the vm module anywhere and you'd need these?

astorm commented 2 years ago

I might have missed it.. but what's the connection between the change in the vm module and ESM support in opentracing? Do you use the vm module anywhere and you'd need these?

Wondered the same thing, tbh. I think maybe @frank-dspeed was theorizing what OpenTelemetry might do based on their own experience? (or maybe @frank-dspeed can chime in themselves?) I can see how some of those methods in vm might be useful for import-in-the-middle's whole "regenerate a different ESM module at runtime" thing but last time I looked at import-in-the-middle it did not use the vm module for anything (unless the global dynamic import() function comes from that module?).

Looking at the state of play: the last attention this issue got was a good-faith/appreciated PR attempt by @bengl to use import-in-the-middle for ESM module/method wrapping (here: https://github.com/open-telemetry/opentelemetry-js/pull/2640) -- and that for unclear reasons (most likely time/priority/resources/oh-wow-this-is-a-lot-more-work-than-it-first-looked) it never really picked up steam and doesn't seem to be a priority for the project right now. (someone, please, make me a liar 🥲)

frank-dspeed commented 2 years ago

@astorm What I was referring to is that the new functionality in the vm module can be used to replace import-in-the-middle. As for why this never picked up steam:

In general, everyone today bundles instrumentation into debug bundles and metric bundles, then load-balances some users onto those bundles to get data. That is the flow used at scale today.

No one would run something like import-in-the-middle in production; the results would not be as deterministic as needed.

That's why a shippable build is favored.

astorm commented 2 years ago

Thank you @frank-dspeed -- I appreciate the response. One last question.

Are you saying you're actively working on (or know people who are actively working on?) using the vm module instead of import-in-the-middle to give OpenTelemetry the ability to wrap and therefore instrument ESM modules?

Or were you suggesting it as an approach for the project?

Or some third thing?

frank-dspeed commented 2 years ago

@astorm It is in fact the only way you can do it, as it does not depend on the loader API and it is cross-engine compatible, so it would also be portable to GraalJS and others.

Speaking from the Node.js community's point of view, the vm module is the only spec-compliant way to interact with the ESM module cache or module generation in userland. Anything else needs to hijack the fetch API, as that is the other connected part of ESM loading.

The V8 engine parses the code and sends the result to the embedding host, i.e. the runtime (in this case Node.js):

import `specifier`

Node.js runtime

// resolves the specifier
// loads the result via fetch! fetch is part of the ESM module spec.
// sends a memory pointer back to the V8 engine, where your code is

Browser runtime

// the browser resolves the specifier
// (it is either a valid URL, or the import map, an additional system that maps bare specifiers to modules, is consulted)

// then it fetches the URL
// this can be intercepted via the service worker spec
// then it loads the result into the global URL-based cache and returns the memory pointer

astorm commented 2 years ago

Thank you for the insight @frank-dspeed -- I didn't see a direct answer to my question about whether this was something you were actively working on or if it was what you were suggesting the OpenTelemetry project consider. I'm going to presume the latter, but if that changes please let the project know!

djMax commented 2 years ago

I don't like import-in-the-middle at all. But we are indeed running it in production. It would require changes from package owners to do it any other way, and that's difficult. However, the new node diagnostics channel might be a middle ground that could work. I use it here:

https://github.com/gas-buddy/opentelemetry-instrumentation-fetch-node

frank-dspeed commented 2 years ago

@astorm The OpenTelemetry project needs to wait for something totally different. At present I am leading a bigger effort to refactor the build systems of Node.js and Chromium, as well as everything related to them. When that effort is finished, you will have totally new ways, and I will supply a good implementation.

At present I would suggest using Rollup if you want to work with ESM: you create a bundle, and as the export format you choose SystemJS. That is the ESM module system implemented in userland, and it can be used, like import-in-the-middle, to wrap stuff.

astorm commented 2 years ago

I don't like import-in-the-middle at all. But we are indeed running it in production. It would require changes from package owners to do it any other way, and that's difficult. However, the new node diagnostics channel might be a middle ground that could work. I use it here:

@djMax Thanks for that. One quick clarifying question. You mention the diagnostic channel as middle ground -- this would only work if the underlying Node.js module or third party package provided diagnostic channel events, right? If there's a package that doesn't offer these events (which is most of them?) then that's not an option for instrumentation which means method wrapping of some sort would remain the only way to bring existing instrumentation forward for ESM or Node's "import a commonjs module via ESM" functionality?

Or do I misunderstand what you're saying? (even money bet)

mcollina commented 2 years ago

@frank-dspeed did you have a chat with the folks working on the relative piece in Node.js?

My understanding is that the work is likely going in a different direction from what you are envisioning here, or the way you are thinking of using those APIs is not prioritized.

cc @GeoffreyBooth

I would also recommend against using the same implementation for server and browser runtimes. They have different requirements and are hardly the same.

djMax commented 2 years ago

Or do I misunderstand what you're saying? (even money bet)

@astorm you don't misunderstand. I agree we wish package maintainers didn't have to "do something" to make it work, but sadly I don't think that makes it any less true. This require hook stuff is just like swizzling in Objective-C or a variety of other last-ditch techniques. Even without the existential threat of ESM, I think we risk hard-to-explain incompatibilities (package changes, bun, you name it). So it's less a question of doing it without external help and more a question of asking for something (a) small, (b) generic to the problem space and (c) so vanilla that it's hard to complain about. Diag has the potential to be that, but only if it gets broader adoption than telemetry.

mcollina commented 2 years ago

@frank-dspeed according to your statement, opentelemetry-js will not support Node.js ESM unless a runtime-generic API is created. This is years of work.

Is the above opinion shared across the opentelemetry project? I don't see you as somebody that has contributed to this repo at all or as a member of this organization.

cc @dyladan @legendecas @pichlermarc

vmarchaud commented 2 years ago

Is the above opinion shared across the opentelemetry project? I don't see you as somebody that has contributed to this repo at all or as a member of this organization.

Not at all; a PR is open to implement it there: https://github.com/open-telemetry/opentelemetry-js/pull/2846. Multiple people (including me) have tried to push it forward without having the time to finish it yet.

This is years of work.

Totally agree, and most importantly this is years of work across multiple runtimes (browsers, node, edge workers, deno and now bun), so it's even more complex. Efforts were made for async context when there were only browsers and node, and it will be even harder to pull it off for telemetry.

Flarna commented 2 years ago

import-in-the-middle is, in my opinion, similar to require-in-the-middle. Everyone agrees that monkey patching is not nice and not the "optimum" way to get telemetry data, but there is no alternative as of now.

In my opinion the issue with import-in-the-middle as of now is that it has to rely on loader hooks, which are still experimental and a moving target. It may work quite fine on bleeding-edge node versions, but a lot of production workloads still run on node 14 or even older.

In an ideal world database drivers, web frameworks,... could integrate calls to OTel, as the API has been GA for a while now. If they don't want to depend on OTel they could provide hooks (e.g. node.js diagnostics channel or something else). But this happens only in a few places and often it doesn't provide the info/flexibility needed. As a result instrumentations fill this gap. The JS language has no APIs for instrumentation like e.g. java or .NET, so we end up monkey patching.

But this is by far not the full OTel project. Everyone is free to use OTel without using instrumentations by just calling the OTel APIs to collect traces, metrics,... There is no need for require/import-in-the-middle.

Qard commented 1 year ago

Apparently I wasn't mentioned in here before so entirely missed the discussion about diagnostics_channel, but I can provide some context as the creator of it.

The original reason for creating diagnostics_channel was specifically to side-step all these concerns about ESM loaders and the early talk about module records being immutable. By providing a simple emitter for named diagnostics events we could define some loose structure to data captured at our usual injection points while having the lowest impact possible. We can also capture this data directly in the source modules without any fragile monkey-patching.
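As a rough illustration of the idea, a library can publish a named event from its own source and a tracing tool can subscribe by name. This is only a sketch: the "my-db" library, its `query` function and the channel name are hypothetical, while `channel`, `subscribe`, `publish` and `hasSubscribers` are the Node.js `diagnostics_channel` API.

```javascript
const dc = require('node:diagnostics_channel');

// Library side: a hypothetical "my-db" client publishes an event at the point
// where an APM would traditionally inject a monkey-patch.
const queryChannel = dc.channel('my-db:query');

function query(sql) {
  // Only build and publish the message when somebody is listening.
  if (queryChannel.hasSubscribers) {
    queryChannel.publish({ sql });
  }
  return `result of ${sql}`;
}

// Consumer side: a tracing tool subscribes by channel name; "my-db" itself is
// never patched.
const seen = [];
dc.subscribe('my-db:query', (message, name) => {
  seen.push({ name, sql: message.sql });
});

const result = query('SELECT 1');
```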

Yes, diagnostics_channel on its own does not handle correlation between disparate events and that is by-design. The intent is for it to be paired with another tool like AsyncLocalStorage to provide the context to which to bind this data. Context awareness and gathering of data are two completely different tasks which are often conflated due to how often they are needed together. By breaking them down into their respective components though there's a whole lot more flexibility in how to compose the two and also enables using them in isolation in cases where control flow context is not relevant like active request metrics tracking.
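A minimal sketch of that pairing, under assumed names: the "my-cache:get" channel, the `handleRequest` function and the `requestId` field are all hypothetical. The channel only carries the data, and `AsyncLocalStorage` supplies the context the subscriber binds it to.

```javascript
const dc = require('node:diagnostics_channel');
const { AsyncLocalStorage } = require('node:async_hooks');

const requestContext = new AsyncLocalStorage();
const events = [];

// Data gathering: the subscriber receives the raw event...
dc.subscribe('my-cache:get', (message) => {
  // ...context awareness: and looks up whichever request is current, if any.
  const store = requestContext.getStore();
  events.push({ requestId: store ? store.requestId : null, key: message.key });
});

const cacheGet = dc.channel('my-cache:get');

function handleRequest(requestId, key) {
  // Everything called inside run() sees this store via getStore().
  requestContext.run({ requestId }, () => {
    cacheGet.publish({ key });
  });
}

handleRequest('req-1', 'user:42');
handleRequest('req-2', 'user:7');
cacheGet.publish({ key: 'warmup' }); // published outside any request context
```

The last publish shows the isolated case: the event is still delivered, just without a request attached to it.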

Node.js core specifically opted against integrating a richer tracing API at the time because for one thing that's a much heavier proposition, but also the ecosystem had just got burned by OpenTracing supposedly being the "standard" and then OTel coming along with different ideas. Both OpenTracing and OpenTelemetry are fairly opinionated, which is fine but means there's a high risk of differing opinions emerging, resulting in forks or new designs, further churning the ecosystem. This would mean anything provided in Node.js core could quickly become obsolete and removing things from Node.js takes a long time. I would much rather have a lower-level, uncontroversial primitive to simply broadcast metadata and leave it up to higher-level tools to consume that data and decide what to do with it.

The combination of diagnostics_channel and AsyncLocalStorage is designed intentionally to feed well into richer tracing systems by selecting the appropriate channels to consume and structuring the data how the consumer sees fit using AsyncLocalStorage for context storage. You can think of diagnostics_channel much like static probe systems like DTrace and SystemTap, but in JavaScript--there's a bunch of pre-defined channels you could listen to, if you're interested, but you can pick and choose which are relevant to your particular scenario. If we integrated a tracing library more directly then we would again need to find a way to reduce the overall data stream down to the subset a given consumer actually feels is relevant, which then creates a ton more complexity with questions like how do we reformulate the tree if we want to take spans out of it? There's a lot of complexity added by filtering at the higher-level where structure has already been imposed. It's much easier and less resource-intensive to filter at the lower-level of just selectively listening to interesting channels.
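To make the "filter at the lower level" point concrete, here is a small sketch in which a consumer listens to one channel out of three. The "my-queue" channel names and the `emit` helper are hypothetical; the point is only that channels nobody subscribed to cost the publisher a single `hasSubscribers` check.

```javascript
const dc = require('node:diagnostics_channel');

// Hypothetical channels a library might expose.
const channels = {
  connect: dc.channel('my-queue:connect'),
  send: dc.channel('my-queue:send'),
  ack: dc.channel('my-queue:ack'),
};

// This consumer only cares about sends; it never subscribes to the others.
const sends = [];
dc.subscribe('my-queue:send', (message) => sends.push(message.topic));

let payloadsBuilt = 0;
function emit(channel, buildPayload) {
  if (!channel.hasSubscribers) return; // nobody is listening: skip the work
  payloadsBuilt += 1;
  channel.publish(buildPayload());
}

emit(channels.connect, () => ({ host: 'localhost' }));
emit(channels.send, () => ({ topic: 'orders' }));
emit(channels.ack, () => ({ id: 1 }));
```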

djMax commented 1 year ago

^^ 1000 times that. My endorsement is worth precisely zero, but that approach gets it without reservation.

Has there been any thought about channel names and discovery? I couldn't find it.

Qard commented 1 year ago

It has been relatively left up to each library to choose their own naming scheme, though the docs do suggest a module prefix to avoid collisions and providing docs for each of their channels along with the input object shapes. There are docs for the limited set of events currently emitted by Node.js core though: https://nodejs.org/api/diagnostics_channel.html#built-in-channels

astorm commented 1 year ago

Thanks for dropping by @Qard -- I'm not sure I'd seen the case/theory-of-operation for the diagnostic channel laid out that plainly and well before. (You'all should do more blogging/evangelizing to folks not cozy with the foundation and core project 😸)

To sum up what's been said by Stephen and others, a diagnostic channel approach would require library writers to self-instrument their code if they wanted it to be instrumented/traced. Legit -- but orthogonal to this particular issue.

Qard commented 1 year ago

Yes, I intend on being more vocal about it going forward. We kind of kept a bit quiet about it for awhile to give it a trial period to prove the concept. Seems to have succeeded at that so we're going to start pushing it more publicly going forward. :)

Also, small note that the diagnostics_channel approach doesn't exactly require that libraries self-instrument. The monkey-patching approach remains usable for the old libraries that might never gain the diagnostics_channel changes necessary, and you can simply make patches that emit diagnostics_channel events. The benefit of using channels with patches is that there's a clear and easy way to turn individual instrumentations on or off and it also provides a nice clean path to move instrumentation out of APMs though with low effort. This is the approach Datadog has taken and we're quite satisfied with this approach.
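A sketch of what such a patch might look like, with every name assumed for illustration (`legacyLib`, `fetchUser` and the channel name are hypothetical, and this is not Datadog's actual code). The patch itself does no tracing; it only emits an event, so switching the instrumentation on or off is a subscribe or unsubscribe call.

```javascript
const dc = require('node:diagnostics_channel');

// Stand-in for an older library that will never publish events itself.
const legacyLib = {
  fetchUser(id) { return { id, name: `user-${id}` }; },
};

// The monkey-patch only forwards to a channel, then calls the original.
const fetchUserChannel = dc.channel('legacy-lib:fetchUser');
const originalFetchUser = legacyLib.fetchUser;
legacyLib.fetchUser = function patchedFetchUser(id) {
  if (fetchUserChannel.hasSubscribers) fetchUserChannel.publish({ id });
  return originalFetchUser.call(this, id);
};

// Instrumentation lives entirely on the subscriber side.
const traced = [];
const onFetchUser = (message) => traced.push(message.id);

dc.subscribe('legacy-lib:fetchUser', onFetchUser);
legacyLib.fetchUser(1);              // recorded
dc.unsubscribe('legacy-lib:fetchUser', onFetchUser);
const user = legacyLib.fetchUser(2); // not recorded, behaviour unchanged
```

If the library later publishes to the same channel natively, the patch can be dropped without touching the subscriber.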

astorm commented 1 year ago

As much as I've appreciated the discussion here, as the person who first opened this issue a year and a half ago I'd like to pull things back on track and just sum up where things are so that folks googling this know the state of play.

  1. People want to use import in their Node.js programs without relying on compilers/bundlers

  2. Native Node ESM modules and Node's "load a CommonJS module in node_modules via import" functionality provide a way to do that

  3. However, native ESM modules and Node's "load a CommonJS module in node_modules via import" functionality don't allow a module to be redefined at runtime, which means none of the third party library tracing instrumentation modules provided by OpenTelemetry (or others) can work with an ESM based Node.js program.

  4. This blocks folks who are required to have a tracing library from using ESM. While it's true they could manually instrument using the API/SDK, the expectation in industry/the-market is that a tracing solution will pick up things like web-frameworks and db calls on its own

  5. Node's loader hooks feature allows a module's source to be redefined prior to runtime

  6. The import-in-the-middle module uses loader hooks to provide an experience that seems like monkey patching and provides similar functionality (it actually provides a proxy object -- the implementation's pretty neat/wild)

  7. At least one other APM/Observability vendor uses import-in-the-middle for their wrapping, which makes it (by some definitions) production acceptable

  8. Therefore, import-in-the-middle (or some other loader hooks based solution) appears to be the most expedient path forward to allow OpenTelemetry users to write "ESM First" programs and still take advantage of the third party tracing instrumentation provided by the OpenTelemetry project. (see packages with instrumentation in the name for the sort of functionality this affects: https://github.com/open-telemetry/opentelemetry-js/tree/a3e40da38aafd0809a5a9702bd181c4d4bebfcae/experimental/packages and https://github.com/open-telemetry/opentelemetry-js-contrib/tree/20767c4fffee34bc51392894001bbb667576e91d/plugins/node ).

  9. Work to do this was begun multiple times (https://github.com/open-telemetry/opentelemetry-js/pull/2763 https://github.com/open-telemetry/opentelemetry-js/pull/2640) but petered out for reasons. The latest and most up to date attempt is at https://github.com/open-telemetry/opentelemetry-js/pull/2846

  10. My general read on the situation is the OpenTelemetry maintainers would welcome this, but their limited resources are being applied elsewhere and there's no plan to put this on the schedule. If folks were interested in giving it a try themselves, hopping on the CNCF slack and saying hi in #otel-js is probably the best place to start
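The "proxy object" idea from point 6 can be sketched roughly as follows. This is an illustration of the general technique only, not import-in-the-middle's actual implementation: the frozen object stands in for an immutable ESM namespace, and `makePatchable` is a hypothetical helper.

```javascript
'use strict';

// Stand-in for an ESM namespace: its bindings cannot be reassigned.
const namespace = Object.freeze({
  greet(name) { return `hello ${name}`; },
  version: '1.0.0',
});

// Hypothetical helper: a mutable view that reads overrides first and falls
// back to the real namespace. The proxy target is an empty object so that
// the get trap is free to return replaced values.
function makePatchable(ns) {
  const overrides = new Map();
  return new Proxy({}, {
    get(_target, key) {
      return overrides.has(key) ? overrides.get(key) : ns[key];
    },
    set(_target, key, value) {
      overrides.set(key, value);
      return true;
    },
  });
}

const patchable = makePatchable(namespace);
const calls = [];
const originalGreet = patchable.greet;
patchable.greet = function wrappedGreet(name) {
  calls.push(name);
  return originalGreet(name);
};

const greeting = patchable.greet('otel');
```

The sketch leaves out the hard part: making the rest of the program import the patchable view instead of the real module, which is where loader hooks come in.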

vmarchaud commented 1 year ago

The key point of that system is called pm2, it is from a company called keymetrics, it is the de facto standard at present in NodeJS for metrics and instrumentation, at least in the real world, and it works

Well, I maintained pm2 and worked as the CTO at keymetrics between 2016 and 2019, and I wrote the library used to generate metrics/traces (https://github.com/keymetrics/pm2-io-apm/graphs/contributors), so I think I'm well suited to say that it's not magical, doesn't support "pretrabyte" scale, and uses both require-in-the-middle (https://github.com/keymetrics/pm2-io-apm/blob/master/package.json#L91) and opencensus, the project that predates OpenTelemetry (https://github.com/keymetrics/pm2-io-apm/tree/master/src/census/plugins). I started to work on OpenTelemetry after I left in 2019 (just when the OTel SIG JS was formed in June) because I saw vendors sharing client-side implementations as the way forward. After I left, both keymetrics and pm2 were left without much maintenance (clearly visible in the activity graphs of the repos), so it wasn't brought up to date with more modern approaches, including OTel. I might add that it doesn't support ESM either.

My general read on the situation is the OpenTelemetry maintainers would welcome this, but their limited resources are being applied elsewhere and there's no plan to put this on the schedule. If folks were interested in giving it a try themselves, hopping on the CNCF slack and saying hi in #otel-js is probably the best place to start

I reiterate that we do welcome this. The most advanced PR was mine (because I based it on both prior PRs): https://github.com/open-telemetry/opentelemetry-js/pull/2846. The current to-do is:

weyert commented 1 year ago

Hmmm, I was just reviewing the current project state of open-telemetry overall and I see nothing at all. If I get it right and I am not missing any important code, then this offers me exactly what? It defines function naming patterns for my loggers, which I later turn into insights like traces?

What do you mean exactly? OpenTelemetry offers a standard for telemetry, see the spec. This repo offers similar functionality to your suggested pm2 but in a vendor-agnostic way. I can send my metrics and traces to Google Trace by using their otel-js exporters, or I could use any other tracing/telemetry vendor like Dynatrace that supports OTLP directly.

OpenTelemetry, like pm2, instruments http and other popular packages; see the opentelemetry-js-contrib repo for those.

frank-dspeed commented 1 year ago

I talked to some engineers, and they informed me that you may be missing some information in your considerations: Firefox's SpiderMonkey (WarpOracle) already has structural instrumentation and adds it by default, while V8 and Chrome have console.time("zone") and console.timeEnd("zone"), as well as chrome://tracing to expose the data in a nicely viewable way.

They strongly suggest simply wrapping that on demand, and not calling the console functions directly: abstract them so that you can short-circuit them.

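// Pick the implementation once; NODE_ENV is an assumed switch for this example.
const tracingEnabled = process.env.NODE_ENV !== 'production'

// With tracing enabled these delegate to console.time / console.timeEnd,
// so that in production they can be no-ops.
const trace = tracingEnabled ? (zone) => console.time(zone) : (zone) => { /** NoOp */ }
const traceEnd = tracingEnabled ? (zone) => console.timeEnd(zone) : (zone) => { /** NoOp */ }

legendecas commented 1 year ago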

@frank-dspeed you send a lot of long messages to this thread, and I have a hard time following what your goals or questions are most of the time.

It's very difficult for me to determine if everything you're posting is even on the topic of OpenTelemetry Node.js ESM integration and the frequency of your posts could be discouraging others to participate.

I'd like to ask you to be more considerate in how much you expand and how much context you give. Thanks!

frank-dspeed commented 1 year ago

@legendecas I also have a hard time; I cannot even assert what belongs where, and that shows me that we need some more thinking before coding. But I got your point, and I will clean this issue up for you. I also have good news for you: I again invested a whole day to evaluate whether this has any reasonable future in the next 3 years, and the result is that you will see no messages from me or my affiliates in this project for the next years.

gabrielemontano commented 1 year ago

@vmarchaud do you have any updates about this thread?

vmarchaud commented 1 year ago

Nothing new since the discussion on the PR here: https://github.com/open-telemetry/opentelemetry-js/pull/2846. I haven't had the time to continue it since October, though.

schickling commented 1 year ago

👏 👏 👏 @pichlermarc

pichlermarc commented 1 year ago

@schickling I only merged the PR 😅 All the work was done by @JamieDanielson @pkanal 🙂