nodejs / modules

Node.js Modules Team
MIT License
411 stars 46 forks source link

using ecmascript modules in an apm environment #424

Open bmacnaughton opened 4 years ago

bmacnaughton commented 4 years ago

I work with appoptics-apm and have been watching progress on ECMAScript modules from a distance for a while. I'd like to thank everyone for wrestling with the complexities involved in integrating ES modules with CJS modules.

I started working with node 13 recently to test how patching will work with ES modules. At this point it's not clear to me. I know that patching is a bit of an edge case but APM products depend on it. I haven't followed every conversation on this in detail so it's quite possible that I'm missing some basic things that have been discussed and should be obvious.

For any not familiar with how we patch, here's a quick overview.

  1. our agent is required before any other modules.
  2. it replaces module.constructor.prototype.require with our patchedRequire()
  3. patchedRequire() uses realRequire() to load the requested module (mod).
  4. patchedRequire() loads package.json using this.require.call(this, 'name/package.json')
  5. patchedRequire() creates a relative require function for this module using this.require.bind(this)
  6. patchedRequire() requires the instrumentation code and executes it with the argument mod
  7. the instrumentation code returns the patched mod
  8. patchedRequire() replaces the cached version of mod with the patched version of mod

Step 4 is used to make decisions about what must be patched for a given version of the package being loaded. Step 5 is used when patching a package requires that files in its lib (or other directory) need to be patched as well.

There are other approaches and I'm open to changing how our patching works if need be. But given how this seems to be working now it's not clear to me that hooks are intended to address APM-style monkey-patching.

The following code is my attempt to get started with how this would work.

Example

simple.js:

import lib from './lib-func.mjs';
import {another} from './lib-func.mjs';

console.log(lib, another);

hooks/simple.js

import {pathToFileURL} from 'url';
const baseURL = pathToFileURL(process.cwd()).href;

const doImport = false;

const log = function (...args) {args.unshift('hooks:'); console.log(...args);}

const probes = new Map();

export async function resolve (specifier, parentURL, defaultResolver) {
  const isMain = parentURL === undefined;
  // resolve it using the default function.
  const resolution = defaultResolver(specifier, isMain ? baseURL : parentURL);
  log(`resolving ${specifier}`);
  // how to know "main" modules (to set require context)?
  if (specifier === './lib-func.mjs') {
    log(`making ${specifier} dynamic`);
    // is there any other way to get get control via the `dynamicInstantiate()` hook?
    resolution.format = 'dynamic';
    probes[resolution.url] = specifier;
  }
  return resolution;
}

export async function dynamicInstantiate (url) {
  log(`dynamic ${url}`);
  if (url in probes) {
    log(`found probe`, probes[url]);
    if (!doImport) {
      return {
        exports: ['default'],
        execute: (exports) => {
          exports.default.set(function hacking () {return 'hacking'});
        }
      }
    } else {
      return import(url).then(m => {
        return {
          exports: ['default'],
          execute: (exports) => {
            console.log('execute', exports);
            exports.default.set(m);
          }
        }
      })
    }
  } else {
    return {
      exports: ['default'],
      execute: (exports) => {
        exports.default.set(function fakeXyzzy () {return 'really don\'t want to be here'});
      }
    }
  }
}

When execute it as is it works, more or less as I would expect it to.

bruce@uxpanapa:~/working/esm-modules$ node --experimental-loader ./hooks/simple.mjs  ./simple.mjs
(node:1343) ExperimentalWarning: The ESM module loader is experimental.
(node:1343) ExperimentalWarning: --experimental-loader is an experimental feature. This feature could change at any time
hooks: resolving file:///home/bruce/working/esm-modules/simple.mjs
hooks: resolving ./lib-func.mjs
hooks: making ./lib-func.mjs dynamic
hooks: dynamic file:///home/bruce/working/esm-modules/lib-func.mjs
hooks: found probe ./lib-func.mjs
[Function: hacking]
hacking

But when I change doImport to true it fails to output anything.

bruce@uxpanapa:~/working/esm-modules$ node --experimental-loader ./hooks/simple.mjs  ./simple.mjs
(node:1701) ExperimentalWarning: The ESM module loader is experimental.
(node:1701) ExperimentalWarning: --experimental-loader is an experimental feature. This feature could change at any time
hooks: resolving file:///home/bruce/working/esm-modules/simple.mjs
hooks: resolving ./lib-func.mjs
hooks: making ./lib-func.mjs dynamic
hooks: dynamic file:///home/bruce/working/esm-modules/lib-func.mjs
hooks: found probe ./lib-func.mjs
hooks: resolving file:///home/bruce/working/esm-modules/lib-func.mjs

What am I missing? Is it not possible to execute import() in the dynamic hook? I can see how a dynamic hook would allow arbitrary code to be created but I don't see how I would patch the existing module (and/or cache).

Any pointers would be appreciated.

devsnek commented 4 years ago

You shouldn't use import() to provide the unmodified target. In this case, you should know in the resolve hook whether you want to apm something or not.

In any case, you create a circular dependency from the module to itself here, and resolution can't finish until it resolves, and it can't resolve until it resolves.

bmacnaughton commented 4 years ago

Thanks. I can import the module in the resolve hook then use the dynamic hook to return it. Is that the intended mechanism?

How can the cache be modified to reflect the patched module? Is there a way to do relative references of child modules (other than URL manipulation)? We also patch node modules (fs, http, zlib, crypto). Is there a way to sync both caches?

bmacnaughton commented 4 years ago

I'm not following how importing the module in the resolve hook should work.

There is no mechanism for the resolve hook to return a loaded/patched module. I can import the module there. But once I have the module I need to return resolution.format = 'dynamic' in order to for the dynamic hook to be run. But the dynamic hook will already have been skipped because the code imports the module in resolve() before returning the format 'dynamic'.

I tried writing to default, e.g,

const mod = await import(resolution.url);
mod.default = function () {return 'i have been patched'};

but default is a read-only property of the module.

Is there an example of monkey-patching you can point me to?

devsnek commented 4 years ago

at the moment there's no way to load the original module, at least not easily. hooks are still a WIP

guybedford commented 4 years ago

@devsnek does dynamic import not work in the current loader at all? That does sound like a possible bug to me.

@bmacnaughton the loader hooks are still under development and are definitely missing APM functionality which they intend to solve. We have designs for this but have development bottlenecks as no one is working on it while we are currently dealing with shipping our MVP unflagged.

bmeck commented 4 years ago

For a variety of uses loader hooks do work still as they exist, but yes expect them to change over time. https://github.com/bmeck/node-apm-loader-example shows an example of intercepting and redirecting for APM using just a resolve hook.

devsnek commented 4 years ago

@guybedford its not that it "doesn't work", more that it behaves oddly. You've resolved X to Y and then you ask for X again while in the middle of loading Y, which directs you back to Y again. The solution is to perform no remapping if the referrer is the hook file.

guybedford commented 4 years ago

Ah right so it’s the promise deadlock case. Yes so the solution is readdressing the original source load (‘?original’ type resolving, where an opt-out breaks the circularity).

On Mon, Nov 4, 2019 at 13:59 Gus Caplan notifications@github.com wrote:

@guybedford https://github.com/guybedford its not that it "doesn't work", more that it behaves oddly. You've resolved X to Y and then you ask for X again while in the middle of loading Y, which directs you back to Y again. The solution is to perform no remapping if the referrer is the hook file.

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/nodejs/modules/issues/424?email_source=notifications&email_token=AAESFSQ7N4RWDPRA33IZD73QSBWJ3A5CNFSM4JIWIFQ2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEDAKZYI#issuecomment-549498081, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAESFSTD3PXPXPGQOW6HCG3QSBWJ3ANCNFSM4JIWIFQQ .

devsnek commented 4 years ago

@guybedford its a delicate situation. perhaps we should have a special import exposed to each hook that bypasses that specific hook. there are still a lot of open questions.

guybedford commented 4 years ago

Layered addressing becomes very tricky to manage and I don’t tend to like the idea of loader access levels but it could be an alternative yes.

On Mon, Nov 4, 2019 at 14:26 Gus Caplan notifications@github.com wrote:

@guybedford https://github.com/guybedford its a delicate situation. perhaps we should have a special import exposed to each hook that bypasses that specific hook. there are still a lot of open questions.

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/nodejs/modules/issues/424?email_source=notifications&email_token=AAESFSWPHI3NE2ZHQMZ4OU3QSBZPZA5CNFSM4JIWIFQ2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEDANQDI#issuecomment-549509133, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAESFSQ4N32P4MDDFUXIXP3QSBZPZANCNFSM4JIWIFQQ .

bmacnaughton commented 4 years ago

@bmeck - thank you for the example. I'm getting hit by the flu right now but will digest it more deeply as soon as I can. It does address a number of issues and raises a few questions as well.

weswigham commented 4 years ago

Would it be improper to arrange loaders like a middleware stack, where when you have loaders 1...N, using import in loader N invokes loader N-1, which, if it uses import, invokes loader N-2, which continues until loader 1, whose import invokes the unmodified native import?

guybedford commented 4 years ago

@weswigham this is effectively how the multiple loaders proposals work, yes. Resolvers are still flat in that anything can resolve anything, but defining a module is a process that can possibly be tied to a loader and itself chained in a way that avoids conflict by considering a module uniquely identified as both the id of the module, and the loader that defined that module, to allow chaining at this level either through a source-based hook or an instantiation-based hook. Node.js core builtin APM applies similarly here.

bmacnaughton commented 4 years ago

@bmeck et al, after working with https://github.com/bmeck/node-apm-loader-example things have become more clear to me. Thanks for putting that together.

It seems that either of two methods to patch node-builtin modules will work for us.

  1. just require the builtin-module from within the resolve-hook and let our existing cjs patching mechanism handle it.

  2. require a "proxy" module (like node-apm-loader-example) that requires the node-builtin module. this allows additional manipulation of the builtin module. but it's not clear why the additional manipulation wouldn't be done by the existing code that patches the builtin modules.

In both cases the same object is returned by require and import; the two caches appear to be kept in sync.

For non-builtin modules I don't think there is a simple answer as they depend almost entirely on what the package author chooses to do.

While there seem to be transitional benefits to allowing dual-resolution it seems to create a great deal of complexity that would best be solved by package authors just publishing an esm wrapper around the cjs core (so that the esm wrapper requires the cjs core and exports what they want to). They can get rid of the cjs core at some point in the future if they choose to.

One challenge for APM is that it's not only the specifier that needs to be patched in all cases. For example, with mysql >= 2.0.0, the files that need to be patched are mysql/lib/Connection.js and mysql/lib/Pool.js. mysql/lib/protocol/sequences/Query.js is required to patch Connection.js. Keeping track of what has been and hasn't been patched becomes more challenging. We currently keep a WeakMap of module's that have been patched but that will no longer suffice because the require-patching mechanism we use won't see imports. And with dual-resolution they module returned by require and the module returned by import might not be the same object.

I can work with what is there (thanks again to all that have worked through this) but definitely vote for avoiding unnecessary complexity in the facility itself. It's hard to anticipate what package authors will do until they actually release their packages.