proposal: Staged require() with lifecycle hooks

Qard commented 7 years ago

I'm still working on this, but I think it's time to get some more eyes on it.

This is the EPS continuation of my work in https://github.com/nodejs/node/pull/12349 to attempt to enable an AST transformation pipeline in Node.js core. There's definitely some stuff missing still, and I'll add more over the weekend, but this is where I'm at today. 😸

@nodejs/diagnostics

sam-github commented 7 years ago

I wonder if a simpler approach of only a single hook would be better, but with Module offering APIs that can be used to implement that hook - APIs like _resolveLookupPaths() (but doced).

The 3-part API reminds me a bit of the mid-layer anti-pattern: https://lwn.net/Articles/336262/

Its very modelled on current behaviour, but if I wanted to implement a resolve like that used in https://github.com/zeit/pkg, I might want to completely replace all 3 stages with a different implementation, and would want just one hook, and then have a library of useful methods (resolve, load, parse, rewrite, ...) that I can use if I want, but no have the assumption that those 3 stages always occur baked into the extension hook API.

^--- not a rejection, just raising the possibility of a different approach that might be worth considering

cc: @watson @nodejs/diagnostics

Qard commented 7 years ago

Yep, I had similar thinking. Hadn't thought yet about how to encode it into the proposal, but basically my thinking was just that the three-stage thing would be what the internals does, but higher levels could be wrapped too, if that's more suitable.

sam-github commented 7 years ago

It might be clearer to remove refs to AST, because no AST is exposed anywhere here. APMs would have to use esprima (or just regex or text substitution) to directly rewrite the js source, whether they do that by transforming to AST, modifying, and reencoding as js is an implementation detail if I understand the proposal.

I am generally warming to the idea of exposing hooks into the module require/import system, but also worried a bit about tying the hooks too closely to what we do today, which may be counter productive if the intention is to enable extensibility into future innovative use cases.

sam-github commented 7 years ago

@igorklopov would a proposal such as this help with pkg in implementing a loader from the deps compiled into the executable?

Qard commented 7 years ago

Ok, I rewrote most of the doc and added an alternative implementation. (I kind of prefer the alternative, to be honest)

sam-github commented 7 years ago

Still looks to me like the mid-layer anti-pattern: it assumes a 4-stage life-cycle, so makes this inflexible when it could be flexible.

What about something more like:

const m = require('module');

m.register(tsRequire);

function tsRequire(name, parent) {
  // name is the module name passed to require, parent is the Module doing the require
  // return value is a Module

  if (!name.test(/\.ts$/)) return;
  const found = m.resolveRelative(name, parent); // could look in a zip file or on a web server, or ...
  const src = fs.readFileSync(found);
  const xfrm = ts.compile(src);
  return  m.whatever(xfrm, parent) // however given js source one creates a Module to return
}

So there is only one hook, its given what require always knows (name and parent), there are some library functions it can use to do its job (essentially the internal functions that are used by require now), but it can do something completely different if it wants to, as long as the end result is "here is a Module for that name".

Qard commented 7 years ago

The problem with that approach is that you can't reasonably intercept the data between the stages. If, for example, I wanted to apply another transform to that code after the ts transpile, I'd have to monkey-patch the ts module itself along with the register function to ensure it only applies my extra transform when using ts through register handlers.

sam-github commented 7 years ago

@Qard Right, I see your point, and take back my suggestion.

Qard commented 7 years ago

I definitely get where you're coming from on wanting a more composable interface though.

Where the proposal is at now tries to balance enough power to cover the particular set of needs we know we have now without sacrificing too much in complexity and potentially performance burden.

I'm definitely open to suggestions on how to improve the proposal. It is an EPS after all.

sam-github commented 7 years ago

At this point, I think we need more feedback from prospective users.

mhdawson commented 7 years ago

Nice write-up with a good overview of the rational. @sam-github do you have suggestions on who specifically we should ask for feedback ?

bmeck commented 7 years ago

If I get time, I would like to review it. That is unlikely prior to June though :(

sam-github commented 7 years ago

@mhdawson people who are working in the target use-cases, to quote the proposal:

Tracing, debugging and monitoring
Code coverage
Mocking
Transpiling

So, APM implementors (new relic, appdynamics, opsbeat, appmetrics, etc), someone from Istanbul (are there other coverage implementations?), mocking... no idea, babel/typescript/...

novemberborn commented 7 years ago

AVA and (nominally) nyc/Istanbul maintainer here. Thank you @Qard for raising this proposal, we've ended up using packages like append-transform and caching-transform which required a lot of effort to make work correctly. It'd be great to have official support from Node.js itself, especially as ES modules come into play.

nyc has a use case where it needs to apply its instrumentation transform last. This is tricky though, since nyc bootstraps itself in child process so that it's loaded first, and it is not typically aware of say a user-supplied babel-register hook. append-transform goes to great lengths to achieve this.

Loader objects should make this easier, since it means there's only one list who's order has to be controlled. Ideally the require.loaders property would not be configurable or writable. Perhaps it could be a custom collection that controls where in the pipeline a loader is inserted and that prevents arbitrary mutation.

jkrems commented 7 years ago

One note about the API: In JS module land, it will also be valuable to control linking. That's the mechanism that would still allow "monkey-patching" (since source transforms have quite a few downsides re: access to dependencies).

bmeck commented 7 years ago

@jkrems you need to be careful here, as interrupting linking generally removes exports from being live. There is no notification when exports update and no getter like mechanism upon accessing an import.

jkrems commented 7 years ago

@bmeck Interesting! The whole "when is a binding live live" thing still trips me up. What I was thinking was something along the lines of:

// original.js
export function f() {}
export function g() {}

// monkey-patch.js
export { g } from '::magic'; // or whatever that syntax is
import { f as originalF } from '::magic';
export function f() {
   // Do typical monkey stuff
   return originalF.apply(this, arguments);
}

I definitely have to do more experimentation/reading and it fails terribly if you don't have a full list of the original module's exports (and there currently is no API exposing that info).

bmeck commented 7 years ago

@jkrems also, your example is using a fn, which is safe-ish since it evaluates on each invocation, but things like:

export let now = Date.now();
setInterval(() => now = Date.now());

where you can't execute code upon access are the real problems.

Note: v8 has rejected implementing getters for variable access due to a number of reasons.

matthewloring commented 7 years ago

How will these lifecycle events work with the module cache? Would you expect to do the module cache lookup after resolution as is done currently and if so would only the resolution event be emitted on cache hits?

watson commented 7 years ago

@matthewloring Personally I'd expect the cache to be populated with the output of the pipeline. So the cache would then continue to function just as today. I can't currently see a use-case where you'd like any of the hooks to fire if the module is being loaded from the cache... but I might have over looked something

matthewloring commented 7 years ago

@watson The cache is currently keyed on the result of the resolution step so I think the user provided resolver would still be run before the cache lookup could occur. The use case that comes to mind for this would be module wrapping where you want the wrapper module to be given when the parent is not the wrapper module itself (which needs a handle to the module being wrapped). The cache does not currently take the parent into account to my knowledge which could interfere with this behavior.

Qard commented 7 years ago

I made a few revisions. The resolve hook should always run, even when there's a cache entry, since the cache entries are currently structured as resolved paths for keys.

@matthewloring That should be possible by looking at the parent property on the ModuleRequest.

matthewloring commented 7 years ago

Ok, that sound good. It does exclude applying different source transformations based on the parent module but I'm not sure how compelling that use case is.

mcollina commented 7 years ago

I do not think we should expose this API through the standard require function. Let's attach it somewhere more "hidden", as it is a relative advanced topic, if this lands.

I think we should state that this new API will not have any slowdown in load time if it is not enabled. Node.js fast boot time is one of its main features, and we should keep that. Moreover, it should be fast even it is enabled, as there should be little overhead in passing the data.

Given that ES6 modules are loaded at a different time compared to the standard require. ES6 modules are loaded during parsing, rather than at runtime. Could this transformation pipeline work with ES6 modules? In that case, would we be able to implement something like http://npm.im/proxyquire for ES6 modules?

BridgeAR commented 6 years ago

There was no progress for a long while here and I do not see any conclusion. Is there any progress here?

Qard commented 6 years ago

I'm planning on revisiting this soon and making some updates related to supporting es modules. :)

Trott commented 6 years ago

Closing, but feel free to re-open or move to a more appropriate/higher-visibility repository.

nodejs / node-eps

proposal: Staged require() with lifecycle hooks #56