Pluggable Loaders to support multiple use cases

MylesBorins commented 6 years ago

As #81 is getting into implementation details of the loader I thought it might be a good idea to start discussing an idea I have been kicking around for a robust solution. "Pluggable Loaders". This is heavily inspired by https://github.com/nodejs/node/pull/18392

TLDR; (bluesky)

Loader Plugins can be specified via an out of band mechanism (a.k.a. package.json or command line argument).
Loader Plugins can included within a project as an es module or installed via a package manager
Loader Plugins can specify the behavior of the loader including
- behavior for specific extensions or mimetypes
- behavior for lack of extensions (e.g. path searching)
Loader Plugins can cascade
- plugins can override default behavior

Example loaders include

std/esm
TypeScript
wasm
Babel

Questions to be answered

Is this approach of interest?
Should plugins be specified per application or per package
What should node ship as default loaders?
- Proposal
- Package name map as default loader
- "Legacy" loader shipped in core as well

Thoughts?

ljharb commented 6 years ago

The problem with that story is that it doesn’t satisfy all the use cases under “transparent interop”, for which import ‘cjs’ and require(‘esm’) are a necessity. I don’t think it’s productive or helpful to suggest implementations that disregard use cases that members of the working group find critical.

ljharb commented 6 years ago

What we have is a list of use cases - we should first establish which, if any, the current implementation does not support - and then compare any other implementations on that basis. Instead of arbitrary issue thread comments, perhaps a document that makes a matrix of use cases vs implementations (linking to impl details separately, since those aren’t as important yet), would help us contrast which implementations serve the most use cases?

GeoffreyBooth commented 6 years ago

I don’t think it’s productive or helpful to suggest implementations that disregard use cases that members of the working group find critical.

@ljharb I think that’s a bit too high of a bar for proposals. I think people suggesting ideas is helpful, even if a proposal is incomplete or doesn’t cover every use case. Yes, we should follow up each proposal with an evaluation of which use cases are unsatisfied, and that can drive revisions of the proposals, but I wouldn’t want people to feel like they couldn’t throw ideas into the mix.

@MylesBorins I still don’t understand “Step 5 is about what happens when those dependencies change from being cjs to esm”. Even when dependencies add ESM support, most will probably be dual-mode ESM/CommonJS for a long, long time (probably longer than most packages will be actively maintained) because packages won’t want to drop support for Node LTS until the oldest LTS supports ESM. So for most dependencies, import.meta.require will likely work indefinitely. Is that what we really want, for users to basically refactor their module import statements into import.meta.require and leave their code as the latter basically indefinitely?

Most apps, in the near term at least, will end up being import statements for all user code (like my app importing its own files) and import.meta.require for all dependencies (since all will support at least CommonJS, and will for years to come). That would become a de facto new standard pattern for Node apps, to the point that naive users will just assume that Node requires import.meta.require for importing dependencies (and people will wonder why that is, will think that Node didn’t fully add support for the import statement, etc.). And there won’t be any reason for users to ever refactor import.meta.require into import; if anything, users will avoid doing so, as that’s a change that could potentially introduce bugs while it provides no benefit.

GeoffreyBooth commented 6 years ago

@MylesBorins Can we get back to the loader part of your proposal? Can a loader be written that lets people avoid needing to refactor their code/use import.meta.require?

MylesBorins commented 6 years ago

@GeoffreyBooth I'm imagining a loader can be made that would allow people to avoid import.meta.require. Currently thinking that this should not be the default loader but open to discussing all the thing

@ljharb fwiw this proposal as it currently stands is not complete by any means... it also doesn't cover the case of native modules. I don't think every proposal will be able to solve every use case. The idea with this proposal was that we could handle a baseline use case and create a platform to help support the ecosystem solve ones that core cannot

Making a small edit above

s/default loader uses the package-name-maps proposal for resolution/default loader supports the resolution mechanism of the package-name-maps

This is to imply that we do not necessarily require the name map, this can be generated at runtime.

GeoffreyBooth commented 6 years ago

Yeah sorry to zero in on the one thing I found objectionable 😄 In general I think the idea of loaders holds a lot of promise, as a way to enable non-default behavior or even deviations from the spec in an opt-in way.

I think we need to keep in mind “how would you explain this to a layperson” with regard to things that are deviations from current user practice. Like if we say “yes, you can do import { x } from 'lib' but only when lib is ESM,” the Node repo is going to get flooded with bug reports/feature requests like “my import statement doesn’t work”. I understand that these users will just be wrong to complain about intended behavior (depending on how this all shakes out), but that’s going to be the reality of the situation. I understand too that this frustrates @bmeck and @devsnek 😄 but the JavaScript community is a big community, and a huge number of those folks are very novice programmers. We have to be able to not only explain that the syntax doesn’t work for CommonJS, but why that’s so, and why that’s not a bug nor something that we’ll add support for in the future. And this is just an example, you could replace import { x } with import.meta.require or any other implementation we want to release that deviates from what users expect based on what they do now with Babel.

So yes, if there are tools we can add to our toolkit that let us do end runs around some of the blockers we’ve run into regarding things “just working,” I get excited by that potential. Especially if those tools can be training wheels that ease a transition to an all-ESM, fully-compliant world whenever that day comes, and also if these aids can be added/applied to a project in an automatic or semi-automatic way.

bmeck commented 6 years ago

@GeoffreyBooth loaders don't let you change how ESM works, but do allow you to rewrite the code. They could let you transform your code into another form, but if that transform is shipped with Node needs to be very clear about what it does and isn't free. Since loading CJS still is required to load as a binding for ESM, you need a signal that a dependency is CJS (we have this with the format property in hooks currently), then you need to rewrite all references to the import names from that module to be some form of delegation. That delegation would also need to be standardized but can probably rely on the behaviors in https://github.com/nodejs/node/pull/20403 .

However, even with all of this effort you won't get around the need to know the list of exported names you are wanting to create for a CJS module. The only way to allow that I know of is to take one of the 3 described solutions in https://github.com/nodejs/modules/issues/81 . Loaders don't get around those issues.

iarna commented 6 years ago

I get concerned when I see:

Loader Plugins can be specified via an out of band mechanism (a.k.a. package.json or command line argument).

Out of band is great, but I want to make sure that some facility for setting these in a js file that doesn't require a second Node.js process is also supported from the start. "use vm to create an independent javascript context using these loaders" would be an acceptable solution, IMO, and arguably still counts as "out of band".

MylesBorins commented 6 years ago

@iarna that is a great idea! fwiw this is meant to be a starting point and welcome additional features about how to do this in a scripted way.

bmeck commented 6 years ago

@iarna have you seen the vm.Module work by @devsnek, it lets you specify linking and could be used as a wrapper, but doesn't require you to make a new context.

iarna commented 6 years ago

@bmeck I haven't! I'm not particularly picky here, so long as there's a facility. (I feel similarly about import.meta.require, ugly as sin, sure, but it arguably should be.)

GeoffreyBooth commented 6 years ago

@MylesBorins I had some thoughts about how to use loaders to try to stitch together an implementation that satisfies many of our features; should I describe it here or start a new issue?

MylesBorins commented 6 years ago

Feel free to describe it in here

On Thu, May 24, 2018, 1:29 AM Geoffrey Booth notifications@github.com wrote:

@MylesBorins https://github.com/MylesBorins I had some thoughts about how to use loaders to try to stitch together an implementation that satisfies many of our features; should I describe it here or start a new issue?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/nodejs/modules/issues/82#issuecomment-391593129, or mute the thread https://github.com/notifications/unsubscribe-auth/AAecVycExz97AIUjPJoc_lydCtCBVGasks5t1kVJgaJpZM4T-i6Q .

GeoffreyBooth commented 6 years ago

@MylesBorins So here’s what I’m thinking. Not a concrete proposal yet, just some general thoughts. It’s obvious that a lot of people want interoperability in some form, somewhere on the spectrum from fully transparent to fully opaque, however people want to define those terms. It seems to me that a lot of the debate in this repo has revolved around how to get Node to support various interoperability features natively, by default, without breaking spec or other concerns. Loaders help us bridge the gap, by supporting these interoperability requests while allowing Node’s native, default implementation to keep fully to spec.

I’m sure that some folks are tempted to design Node’s new modules implementation as if CommonJS never existed. It could be a direct copy of how browsers do things, and would have the cleanliness of a blank slate. Whenever the day comes that CommonJS fades from the scene, Node would already have an easy-to-maintain, streamlined modules implementation. To handle the transition period of the next few years, either people keep on transpiling like they do now; or one or more loaders handle all interoperability with CommonJS.

The opposite end of the spectrum would be to get Node to support every interoperability case we can think of, natively and by default, and either break spec (possibly by asserting that it doesn’t apply to these cases) or by going back to the committee for amendments to the spec to enable what we want to do. This would make the transition period easy, but would leave Node with a modules implementation even more complex than it is now.

A middle ground would be to have Node natively support all the interoperability cases that can be handled without breaking spec, and rely on loaders only for the edge cases where we just don’t think we can achieve the feature that’s been requested without violating the spec (or offending some other high priority concern, like browser equivalence). This is basically what experimental-modules and the NPM implementation are doing now, though without loaders. Add loaders to either one, to cover the interoperability features they don’t yet support, and we’re done.

If it were up to me, I would pick this middle ground approach. It doesn’t sound like this group would ever agree to the “break/ignore spec” approach, and the “use ESM as an opportunity to greenfield modules in Node” approach ultimately means that whatever massive CommonJS loader gets written becomes a de facto part of Node for the next several years, until most projects have no CommonJS dependencies. That could be a long, long way off.

Whereas if Node supports as much interoperability as it can, then these compatibility legacy loaders could fade out sooner. For example, let’s say a project only needs a loader to support static named exports from CommonJS, e.g. import { shuffle } from 'underscore'. The user could decide to rewrite all such lines like import _ from 'underscore', or use a transpiler to do so as part of a build step, and then the loader is gone. People who are transpiling anyway, because they want to use even-newer ES features or they’re using TypeScript or CoffeeScript etc., would probably prefer to handle this in a build step that they already have in order to avoid needing a loader in runtime. If most interoperability concerns are handled by Node without opt-in loaders, many projects won’t need a loader; and many other projects will be able to drop loaders sooner, as we approach an all-ESM future.

The tricky part is that we don’t know precisely what features can’t be part of Node proper and need to be implemented as a loader. Even the example I wrote above, import { shuffle } from 'underscore', has inspired debate among people who think it can be achieved without breaking spec. So maybe the way to get from here to there is for one group to go forth and try to write an implementation that achieves full transparent interoperability, with as many of the features in this repo as possible, without breaking spec (or at least, without breaking spec as far as they’re aware). Another group can play “red team” and point out places they’re breaking spec or deviating from browsers, and that code can be excised and reimplemented as one or more loaders. I’m assuming that the first team isn’t ultimately able to produce an implementation that satisfies all feature requests yet somehow passes spec review—but if they can, that would be great! But assuming they can’t, that would lead the way to creating a Node modules implementation that handles as much as it can, with loaders to cover the rest.

naturalethic commented 6 years ago

@devsnek You mentioned 'hot reloading' in the second comment in this thread:

this is pretty much possible with the current system's hooks. using a combo of resolve and dynamic instantiate you can do everything from hot reloading to babel-style cjs named exports.

I have found no mechanism for invalidating / reloading a previously loaded module. Can you direct me to any references on that?

ljharb commented 6 years ago

You’d have to set up hot reloading prior to a module being imported, just like with CJS.

bmeck commented 6 years ago

@naturalethic you can't completely invalidate ESM due to spec idempotentcy requirements. It is just as @ljharb said that you have to make a module that does delegation to the currently "hot" module instead of replacing module records.

Edit:

A crude example of a module that does delegation that only works for new calls to import() and does not work for static import / rebind existing values:

export function then (r) {
  r(import(await getHotModuleURL(...)));
}

naturalethic commented 6 years ago

@bmeck & @ljharb Thanks!

TheLarkInn commented 6 years ago

One thing I'd like to offer is leveraging an already vetted, tried, and completely extensible resolver that is async and leverages the default node algorithm oob.

https://github.com/webpack/enhanced-resolve

bmeck commented 6 years ago

@TheLarkInn it would be good to support such a loader, but I think the discussion is more around how to enable essentially what the plugins field does in your example. enchanced-resolve is very complex and not something I'd like to take on from a support point of view. I think it is sane for us to have a more minimal API and allo people to call enhanced-resolve for their own purposes within any given loader.

arcanis commented 5 years ago

Most of the discussions here have been focused on the specific use case of using the loaders to transpile the code in some capacity. I have a quite different use case in mind, so please feel free to ask me to post this message in a separate thread if you think it'd keep discussions more manageable.

We recently unveiled Yarn Plug'n'Play, whose goal is to generate static maps that Node would then be able to consume. Since Yarn knows everything about the dependency tree, including the location of the packages on the disk, it makes sense to fetch information directly from it - and it makes it possible to avoid creating the node_modules folders, amongst other benefits (whitepaper here).

In order to achieve this, we currently require our users to use the --require option to inject the Yarn resolver into Node. To make this a bit easier and more environment-agnostic we've also introduced yarn node that simply wraps Node and automatically injects this flag if needed. That's not great for a variety of reasons (the main one being that we don't want to substitute to Node), and as such we'd really like to find a way to tell Node that a loader must be used in order for a project to work. Using a per-project settings rather than a per-process one, so that users wouldn't have to change their workflows one bit.

All this to say: the loaders field described by Myles in #82#389761269 would be extremely useful in this regard. We could simply add the Plug'n'Play hook at the beginning of the array, and everything else would Just Work™. While important, transpilers aren't the only kind of project that would benefit from this API.

bmeck commented 5 years ago

@arcanis would https://github.com/nodejs/node/issues/18233 be sufficient? We are currently unable to land it if we were to PR it, but just check the design for now and see if anything is missing. With resolve(module, specifier) in place you could have a static mapping still.

guybedford commented 5 years ago

@arcanis the difficulty with this model is that it becomes very tricky to distinguish between a loader that is necessary for a package to work and a loader that is necessary for a specific project to work. For example, I might in development use Yarn Plugn'n'Play, then publish that package to npm, with the same plugin and play loader declaration in the package.json. Now anyone installing this package would get the Yarn loader applied, even if they were consuming the package via node_modules with npm.

So this is the current snag here on loaders, making this distinction clear and simple. Suggestions very welcome here!

arcanis commented 5 years ago

@bmeck I don't think it would be enough unfortunately, because of the distinction @guybedford mentioned - your PR adds support for per-package hooks, but in the case of PnP the hook must be registered globally (since all packages will need to use it in order to resolve their own dependencies).

The use case would be covered by the "Global Composition" section, except that it would be a production loader, not development only. Using the environment to set the global hooks is nice for debugging, more impractical for production (it doesn't remove the need for yarn node, since the environment has to he set in some way; it also poisons child processes environments).

@guybedford What about a globalLoaders option that Node would use from the closest package.json in the filesystem hierarchy (relative to the script being executed, or the cwd in the case of -e / -p), and would ignore after the program has started (using the loader field from @bmeck's PR instead)? 🤔

bmeck commented 5 years ago

@arcanis I would be against adding non-package (application level) data to package.json

What is exactly the reason that this cannot be done on a per package level?

NODE_OPTIONS="--loader=@yarn/pnp" is not sufficient I take it then as well? Couldn't you turn off the propagation to children by removing the part of process.env.NODE_OPTIONS that is setting --loader?

guybedford commented 5 years ago

globalLoaders sounds like a very useful concept to me. And it would also directly benefit the way jspm wants to apply loaders as well.

devsnek commented 5 years ago

if your individual package depends on yarn pnp it should still be package level. i don't want my deps being handled by some other package randomly.

arcanis commented 5 years ago

@bmeck Some reasons why I think NODE_OPTIONS might not be the right solution:

Setting NODE_OPTIONS is an explicit action. The user has to either set it themselves, or let a tool do it for them.
In the first case, they have to either set it system-wide (which is not acceptable, since some projects might require the hook and others might not), or temporarily either in their shell session (meaning they have to remember to remove it later) or the single command line (which makes the command lines much more verbose and hard to write/grasp).
In the second case, they have to setup something that will create the NODE_OPTIONS as needed. This something may be a shell alias, or yarn node, but whatever it is it will put itself between Node and the user, adding an extra layer of complexity (for example, yarn node requires at least one extra process to spawn, including a full Node context that would run the yarn binary simply for eventually shelling out to the real Node - and it can be worse depending on the setup). It also doesn't compose very well with other loaders (will each global loader have its own wrapper).
It will poison the environment for child processes. While it could be somewhat fixed by removing the part of NODE_OPTIONS that sets --loader, as you mentioned, it's not a working solution, because there's no way to know what added this flag in the first place. The intent is completely lost: is the loader meant to been preserved? or removed? (as a side note, it also requires to parse the NODE_OPTIONS string, which comes with its own problems)
What happens if you use fork to spawn a new Node process in a directory installed through Plug'n'Play? If Node doesn't have an integration to find out by itself the global loaders that need to be loaded, it means that the code doing the fork will have to implement it. Will Plug'n'Play (or esm, or babel/register) have to monkey-patch fork to support the use case? Same thing for execSync.

if your individual package depends on yarn pnp it should still be package level. i don't want my deps being handled by some other package randomly.

Plug'n'Play is enabled on an application-basis, not package-basis. You cannot have an individual package depend on PnP (hence why globalLoaders would be ignored for anything else than the running script).

bmeck commented 5 years ago

@arcanis all of those are reasons why I believe that it should be done on a per package level. You state that it is enabled on an application-basis, but why can it not be enabled on a per package basis? Most packages when you install them don't come with node_modules populated and loaders let you actively opt-out of node_modules

zenparsing commented 5 years ago

@arcanis For your use case, are there multiple entry points into the application that need the same loader-hook treatment, or is there generally only one entry point?

arcanis commented 5 years ago

You state that it is enabled on an application-basis, but why can it not be enabled on a per package basis?

@bmeck There are a few reasons:

Something that might not be clear from my previous comments is that Plug'n'Play causes a single Javascript file called .pnp.js to be generated where would typically be found the node_modules folder (which isn't created at all). This file is the loader that needs to be injected. This is the one and only source of truth to know where are located the packages on the disk - it contains everything needed for Node to say "package X is requiring file Y - then it needs to be loaded at location /path/to/Y" - and this anywhere on the dependency tree.
Packages (most of them downloaded directly from the npm registry) have no idea whether they're used under PnP or regular installs. Nor should they have to: it wouldn't be feasible to ask for all package authors to add the PnP loader to their package.json - and it would break for people not using PnP. No, the packages should be generic, and it's to the loader to work whatever is the package configuration.
Even if they were aware of whether they run in a PnP or non-PnP environment, the loader path cannot be known on their side (and writing it inside their respective package.json files at install-time isn't feasible, since the same package folder will be used by multiple .pnp.js for multiple projects).

In case the reason why per-package configuration wouldn't work, I recommend taking a look at the whitepaper - I think it might help clarify some points that can still be confusing, especially the Detailed Design section.

For your use case, are there multiple entry points into the application that need the same loader-hook treatment, or is there generally only one entry point?

@zenparsing It's up to the user, they can have as many as they want, and it cannot be statically determined since they can new ones after the install. Basically, each time they would usually run a script using node ./my-script.js, they'll need to register the loader present in this folder.

bmeck commented 5 years ago

@arcanis I did see the design, but I'm still unclear on why it needs to be that way. Maybe we should setup a call. I'm not sure but it seems like there is some confusion or I'm missing something. I agree that the current whitepaper/RFC would not be an ideal fit for either global or per package loaders. I'm interested in solving the use case, but if we must perfectly match that RFC it seems unlikely to be pleasant.

Something that might not be clear from my previous comments is that Plug'n'Play causes a single Javascript file called .pnp.js to be generated where would typically be found the node_modules folder (which isn't created at all). This file is the loader that needs to be injected. This is the one and only source of truth to know where are located the packages on the disk - it contains everything needed for Node to say "package X is requiring file Y - then it needs to be loaded at location /path/to/Y" - and this anywhere on the dependency tree.

So can't a Loader just find that file/generate it as needed? A package manager could even make a big single shared .pnp.js that works across all packages. I don't see how this really relates to needing it to be a application level option. I must be missing something.

Packages (most of them downloaded directly from the npm registry) have no idea whether they're used under PnP or regular installs. Nor should they have to: it wouldn't be feasible to ask for all package authors to add the PnP loader to their package.json - and it would break for people not using PnP. No, the packages should be generic, and it's to the loader to work whatever is the package configuration.

That sounds like you want a global hook, but as described in both the comments and a few notes in the RFC this would be a breaking change to use this resolution algorithm. Wouldn't that mean that users should opt into this behavior? And if they should opt into this behavior how would they do so to state they are using non-standard behavior if not on a per package level.

Even if they were aware of whether they run in a PnP or non-PnP environment, the loader path cannot be known on their side (and writing it inside their respective package.json files at install-time isn't feasible, since the same package folder will be used by multiple .pnp.js for multiple projects).

I don't understand this comment, why can't you have either linkage via symlinks like pnpm or use the default resolution algorithm to get to your loader?

arcanis commented 5 years ago

if we must perfectly match that RFC it seems unlikely to be pleasant.

My goal in being here is to help make this pleasant to everyone. If the RFC has to consider new facts, so be it. I want to stress that we're really open to feedback and don't want to push this on anyone 🙂

So can't a Loader just find that file/generate it as needed? A package manager could even make a big single shared .pnp.js that works across all packages.

I'm not entirely sure what you mean - a loader cannot generate that file, since it is meant to be generated by the package manager. Yarn already does make a big single single shared .pnp.js file that works across all packages, so I'm not sure either I understand correctly.

If you mean "accross all projects", this isn't possible - multiple projects have different resolutions (different lockfiles, in sort) that cannot be merged, and the .pnp.js is meant to be checked-in anyway.

this would be a breaking change to use this resolution algorithm. Wouldn't that mean that users should opt into this behavior? And if they should opt into this behavior how would they do so to state they are using non-standard behavior if not on a per package level.

There's a few points here that can be discussed (maybe it'd be better to mention it on the rfc thread, since people here might not be interested about Plug'n'Play's details?):

First, something to note is that Plug'n'Play aims to be compatible with the existing ecosystem.
Which means that whether Plug'n'Play is enabled or not should not affect the packages.
Which in turn means that packages shouldn't have to be aware of whether PnP is enabled or not.

How do we achieve this compatibility? By strictly following the rules of dependencies / devDependencies. It covers almost all needs. The only want it doesn't cover is obviously packages directly accessing the node_modules folder, but the fix is usually quite simple, thanks to require.resolve. Anyway, the main point is: operating under PnP or not is a setting that is done at the installation level, not the package level.

I don't understand this comment, why can't you have either linkage via symlinks like pnpm or use the default resolution algorithm to get to your loader?

Neither symlinks nor the node resolution would solve the problem. Consider the following hierarchy:

/path/to/cache/lodash-1.2.3/package.json -> {"loader": "???", "dependencies": {"left-pad": "*"}}
/path/to/cache/left-pad-1.0/package.json -> {"loader": "???"}
/path/to/cache/left-pad-2.0/package.json -> {"loader": "???"}

/path/to/project-1/my-script.js -> require(`lodash`)
/path/to/project-1/.pnp.js -> lodash=lodash-1.2.3, left-pad=left-pad-1.0

/path/to/project-2/my-script.js -> require(`lodash`)
/path/to/project-2/.pnp.js -> lodash=lodash-1.2.3, left-pad=left-pad-2.0

What would you put in loader that would simultaneously target both project-1/.pnp.js and project-2/.pnp.js, depending on the environment? Note that one of the goal of Plug'n'Play is to avoid I/O, meaning that creating symlinks in project-1 and project-2 isn't allowed (and it would require --preserve-symlinks anyway).

bmeck commented 5 years ago

@arcanis I'm getting quite confused with a lot of implementation of how your system works right now being thrown around. I'm going to just state how I would expect things to look given what I'm understanding:

/path/to/cache/foo@1.0.0/package.json -> {"loader":"./.pnp.js"}
/path/to/cache/foo@2.0.0/package.json -> {"loader":"./.pnp.js"}
/path/to/cache/foo@2.1.0/package.json -> {"loader":"./.pnp.js"}
/path/to/cache/bar@1.0.0/package.json -> {"loader":"./.pnp.js", "dependencies": {"foo":"2.0.0"}}
/path/to/cache/bar@2.0.0/package.json -> {"loader":"./.pnp.js", "dependencies": {"foo":"^2"}}

/path/to/project-1/app.js -> require(`foo`) require(`bar`)
/path/to/project-1/package.json -> {"loader": "./.pnp.js"}
# unclear how this handles the `foo@2` nesting?
/path/to/project-1/.pnp.js -> foo=foo@1.0.0, bar=bar@1.0.0

/path/to/project-2/app.js -> require(`foo`) require(`bar`)
/path/to/project-2/package.json -> {"loader": "./.pnp.js"}
# no nesting, simple
/path/to/project-2/.pnp.js => foo=foo@2.0.0, bar=bar@1.0.0

/path/to/project-3/app.js -> require(`foo`) require(`bar`)
/path/to/project-3/package.json -> {"loader": "./.pnp.js"}
# no nesting if bar using foo@2.0.0
# nesting if bar using foo@2.1.0
/path/to/project-3/.pnp.js => foo=foo@2.0.0, bar=bar@2.0.0

Given that .pnp.js can intercept all incoming requests for dependencies within a "project", we can ensure that it loads to a location that properly does the requirements in the RFC.

receive a resolve(moduleId (generally a URL), specifier) request.
create body of /path/to/cache/... and assigned id.
map assigned id to /path/to/cache/... resolutions so that resolve(...) can handle it.
respond with body

This could be done in a ton of other ways, the pnpm style symlinkscould be used, and a loader would prevent the need for using--preserve-symlinks` if it handled that itself.

We also face some problems from taint if we use methods of finding /path/to/cache/... like os.homedir() if someone is using things like child_process.fork, cluster, etc. if the flags are removed / if they set a different user id. Having /path/to/cache be local and resolvable without outside data would be ideal. This is part of why pnpm style symlinks are attractive. These problems can also be problematic if we use CWD and try to run /path/to/project-1/ in a different working directory from /path/to/project-1/ such as using a CWD of /tmp/pid1000.

We also face some problems of creating a distinction of application and package here. If I were to require('/path/to/project-1') and it acted differently from node /path/to/project-1 we face some problems regarding how to determine if something is running as an application or as a package. Using module.parent style detectiong is insufficient since it could be loaded in odd ways:

$ # using the repl
$ node
> require('/path/to/project-1')

$ # as a preload
$ node -r /path/to/project-1 sidecar

$ # bad APIs
$ node -e '...;require("module").runMain(...)'

I have serious concerns in creating systems that create this divergence since it makes behavior very dependent on how things are used, which we don't currently have well defined paths for a variety of things in ESM. Things like module.parent don't even make sense in ESM.

It seems like if you really need that distinction we should clarify what an application is far beyond having it mean that something is running via a specific CLI command. I feel like such a distinction might need to ensure that an application cannot be used as a package and vice versa.

arcanis commented 5 years ago

I must be missing something: how come the loader fields in the cache packages reference ./.pnp.js, but there is no .pnp.js file in the cache packages? Wouldn't those relative paths resolve to /path/to/cache/foo@1.0.0/.pnp.js, which doesn't exist (and cannot exist since it contains project-specific data that cannot be stored in the cache)?

bmeck commented 5 years ago

@arcanis i would assume they exist in the cache as well, i am still unclear on what is project specific hence wanting a meeting.

arcanis commented 5 years ago

Some highlights from my understanding of what we discussed (@bmeck, @MylesBorins, please correct me if I'm mistaken somewhere):

The resolution process should be split in two steps:
- First a pass would resolve the bare imports into a value,
- Then during the second pass the loader would take this value, finish resolving it if needed, and finally load a module using it.
- Taking a JSX loader as example: react-calendar would resolve to the unqualified resolution /path/to/cache/react-calendar@1 during the first step, then the second pass would kick in and would internally turn this path into a fully qualified resolution, turning it into /path/to/cache/react-calendar@1/lib/calendar.jsx, which it would finally load.
Since the PnP has a global knowledge of all packages of the dependency tree, it needs to affect all packages being loaded
- Still, this is complex to achieve by pure application-level loaders, because it's not clear what is an application.
- This can be achieved by a package-level loader that would wrap the loaders from all imported modules.
- Through this pattern a loader would effectively act as an application-level loader, but would require some boilerplate.
The value returned during the first pass might not be an actual filesystem path
- An Asset API (similar in concept to FS, but read-only) would then allow consumers (both userland code and loaders) to use these values to read the files they point to

A pseudo-implementation for what such a PnP loader would look like this (@bmeck, let me know if I made a conceptual error here). Note that I made PnP more complex than it actually is (it currently returns actual folder paths, not hashes) to illustrate in the next snippet the use of the asset api.

import {resolveToUnqualified} from './.pnp.js';

export default function pnpLoader(parentLoader) {
  return {
    // request=react-calendar/component
    loaderStepOne: (request, issuer) => {
      // return=TZhsV3bGQV2KZIjFIObr/component
      return resolveToUnqualified(request, issuer);
    },
    // request=TZhsV3bGQV2KZIjFIObr/component
    loaderStepTwo: request => {
      const {loader, ... rest} = parentLoader.loaderStepTwo(request);
      // Wrap the loader to substitute it by our own
      return {loader: pnpLoader(loader), ... rest};
    }
  };
}

And a pseudo-implementation for the default loader would be something like this (keep in mind this is SUPER pseudo-code, we haven't discussed code samples and this is just based out of my rough understanding of how the asset api could work):

import * as fs from 'fs';

export default function defaultNodeLoader(parentLoader) {
  return {
    // Note that the PnP loader would entirely shortcut this step, since
    // it implements its own step one.
    loaderStepOne: (request, issuer) => {
      // We need to run the full resolution since the extension and index.js
      // must be resolved in order to select the right node_modules
      return runNodeResolution(request, issuer, fs);
    },
    //
    loaderStepTwo: request => {
      // If there's a parent resolver, use it to resolve the assets (since it's
      // the only one that'll know how to use the special identifier
      // TZhsV3bGQV2KZIjFIObr/component that's been returned by
      // the PnP loader)
      const selectedFs = parentLoader ? parentLoader.assets : fs;

      // Then use the FS we've selected to run the resolution; we need to run
      // it again (even if we do it in the step one of this loader), because the
      // step one is not guaranteed to return a fully qualified path (the PnP
      // override wouldn't, for example)
      const qualifiedPath = runNodeResolution(request, issuer, selectedFs);
      // without PnP = /.../node_modules/react-calendar/component/index.js
      // with PnP    = TZhsV3bGQV2KZIjFIObr/component/index.js

      // And then, once it has finished resolving the fully qualified path, it
      // can load it (still using the parent loader FS if needed)
      const body = await selectedFs.readFile(qualifiedPath);
      const loader = /*not sure how it will be loaded?*/;

      return {body, loader};
    }
  };
}

MylesBorins commented 5 years ago

Closing this as there has been no movement in a while, please feel free to re-open or ask me to do so if you are unable to.

nodejs / modules

Pluggable Loaders to support multiple use cases #82