Jack-Works commented 1 year ago

I have implemented a PoC in a Webpack plugin to support import(spec, { reflect: "module"}). I found the current reflection is too weak to support real-world use cases.

Use case

Import the untrusted npm module as a whole to provide virtualization.

Let's say we have an untrusted npm module mod like this:

// mod/index.js
import { assert } from './utils.js'
import { count } from './count.js'
import { readFile } from 'node:fs/promises'

In the current proposal, if we want to fully virtualize the mod module, we need to manually collect all the transitive dependencies of mod, for example: mod/utils.js and mod/count.js.

This is not practicable and with low performance (you need to reflect every module and manually link them together by the importHook).

Currently: the importHook will be called for ./utils.js, ./count.js, and node:fs/promises

Suggestion

The module reflection, by default, fetches and links all its transitive dependencies, except the module that is intrinsic (has no source and is implemented by the host, like node:fs).

Still the same module above, with this suggestion, it will only call importHook for all Node built-in modules. mod and it's all its transitive dependencies are evaluated in the given Evaluator.

Peer dependencies

If this mechanism does not support opt-out, it will be not usable. Some module does not allow multiple instances (like "react") otherwise it will not work correctly. We can provide an opt-out mechanism like this.

await import("react-window", { reflect: "module", reflectExclude: ["react"] })

In this way, this module reflection will not capture react and we need to manually provide it in the importHook.

Today's behavior

await import("mod", { reflect: "module", reflectExclude: "*" })

Use a "*" string instead of an array to get today's behavior. (not link anything)

Pros

Better ergonomics, easier to adopt
Performance gain (many modules can be linked by the engine. less user code, better performance)

Cons

We need to introduce the resolveHook back again in the compartment proposal.
Reflected modules have different identities if the exclude options are different.

guybedford commented 1 year ago

I agree that for the use case of "virtualize a graph", you want a construct that allows you to "pluck a subgraphgraph, down to an exclude boundary".

But note that this proposal is not solving that problem, and that such a solution conflicts with the primary use case of this proposal of providing the simplest primitive. Think how realms also requires wrapper code to get to the final use case.

In addition, in the Wasm case, having any linking on by default would break the Wasm use case being solved by this proposal.

There are also implementation cache key questions with this approach in that there is no longer a clear cache key associated with the reflection.

I would recommend, treating your PoC as library code that wraps these primitives, as opposed to changing the primitives. Compartments could possibly also offer further solutions here as part of its API.

Jack-Works commented 1 year ago

treating your PoC as library code that wraps these primitives

it is not viable when implementing it in a bundler because if it is library code, then we need to reflect the transitive dependencies in the runtime, which requires import(unknown_expression_here, { reflect: "module" }) (and link them manually). This is not static analyzable.

guybedford commented 1 year ago

Why do you need it to be statically analyzable? Loaders never have been statically analyzable.

Jack-Works commented 1 year ago

In addition, in the Wasm case, having any linking on by default would break the Wasm use case being solved by this proposal.

Opt-in is also okay for me, but can you explain why?

There are also implementation cache key questions with this approach in that there is no longer a clear cache key associated with the reflection.

They're still reflecting the same underlying source, like reflecting it as a ModuleSource or reflecting it as a string (if we're going to have it).

===================

I gathered my thoughts and this is what I was thinking about:

I need to virtualize a graph for security
I'm using Webpack so I'm writing a Webpack plugin
There are no node_modules in the runtime so I need to do some static analyzing.
I don't want to implement something unlikely to be standardized in the future. Therefore if the proposed change in this issue is unacceptable, I have to change how I solve the problem.

If I have to write import('some-module', { reflect: "module" }) with the current semantics (no transitive dependencies included), a possible solution is to compile all files that some-module refers to (in any depth) into a ModuleSource and store them in a Map.

import('some-module', { reflect: "module" })
// globalThis.#reflectedModules: Map<[["some-module", ...], ["some-module/lib/index.js", ...]]>

Then in the user land library, when I need to link it in the importHook, I can use import(unknownSpec, { reflect: "module" }) because it is already in the globalThis.#reflectedModules (in the bundler case) or just available (in the native implementation).

But this method brings another problem. This way requires me to treat all CommonJS files as a ESM file. Even if I use cjs-module-lexer to analyze and sythentic a ESM module for it, it will still have CJS/ESM interportion problem.

Currently I'm stucked here. If we can have reflected transitive dependencies in this proposal, when the graph looks like this:

(a) reflected ESM => (b) CommonJS => (c) CommonJS

We don't need to reflect (b) and (c) as a ModuleSource, and it can be executed in a host provided CommonJS runtime (uhh how importHook should interact with CommonJS in NodeJS)

guybedford commented 1 year ago

What you're describing is something like a "module closure" where the module has partial binding to other modules, which in turn have partial binding. This is a very complex concept (just like closures are for functions). WebAssembly module linking has no concept of partial linking (and in fact WebAssembly doesn't even separate linking from execution). This spec is a simple primitive designed to enable the use case (which isn't possible today), not solve the advanced linkage scenarios you describe. I agree those are problems, but they simply don't fall under the use cases or requirements of this proposal at all. I think you should bring these points up under the compartments spec instead.

Jack-Works commented 1 year ago

I solved how to reflect CommonJS modules, so I believe there is no more problem with this.

tc39 / proposal-source-phase-imports

Reflect with transitive dependencies #30

Use case

Suggestion

Peer dependencies

Today's behavior

Pros

Cons