nicolo-ribaudo / modules-import-hooks-refactor

https://nicolo-ribaudo.github.io/modules-import-hooks-refactor/
MIT License

Refactor of import-related host hooks

Proposed changes

ECMA-262 currently exposes two hooks related to module loading: HostResolveImportedModule and HostImportModuleDynamically.

HostResolveImportedModule(referencingScriptOrModule, specifier) synchronously resolves an imported module and returns the corresponding module record. While module resolution and loading are usually asynchronous, this was a good enough abstraction for the ES2015 modules specification: before evaluating a module, hosts could pre-build the module graph and asynchronously load all its dependencies. This asynchronous step was not observable from ECMA-262, whose algorithms were only run once all the dependencies were synchronously available.

When we introduced dynamic imports in ES2020, this abstraction leaked: the asynchronous loading part needed to run during the execution of other ECMAScript code, so we had to introduce the new host hook HostImportModuleDynamically(referencingScriptOrModule, specifier, promiseCapability) to give hosts the opportunity to asynchronously prepare for the synchronous HostResolveImportedModule calls.

When loading and evaluating modules, either using host-defined mechanisms such as <script> tags or using HostImportModuleDynamically via dynamic import, the complete algorithm is divided between the host and ECMA-262:

  1. (host, potentially async) Load the module graph:
    1. (host, potentially async) Load the Module Record.
    2. (host) Get all its static dependency specifiers.
    3. (host) For each dependency, repeat from step 1.1.
  2. (host) Call .Link() on the top-level Module Record:
    1. (ECMA-262) Get all its static dependency specifiers.
    2. (ECMA-262) For each dependency:
      1. (ECMA-262) Call HostResolveImportedModule(specifier, referencingModule).
      2. (host) Get the pre-loaded module corresponding to the (specifier, referencingModule) pair.
      3. (ECMA-262) Validate that the imported bindings are actually exported.
      4. (ECMA-262) Repeat from step 2.1 for the resulting module.
  3. (host, potentially async) Call .Evaluate() on the top-level Module Record.
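
The split described above can be sketched roughly as follows. This is a simplified model and not spec text: the ModuleRecord shape, the in-memory sources table, and the names hostPreloadGraph and link are assumptions made for illustration, and the host's loading step is shown synchronously for brevity even though real hosts usually perform it asynchronously.

```typescript
// Hypothetical, simplified module record: a name plus its static dependency specifiers.
interface ModuleRecord {
  name: string;
  requested: string[];
}

// Stand-in for the host's storage (network, file system, ...). Assumed for this sketch.
const sources: Record<string, string[]> = {
  main: ["a", "b"],
  a: ["b"],
  b: [],
};

// Step 1: the host walks the graph itself and caches every module it finds.
const loaded = new Map<string, ModuleRecord>();
function hostPreloadGraph(specifier: string): ModuleRecord {
  const cached = loaded.get(specifier);
  if (cached) return cached;
  const deps = sources[specifier];
  if (deps === undefined) throw new Error(`Cannot find module ${specifier}`);
  const record: ModuleRecord = { name: specifier, requested: deps };
  loaded.set(specifier, record); // cache before recursing so cycles terminate
  for (const dep of deps) hostPreloadGraph(dep);
  return record;
}

// Step 2.2.1 and 2.2.2: once the graph is pre-loaded, the hook is a synchronous lookup.
function hostResolveImportedModule(referrer: ModuleRecord, specifier: string): ModuleRecord {
  const record = loaded.get(specifier);
  if (!record) {
    throw new Error(`${specifier} (imported by ${referrer.name}) was not pre-loaded`);
  }
  return record;
}

// Simplified stand-in for Link(): visits dependencies through the synchronous hook
// and records the order in which modules finish. Binding validation is omitted.
function link(module: ModuleRecord, seen: Set<string> = new Set<string>()): string[] {
  if (seen.has(module.name)) return [];
  seen.add(module.name);
  const order: string[] = [];
  for (const specifier of module.requested) {
    order.push(...link(hostResolveImportedModule(module, specifier), seen));
  }
  order.push(module.name);
  return order;
}

const main = hostPreloadGraph("main");
const linkOrder = link(main);
```

Note that the graph is traversed twice in this model: once by the host while loading, and once by the linking step.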

This refactor proposal aims to revisit the layering decision made by the dynamic import proposal. Rather than introducing a new hook to permit async host steps for import() calls, it replaces HostResolveImportedModule with an equivalent but async-compatible HostLoadImportedModule hook. The new hook loads a single module, and ECMA-262 iterates through its dependencies, asking the host to load each of them. The updated algorithm is:

  1. (host or ECMA-262, potentially async) Load the module graph:
    1. (ECMA-262, potentially async) Call HostLoadImportedModule(specifier, referencingModule).
    2. (ECMA-262) Get all its static dependency specifiers.
    3. (ECMA-262) For each dependency, repeat from step 1.1.
  2. (host or ECMA-262) Call .Link() on the top-level Module Record:
    1. (ECMA-262) Get all its static dependencies.
    2. (ECMA-262) For each dependency:
      1. (ECMA-262) Validate that the imported bindings are actually exported.
      2. (ECMA-262) Repeat from step 2.1 for the resulting module.
  3. (host or ECMA-262, potentially async) Call .Evaluate() on the top-level Module Record.

where host or ECMA-262 means "host if the algorithm is run by a host-defined mechanism such as <script> tags, ECMA-262 if it's run by dynamic import".
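
A minimal sketch of the updated loading step, under stated assumptions: the hook is modelled in callback style, the ModuleRecord shape, the sources table, and the pending-counter bookkeeping are illustrative choices rather than spec text, and the example host calls back immediately where a real host could call back later.

```typescript
// Hypothetical, simplified module record. loadedModules maps specifiers to loaded records.
interface ModuleRecord {
  name: string;
  requested: string[];
  loadedModules: Map<string, ModuleRecord>;
}

type LoadCallback = (result: ModuleRecord | Error) => void;

// Stand-in for the host's storage. Note the a <-> b cycle. Assumed for this sketch.
const sources: Record<string, string[]> = { main: ["a", "b"], a: ["b"], b: ["a"] };
const registry = new Map<string, ModuleRecord>();
const hookCalls: string[] = [];

// Host side: loads exactly one module, then hands control back through `done`.
function hostLoadImportedModule(
  referrer: ModuleRecord | null,
  specifier: string,
  done: LoadCallback
): void {
  hookCalls.push(`${referrer ? referrer.name : "(host)"} -> ${specifier}`);
  let record = registry.get(specifier);
  if (!record) {
    const deps = sources[specifier];
    if (deps === undefined) {
      done(new Error(`Cannot find module ${specifier}`));
      return;
    }
    record = { name: specifier, requested: deps, loadedModules: new Map() };
    registry.set(specifier, record);
  }
  done(record); // a real host may invoke this asynchronously instead
}

// ECMA-262 side: walks the dependencies and asks the host for one module at a time.
function loadRequestedModules(root: ModuleRecord, onDone: (error?: Error) => void): void {
  const visited = new Set<ModuleRecord>();
  let pending = 1; // counts outstanding visits, starting with the root itself
  let failed = false;

  function innerLoad(module: ModuleRecord): void {
    if (!visited.has(module)) {
      visited.add(module);
      pending += module.requested.length;
      for (const specifier of module.requested) {
        hostLoadImportedModule(module, specifier, (result) => {
          if (failed) return;
          if (result instanceof Error) {
            failed = true;
            onDone(result);
            return;
          }
          module.loadedModules.set(specifier, result);
          innerLoad(result);
        });
      }
    }
    pending--;
    if (pending === 0 && !failed) onDone();
  }

  innerLoad(root);
}

// Host-defined entry point (for example a <script> tag): load the root, then its graph.
let outcome: string = "pending";
hostLoadImportedModule(null, "main", (result) => {
  if (result instanceof Error) {
    outcome = "failed";
    return;
  }
  loadRequestedModules(result, (error) => {
    outcome = error ? "failed" : "loaded";
  });
});
```

Because this example host calls back immediately, the whole graph is loaded by the time the last statement returns. With a host that calls back later, the same traversal would complete over several turns without any change on the ECMA-262 side.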

Motivation

This refactor is beneficial on its own: it reduces the amount of behavior delegated to the host, by taking ownership of the loading steps shared across all the hosts that use asynchronous loading.

However, it's most useful for some current proposals that introduce the concept of a "module whose dependencies have not been loaded yet":

HostLoadImportedModule is low-level enough that it already satisfies those use cases:

This refactor reduces the number of loading-related host hooks from 2 to 1, and prevents it from growing to 4 in the future.

Constraints

This refactor should not force module loading to be asynchronous:

With this refactor, both synchronous and asynchronous loading are still possible: HostLoadImportedModule can synchronously give control back to ECMA-262 (hosts can reuse the same logic they had in HostResolveImportedModule), which then synchronously continues the loading process. The new module.LoadRequestedModules() will then return an already-resolved promise.

Hosts can still implement synchronous import of modules:

  1. Load the module.
  2. Call module.LoadRequestedModules(), which returns a loadPromise.
  3. If loadPromise.[[PromiseState]] is rejected, throw; otherwise it's fulfilled.
  4. Call module.Link().
  5. Call module.Evaluate(), which returns an evalPromise.
  6. If evalPromise.[[PromiseState]] is rejected, throw.
  7. If evalPromise.[[PromiseState]] is pending, throw (it's using top-level await).
  8. Return GetModuleNamespace(module).
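
The steps above could be modelled as in the following sketch. JavaScript code cannot read a promise's state synchronously, so the sketch uses a hypothetical PromiseRecord type to stand in for a promise whose internal state the host can inspect. SyncModule, importSync, and fakeModule are likewise names invented for this example.

```typescript
// Hypothetical stand-in for a promise whose internal state is visible to the host.
type PromiseRecord =
  | { state: "pending" }
  | { state: "fulfilled" }
  | { state: "rejected"; reason: Error };

// Hypothetical host-side view of a module record.
interface SyncModule {
  loadRequestedModules(): PromiseRecord;
  link(): void;
  evaluate(): PromiseRecord;
  namespace: Record<string, unknown>;
}

function importSync(module: SyncModule): Record<string, unknown> {
  // Steps 2 and 3: with a synchronous host the load promise is already settled.
  const loadPromise = module.loadRequestedModules();
  if (loadPromise.state === "rejected") throw loadPromise.reason;
  if (loadPromise.state === "pending") {
    // Not expected when HostLoadImportedModule gives control back synchronously.
    throw new Error("Module graph was not loaded synchronously");
  }

  // Step 4.
  module.link();

  // Steps 5 to 7.
  const evalPromise = module.evaluate();
  if (evalPromise.state === "rejected") throw evalPromise.reason;
  if (evalPromise.state === "pending") {
    throw new Error("Cannot synchronously import a module that uses top-level await");
  }

  // Step 8.
  return module.namespace;
}

// Tiny fake module whose evaluation result is chosen by the caller.
function fakeModule(evalResult: PromiseRecord): SyncModule {
  return {
    loadRequestedModules: () => ({ state: "fulfilled" }),
    link: () => {},
    evaluate: () => evalResult,
    namespace: { answer: 42 },
  };
}

const ns = importSync(fakeModule({ state: "fulfilled" }));

let tlaMessage = "";
try {
  importSync(fakeModule({ state: "pending" }));
} catch (error) {
  tlaMessage = (error as Error).message;
}
```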

Hosts integration

Hosts can use these new ECMA-262 algorithms in two ways.

The most straightforward integration is to keep the loading algorithms used for HostResolveImportedModule/HostImportModuleDynamically.

Assuming that the old hooks are implemented as follows:

The new host hook would be implemented as follows:

A more advanced refactor would avoid step 3 of the above HostLoadImportedModule implementation, and fully delegate the dependency discovery algorithm to ECMA-262. Hosts should carefully consider the differences between the ECMA-262 algorithm and their own before doing so.

Is this normative or editorial?

This proposal changes the number of spec-defined promise ticks when successfully importing a module with import("foo").

| Has "foo" already been imported? | Is "foo" a Cyclic Module Record? | Old number of ticks | New number of ticks |
| --- | --- | --- | --- |
| Yes, from the same module | Yes | (host-defined ≥ 1) + 1 | 2 |
| Yes, from somewhere else | Yes | (host-defined ≥ 1) + 1 | (host-defined ≥ 0) + 2 |
| No | Yes | (host-defined ≥ 1) + 1 | (host-defined ≥ 0) + 2 + (Eval ≥ 0) |
| Yes, from the same module | No | (host-defined ≥ 1) + 1 | 2 + (Eval ≥ 0) |
| Yes, from somewhere else | No | (host-defined ≥ 1) + 1 | (host-defined ≥ 0) + 2 + (Eval ≥ 0) |
| No | No | (host-defined ≥ 1) + 1 | (host-defined ≥ 0) + 2 + (Eval ≥ 0) |

Open questions

Should module.LoadRequestedModules() live on Abstract Module Record or Cyclic Module Record?

ECMA-262 only has the concept of dependencies for Cyclic Module Records, but this method also makes sense for other Module Records that have dependencies not exposed to ECMA-262.

Is it possible to use a single method that does both module.LoadRequestedModules() and module.Link()?

module.Link() uses Tarjan's algorithm to find SCCs in the module graph and transition their elements' status from linking to linked at the same time. This algorithm tracks SCCs using a mutable stack where it pushes/pops Module Records while traversing the graph.
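
For reference, here is a textbook version of Tarjan's algorithm over a plain adjacency list. It is a sketch meant to show the role of the shared mutable stack, not the actual Link() algorithm: the Graph type and the function name are assumptions, and module statuses are left out entirely.

```typescript
// Adjacency list: node name -> names of the nodes it depends on.
type Graph = Record<string, string[]>;

function stronglyConnectedComponents(graph: Graph): string[][] {
  let counter = 0;
  const index = new Map<string, number>();   // discovery order of each node
  const lowlink = new Map<string, number>(); // smallest index reachable via the stack
  const stack: string[] = [];                // shared mutable stack of open nodes
  const onStack = new Set<string>();
  const components: string[][] = [];

  function visit(node: string): void {
    index.set(node, counter);
    lowlink.set(node, counter);
    counter++;
    stack.push(node);
    onStack.add(node);

    for (const next of graph[node] ?? []) {
      if (!index.has(next)) {
        // Unvisited dependency: recurse, then inherit its lowlink.
        visit(next);
        lowlink.set(node, Math.min(lowlink.get(node)!, lowlink.get(next)!));
      } else if (onStack.has(next)) {
        // Back edge into the current DFS path: `next` is in the same SCC.
        lowlink.set(node, Math.min(lowlink.get(node)!, index.get(next)!));
      }
    }

    // `node` is the root of an SCC: pop the whole component at once.
    if (lowlink.get(node) === index.get(node)) {
      const component: string[] = [];
      let member: string;
      do {
        member = stack.pop()!;
        onStack.delete(member);
        component.push(member);
      } while (member !== node);
      components.push(component);
    }
  }

  for (const node of Object.keys(graph)) {
    if (!index.has(node)) visit(node);
  }
  return components;
}

// a and b import each other, so they form a single component.
const sccs = stronglyConnectedComponents({
  main: ["a"],
  a: ["b"],
  b: ["a", "c"],
  c: [],
});
// sccs is [["c"], ["b", "a"], ["main"]]
```

All members of a component leave the stack together, which is the point where a linking algorithm can transition them at the same time. The stack is only meaningful for a single depth-first path, which is why interleaved traversals cannot share it.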

This approach doesn't work with module.LoadRequestedModules(), because it visits multiple paths of the graph concurrently: a mutable stack would cause race conditions in the detection of different SCCs. For this reason, module.LoadRequestedModules() transitions the modules' status from new to unlinked after loading the whole graph.

  1. Is it important that module.Link() transition the status from linking to linked as soon as possible?
  2. Is there an efficient alternative to Tarjan's algorithm that works with concurrent traversals?