tc39 / proposal-compartments

Compartmentalization of host behavior hooks for JS
MIT License
117 stars 10 forks source link

module hooks and apis #18

Open devsnek opened 4 years ago

devsnek commented 4 years ago

So right now the spec defines modules to hosts sort of like this:

  1. Modules are concrete things with their own state and methods and whatnot
  2. Modules cache successful imports by specifier
  3. Hosts are responsible for turning (referringModule, specifier) into another module (you could also say they only have to do that once, due to point 2)

Compartments add behaviour on top of this:

  1. You can supply a moduleMap to give a compartment which is not keyed by (referringModule, specifier), e.g. the api has stronger requirements than the spec
  2. moduleMap seems leaky because you have to add importNow() and module() on top of it.

From my perspective, a compartment would either defer HostResolveImportedModule upward to the host, or use the HostResolveImportedModule that someone passes to the constructor. Any abstractions on top of that, such as a moduleMap or getting items directly out of such a map to execute or whatever synchronously, would not be the problem of compartments.

So basically what I am suggesting is this:

What I believe you end up with there is the same capabilities as the current proposal, but without compartments containing these patterns and apis that the spec doesn't need.

This API has more complexity, but from seeing people use node's vm api (which is similarish to the above) I don't believe that will be a problem.

bmeck commented 4 years ago

A few notes:

I am increasingly convinced we should not directly expose a constructor for creating modules in the current Compartment, but allowing for modules to be resolved by the hook of the current compartment seems fine.

devsnek commented 4 years ago

@bmeck

For places like IOT they do not have any parser so we at minimum need to avoid requiring bringing the source code in a JS API. Also, they cannot return SourceTextModule instances since things like Moddable's XS engine can instantiate new SourceTextModule instances but share the same underlying ECMAScriptSource.

Right the actual api itself doesn't matter so much, we're free to iterate on that, the point is just that you can represent and directly link/evaluate an individual module, since some platforms apparently need to do that (importNow).

I'm not clear on how removing all of those utilities makes things easier, in fact it seems to make things more difficult as you must code your own and the runtime won't be able to do things ahead of time. Having implemented my own Module job workflow I am not keen to state that it is simple enough to do that it is a good thing to have as our recommended approach. Alternative utilities to simplify things seem fine if they are discussed.

I think it complicates the api a little bit, i said so above :). The point is more about providing javascript's capabilities instead of providing a "common" usage of javascript's capabilities, which I think is better left to libraries.

I do not think removing all features to only have a barely usable interface only guided by the spec but not enforced is our goal. The goal of these is to keep the spec aligned with both realities of runtimes and the current invariants.

I still want the interface to be usable, I just don't like the somewhat arbitrary behaviour and apis on top of what is otherwise a fresh ecmascript environment.

Jack-Works commented 4 years ago

For places like IOT they do not have any parser so we at minimum need to avoid requiring bringing the source code in a JS API

Disabling source code evaluation will limit some useful usages. (See #11 & #12). You can limit or ban it in IoT runtime but please leave it for other platforms.

devsnek commented 3 years ago

ping @kriskowal

kriskowal commented 3 years ago

I’m very much in favor of keeping moduleMap and resolveHook. They are certainly necessary for the IoT case and helpful for isolating packages in compartments. I find that reasoning about compartments as packages is cleaner than reasoning about Node.js modules in a shared filesystem layout with the hazards of accidental cross-package linkage.

The Compartment API certainly does not preclude the latter though! A resolveHook can pass the module specifier unaltered, and the moduleMap can be left empty. I expect that the moduleMap remains useful anyway for threading bare specifiers to built-in modules.

@jack-works Supporting the IoT case (which bootstraps without a loader) does not preclude dynamic module loading from source texts. The compartment API can easily accommodate both and they appear to work well together based on my work on the shim. They work especially well together when you build a module loader that scopes every package to a compartment and enforce access policies at the package granularity.

I am proposing the removal of importNow for unrelated reasons.

devsnek commented 3 years ago

@kriskowal Can you elaborate on moduleMap being needed? An implementation of HostResolveImportedModule can still use a flat object, it just wouldn't be built into the compartment api as is currently proposed.

kriskowal commented 3 years ago

Before running away with a vague idea of what you’re proposing (perhaps already too late!), I have clarifying questions.

So basically what I am suggesting is this:

  • Remove all the derived module logic (importNow, importNowHook, moduleMap, etc).
  • Restructure module instances to be more powerful (new SourceTextModule(source).{link,evaluate}() or similar, bikeshedding probably needed there)

Can you make this suggestion more concrete? This feels similar to the proposal I presented for a Module(source, location)(imports) => exports constructor in 2010, that would have left the entire concern of resolution, loading, and linking out of 262. That was coherent with the system I was proposing. I’m sympathetic to the design intuition, at least! This is also similar to the design in @coder-mike’s Microvium.

Since 2010, resolution, loading, and linking have become more specific and it would seem useful for those details to be provided in-place in the cases they’re well-aligned across all engines, if not necessary to preserve invariants of the specification.

This direction would need to answer some questions:

/** {(Compartment, Map<ImportSpecifier, ModuleInstance>) => ModuleInstance & { imports: Array<ImportSpecifier>, exports: Array<string>, reexports: Array<string>, reexportsAll: bool }} */
const moduleRecord = new StaticModuleRecord(source, location); // location only for parse errors
const importMap = new Map([[importSpecifier, moduleInstance]]);
const moduleInstance = moduleRecord(compartment, importMap);
module.exports // exotic module exports namespace object
await module.initialize();

The expectation would have to be that the module instance map passed to the module instance constructor function could be initially empty but must contain an entry for every import specifier before calling initialize.

We would still have the concern of whether to implement both initialize and initializeNow to handle the transitive top-level-await one or two ways.

This API has more complexity, but from seeing people use node's vm api (which is similarish to the above) I don't believe that will be a problem.

Upon a closer read, you are not suggesting a unified approach to resolution. You are suggesting that the concern be fully externalized.

I won’t pass judgement on whether this is more simple or complicated. There is a certain inherent complexity to the problem and this just moves that complexity from inside compartment to outside. One thing I appreciate about the compartment proposal as written today is that it captures that complexity. Whether that is good or not really depends on whether it’s expected to vary from one use to another. My intuition is that there isn’t a wide variety of ways to implement the Compartment load or import methods, so capturing them has a high utility in reducing the amount of code that’s left to user code.

In any case, I’m happy to indulge the straw man as a thought experiment.

kriskowal commented 3 years ago

If I’ve correctly inferred your intention, I would like to retitle this issue “Externalize load and initialize implementations”. The implication is the removal of load, import, importNow, resolveHook, moduleMap loadHook / importHook, and importNowHook, and that compartment become a handle that gets passed when instantiating a module. The load and import code is then relegated to user code, which we would be obliged to furnish in our usage examples. We would need to add a ModuleInstance type with an initialize() method.

devsnek commented 3 years ago

From a high level, my intention is to unify, where possible, the spec's hookable behaviour and the hooks we expose here. This is to ensure that a compartment environment can always match something that could be represented using the power of the actual spec.

The implication is the removal of load, import, importNow, resolveHook, moduleMap loadHook / importHook, and importNowHook

I think it makes sense to keep import(), with more or less the behaviour of the spec's ImportCall expression.

Can you make this suggestion more concrete?

There's still a lot of bikeshedding to be done, but I imagine something like this:

class Compartment {
  constructor({ hooks: { async resolveImportedModule() {}, finalizeImportMeta() {}, promiseRejectionTracker() {}, ... } }) {}

  createScript(sourceText) { return a Script }

  createSourceTextModule(sourceText) { return a SourceTextModule }

  import(specifier, referrer: SourceTextModule|Script = null) { return equiv of import() in referrer }
}

class SourceTextModule {
  link() // async, interleaves with resolveImportedModule

  evaluate() // async, interleaves with tla evaluation

  getNamespace()
}
kriskowal commented 3 years ago

I clearly misread your intent. Hah!

I believe I can see where you’re coming from. This would certainly make Compartment more closely resemble Node.js’s vm module.

The bike shedding is not important to the effectiveness of the strawman. Thanks for the sample.

I’ll take some time to answer for myself how this attempts to solve the same problems I know compartments solve and let you know of any holes.

If you have time to make this even more concrete, it would help. As written, I’ll guess at the signature and semantics of the hooks, how to implement simple evaluate from these primitives, and what in this corresponds to a static record (which does not capture state and can be reused between compartments) and a module instance (that does capture state).

If you have answers to the questions from my prior message, that would help too.

coder-mike commented 3 years ago

For places like IOT they do not have any parser so we at minimum need to avoid requiring bringing the source code in a JS API

Minor pedantic note, but not all IoT environments lack a parser, and even when they do, the lack of a parser on the target device does not necessarily preclude a module API from accepting source text directly, as I think the Microvium API demonstrates. However, for XS specifically (and maybe in other cases), I believe the parser is optional and that it's favorable for the Compartment API not to require that the parser is included.

There's still a lot of bikeshedding to be done, but I imagine something like this

@devsnek I like the general idea. I'm also curious to see a more concrete example (or just more detail) of what you're suggesting.

I'm also curious about the edge/error cases. For example, what happens if I call evaluate on a module before calling link? Or if I call link on a module, and while it's linking I also call evaluate (e.g. in the resolveImportedModule hook), what does this do?

What must resolveImportedModule return? Can I resolve an import to something that is not created from source text, such as a host-implemented module, a wasm module, JSON module, or a programmatically-generated module namespace (e.g. an attenuated wrapper around an existing module)?

I like the fact that there is only one type of specifier mentioned (the import specifier), and it's the type of specifier that the user must already be familiar with since it appears in the source text. This reduces the number of concepts the user must be familiar with to use the API, and reduces the number of new definitions we would need to introduce to the ES spec.

Jamesernator commented 3 years ago

I'm also curious about the edge/error cases. For example, what happens if I call evaluate on a module before calling link? Or if I call link on a module, and while it's linking I also call evaluate (e.g. in the resolveImportedModule hook), what does this do?

In the spec, we would want to wrap this to ensure that it fails. Hosts are expected to not do this already, but because there's arbitrary user code we'd need to wrap it with guards.

What must resolveImportedModule return? Can I resolve an import to something that is not created from source text, such as a host-implemented module, a wasm module, JSON module, or a programmatically-generated module namespace (e.g. an attenuated wrapper around an existing module)?

Node has both SourceTextModule and SyntheticModule, I imagine we would want to do the same with compartments regardless of what shape this API takes. And in fact JSON modules spec actually adds Synthetic Module record, so we would probably just wrap that.

Minor pedantic note, but not all IoT environments lack a parser, and even when they do, the lack of a parser on the target device does not necessarily preclude a module API from accepting source text directly, as I think the Microvium API demonstrates. However, for XS specifically (and maybe in other cases), I believe the parser is optional and that it's favorable for the Compartment API not to require that the parser is included.

This is a more complicated point, but as mentioned above, if importHook is not specified (or if it returns undefined?) then calling compartment.import("some-specifier") will simply fall back to host resolution. From here you can simply use pre-created modules.

The moduleMap thing seems a lot more similar to what the builtin modules proposal is trying to do. I think it would probably be better to use that architecture of that proposal for handling modules that need to be available synchronously rather than having some compartment-specific moduleMap thing.

I'm also curious to see a more concrete example (or just more detail) of what you're suggesting.

I can't speak for @devsnek, but I imagine it would be very similar in usage to Node's --experimental-vm-modules API.

With the slightly different API shape to match ECMA262 more closely, you could implement browser modules like this (if <script type="module">/import() didn't exist):

function isBareSpecifier(specifier) {
    return !isURL(specifier)
        && !specifier.startsWith("./")
        && !specifier.startsWith("../")
        && !specifier.startsWith("/");
} 

const modulesByURL = new Map();
const urlsByModule = new WeakMap();

async function resolveHook(specifier, parentModule=null) {
    if (isBareSpecifier(specifier)) {
      // import map logic would go here, https://github.com/WICG/import-maps
      throw new Error("import maps not supported");
    }
    const moduleURL = parentModule === null
        ? new URL(specifier).href // Normalize it
        : new URL(specifier, urlsByModule.get(parentModule)).href;
    assertIsURL(moduleURL);
    if (modulesByURL.has(moduleURL)) {
        return modulesByURL.get(moduleURL);
    }
    const response = await fetch(moduleURL);
    if (!response.ok) {
        throw new Error("Failed to load module");
    }
    const contentType = response.headers.get("content-type");
    if (contentType === "text/javascript") {
        const textContent = await response.text();
        const module = compartment.createModule(sourceText);
        modulesByURL.set(moduleURL, module);
        urlsByModule.set(module, moduleURL);
        return module;
    } else if (contentType === "application/json") {
        const textContent = await response.text();
        const syntheticModule = compartment.createSyntheticModule(["default"], () => {
            syntheticModule.setExport("default", JSON.parse(textContent));
        });
        modulesByURL.set(moduleURL, syntheticModule);
        urlsByModule.set(syntheticModule, moduleURL);
        return syntheticModule;
    } else {
        throw new TypeError(`${ contentType } not supported module type`);
    }
}

const compartment = new Compartment({
    resolveHook,
});

const module = await compartment.import("https://url.tld/my-module.js"); 

Admittedly some things are a bit awkward, in particular the fact we need the compartment in scope to actually write the resolveHook. Additionally for something like WASM modules it would be really nice to have the cyclic module machinery available, otherwise it's a pain to write.

e.g. Maybe an API surface like this would be better:


class Module {
    get namespace(): ModuleNamespace;
    link(): Promise<void>;
    evaluate(): Promise<void>;
}

type CyclicModuleStatus
    = "unlinked" | "linking" | "linked" | "evaluating" | "evaluated";

class CyclicModule {
    get requestedModules(): string[];
    get status(): CyclicModuleStatus; // Might be useful, but might not be neccessary
}

class SyntheticCyclicModule extends CyclicModule {
    setExport(exportName: string, value: any): void;
}

class SourceTextModule extends CyclicModule {}

class SyntheticModule extends Module {
    setExport(exportName: string, value: any): void;
}

type CompartmentOptions = {
    resolveHook?: (specifier: string, parentModule: CyclicModule | null, compartment: Compartment) => Module,
    // ...and others
}

class Script {
    evaluate(): any;
}

class Compartment {
    constructor(options: CompartmentOptions);

    createScript(text): Script;

    createSyntheticCyclicModule(
        requestedModules: string[],
        initializeEnvironment: (
            module: SyntheticCyclicModule, 
            modules: Record<string, Module>, /* the modules returned from resolveHook for the modules requested */
            compartment: Compartment,
        ) => SyntheticCyclicModule,
        evaluateModule: (
            module: SyntheticCyclicModule,
            compartment: Compartment,
        ) => any,
    ): SyntheticCyclicModule;

    createSourceTextModule(sourceText: string): SourceTextModule;

    // This might be superfluous, we could just use SyntheticCyclicModule
    // with an empty initializeEnvironment function and no requestedModules
    createSyntheticModule(evaluateModule: (module: SyntheticModule, compartment: Compartment) => any): SyntheticModule;

}

Obviously there's a number of smaller bikesheds even on this, maybe we would want a more Node-like design where instead of having .create methods on the compartment we passed it to the constructors (e.g. new SourceTextModule(sourceText, { compartment })), or maybe we want Module objects to be things we can clone to multiple compartments and evaluate them in each without recreating the modules per compartment, etc, still quite a number of these sort've things to decide.

Jamesernator commented 3 years ago

Oh and if you're wanting something to play with, I'd recommend trying out either Node's --experimental-vm-modules API, alternatively I wrote an entire ES module shim that has an API very similar to what's described above (albeit without reference to a specific compartment, instead resolveModule is passed to all modules).

Jamesernator commented 2 years ago

For places like IOT they do not have any parser so we at minimum need to avoid requiring bringing the source code in a JS API

With the module blocks proposal at stage 2, surely those objects would be sufficient for use cases that can't dynamically use the parser? i.e. In such IOT environments a module block would refer to a pre-compiled chunk of code, then compartment.import(someModuleBlock) to load that particular block. And similarly resolveHook() being allowed to return a module block.

kriskowal commented 2 years ago

Please see “Reify Module Instance” #51 for my most recent earnest attempt to capture what this alternate design would look like based on the current draft of compartments.

Jamesernator commented 2 years ago

So the latest revision of the proposal contains reified module instances similar to as proposed in the OP (or as in Node), with some API shape differences.

Probably the most notable difference with Node's experimental vm.Module is there is no direct access to .link(), rather import(module) performs both linking and evaluation in the background.

Do we think this satisfactorily resolves this issue, or is there more to desired from the latest API shape?