Add communications channels to loader outputs

bmeck commented 4 years ago

Right now loaders cannot directly share data from their implementation, the global preload code, or the modules that they generate. We need to add a communications channel such as a MessagePort between all of these locations (e.g. https://github.com/nodejs/node/pull/31229#issuecomment-637939754 ).

Without these communications channels a few features are not feasible:

safely saving primordials in loader code to be used in generated modules. e.g. Saving WebAssembly.instantiateStreaming() in the global preload in order to use it inside of a module source text.
passing events requiring new load operations from application code to the loader. e.g. for hot reloading/mocking (e.g. testdouble )
having modules directly share references between each other. e.g. having a generated module share some underlying state with another friendly module.
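As a rough illustration of the first item, here is a minimal sketch of why a primordial has to be captured before application code runs. It uses Array.prototype.map as a stand-in for WebAssembly.instantiateStreaming() so that it runs synchronously; the `primordials` object and the `ArrayPrototypeMap` name are assumptions made for this example, not an existing loader API.

```javascript
// Preload-time code (hypothetical): capture the built-in before any
// application code has a chance to replace it.
const primordials = Object.freeze({
  // "uncurryThis" pattern: the array becomes the first argument.
  ArrayPrototypeMap: Function.prototype.call.bind(Array.prototype.map),
});

// Later, application code tampers with the global.
const originalMap = Array.prototype.map;
Array.prototype.map = function () {
  return ['tampered'];
};

// A generated module that received the saved reference is unaffected.
const viaSaved = primordials.ArrayPrototypeMap([1, 2, 3], (x) => x * 2);
// A generated module that looks the method up at call time is affected.
const viaGlobal = [1, 2, 3].map((x) => x * 2);

// Undo the tampering so the rest of the process behaves normally.
Array.prototype.map = originalMap;
```

The open question in this issue is how a generated module would obtain something like `primordials` in the first place; the sketch only shows why it would want to.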

I think adding a parameter to the global preload arguments is simple enough, but I do not have a clear idea on how we want to set up a channel between modules and the others. One idea is to allow putting it on import.meta if the module loader declares it somehow. Overall, it seems we need to implement this feature regardless of other designs.

jkrems commented 4 years ago

I think adding a parameter to the global preload arguments is simple enough, but I do not have a clear idea on how we want to set up a channel between modules and the others.

I was leaning towards a global preload argument as well but I don't think it actually helps much. I think the more important one is to have data modules. For inter-module communication I'd prefer if we could use modules as the communication channel.

Passing events requiring new load operations from application code to the loader. e.g. for hot reloading/mocking

I'm not sure if ping-pong over the loader is actually the best way to achieve this. E.g. it seems pretty unfortunate if resetting a mock requires sending a message to another thread that then has to identify the originating thread, send a message back, to then finally reset the mock which was "right next" to the calling code already. I think it would be much easier to have that machinery run as a module in the affected thread.

Saving WebAssembly.instantiateStreaming() in the global preload in order to use it inside of a module source text.

I assume this rules out a message port when it comes to communication between preload and modules? Because a message port wouldn't be able to forward the instance?

One idea is to allow putting it on import.meta if the module loader declares it somehow.

That's an interesting idea! I think I would want to restrict that to the preload-to-module channel. E.g. the preload could gain "exports" in the CommonJS sense which the loader may choose to expose via a new property in getSource. I guess this is the disadvantage of not making the preload itself a module.

bmeck commented 4 years ago

For inter-module communication I'd prefer if we could use modules as the communication channel.

I'm unclear on this idea.

I'm not sure if ping-pong over the loader is actually the best way to achieve this. E.g. it seems pretty unfortunate if resetting a mock requires sending a message to another thread that then has to identify the originating thread, send a message back, to then finally reset the mock which was "right next" to the calling code already. I think it would be much easier to have that machinery run as a module in the affected thread.

Communications channels being established do not mean that you round trip through the loader, e.g. setting up a simple port that pings the global preload code using the same message channel.

I agree that in general most communication should never leave the application thread.

I assume this rules out a message port when it comes to communication between preload and modules? Because a message port wouldn't be able to forward the instance?

Likely, but you also likely don't want to use an async comms channel for in-thread stuff if you need to do sync operations.
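To make the "wouldn't be able to forward the instance" point concrete: MessagePort.postMessage() serializes values with the structured clone algorithm, and functions are not cloneable. The sketch below uses the global structuredClone() (available in recent Node.js versions) as a synchronous stand-in for posting over a port; `savedPrimordial` and `canCrossMessagePort` are hypothetical names for this example.

```javascript
// Hypothetical stand-in for a saved primordial such as
// WebAssembly.instantiateStreaming.
function savedPrimordial() {
  return 'called';
}

// Returns true if the value survives structured cloning, which is the
// same serialization a MessagePort applies to posted messages.
function canCrossMessagePort(value) {
  try {
    structuredClone(value);
    return true;
  } catch (err) {
    // Functions fail here with a DataCloneError.
    return false;
  }
}

const functionIsCloneable = canCrossMessagePort(savedPrimordial);
const dataIsCloneable = canCrossMessagePort({
  url: 'file:///some/original/url.mjs',
});
```

Plain data crosses fine, so a port still works for events and configuration; it is only live references that need an in-thread, synchronous mechanism.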

I guess this is the disadvantage of not making the preload itself a module.

If someone wants to deal with the async hooks fallout of trying to make the bootstrap async we could try to do that.

jkrems commented 4 years ago

If someone wants to deal with the async hooks fallout of trying to make the bootstrap async we could try to do that.

I'd be comfortable making it a module with lots and lots of limitations (just like it currently is a "script" with lots and lots of limitations). So I don't think it would have to be async necessarily. But likely not worth it as long as we can keep preload code as something that's relatively advanced and not necessary for most use cases. 🤞

jkrems commented 4 years ago

For inter-module communication I'd prefer if we could use modules as the communication channel.

I'm unclear on this idea.

For two modules that are both running in the same context and were generated by the same loader hooks, the channel could be an API exposed by a "normal" module. I'm not sure we need message ports there.

// generated proxy code, exposed as my-mocking-loader:proxy?file:///some/original/url.mjs:
import {getCurrentImplementation} from 'my-mocking-loader:impl-channel';
import {f as originalF} from 'file:///some/original/url.mjs';

export function f(...args) {
  return getCurrentImplementation(f, originalF)(...args);
}

// generated shared state code, exposed as my-mocking-loader:impl-channel:
export function getCurrentImplementation(fn, defaultImpl) {}

export function setCurrentImplementation(fn, impl) {}
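For illustration only, one way the two empty functions above could be filled in. This is a sketch under assumptions, not the implementation jkrems had in mind: it keeps overrides in a WeakMap keyed by the proxy function, and it inlines both "modules" into one script (dropping import/export) so it can be run directly. `originalF` here is a made-up function.

```javascript
// Sketch of 'my-mocking-loader:impl-channel'. In a real module these two
// functions would be exported.
const overrides = new WeakMap();

function getCurrentImplementation(fn, defaultImpl) {
  // Fall back to the original implementation when no mock is installed.
  return overrides.has(fn) ? overrides.get(fn) : defaultImpl;
}

function setCurrentImplementation(fn, impl) {
  // Passing undefined removes the mock again (an assumption of this sketch).
  if (impl === undefined) {
    overrides.delete(fn);
  } else {
    overrides.set(fn, impl);
  }
}

// Sketch of the generated proxy module for a hypothetical original export.
function originalF(x) {
  return x + 1;
}

function f(...args) {
  return getCurrentImplementation(f, originalF)(...args);
}

const before = f(1);
setCurrentImplementation(f, (x) => x * 10);
const mocked = f(1);
setCurrentImplementation(f, undefined);
const after = f(1);
```

Everything here stays in the application thread, so resetting a mock is a synchronous call rather than a message round trip.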

bmeck commented 4 years ago

@jkrems I think that's a bit awkward but I understand the idea. I still think they need a synchronous communications channel with the preload code and if such a thing existed you wouldn't need an intermediary module.

// bikeshed
const {WebAssemblyInstantiateStreaming} = import.meta.preloadCodeMethod(); 
const out = await WebAssemblyInstantiateStreaming(import.meta.loaderData);
// ... set exports ...

Necessitates the comms channel, and then you can always use preloadCodeMethod or w/e as bi-directional between modules.

jkrems commented 4 years ago

It might just be a matter of preference but I don't see the "intermediate" module as an intermediate module. I see it as a first-class module that implements logic. Using a module gives a clear name and identity to that logic. Especially in a world where loader hooks want to do multiple independent things, using a single namespace (the preload code and/or some object on import.meta) seems confusing.

In my example above the channel module may be implemented as a normal .mjs file, potentially even loaded directly from disk using its URL. It can be unit tested like any other code. The same cannot be said about code that depends on very specific capabilities like special import.meta properties.

But to clarify: I'm not arguing against having an optional import.meta property that allows sharing state with the preload code. I'm just very interested in making it not required to implement common loader hook use cases.

apparebit commented 4 years ago

On Transferring Objects

The first two use cases seem to be taken care of by adding the ability to transfer objects to the preload hook's interface. That way the loader can pass capabilities to the thread's initially running code, one of which could be a port to communicate back with the loader. I am assuming here that the preload hook and preload code are executed for each thread anew. That's correct, right?

As to transferring objects to modules, e.g., so that a few modules can share some state, I don't think there needs to be any extra mechanism. I see two cases here:

If you want to transfer an object from loader to module, then almost certainly that module is more privileged than the rest of the application code. That is because such objects always are specific capabilities. More privilege in terms of the application means it must already load before the rest of the modules. In other words, transferring from loader to preload code is sufficient, since that code can transfer to the initial application module via shared memory or globalThis.
If you want to transfer an object from one module to another, then it seems far easier to place the shared data or code into a third module and have both modules load it. If you also want to restrict which module has access to which other modules, then there already is a perfectly suitable mechanism, i.e., the loader.

On Interposition

@jkrems, you mention that mocking code shouldn't require communication with an out-of-thread loader and that there should be some in-thread facility. I would ask a slightly different question, namely Can I interpose on an operation built into the language? If I can interpose/intercept/overload something, then I can control it, modify it, and also mock it.

So far, Proxy gives me the ability to interpose on data access and function execution within a thread. However, it doesn't cover import statements and import() special forms. That's the job of the loader. Now, as long as loaders can take over any object as it is created by a module (which I'd probably implement as a shim module with the same exact exports as the original module), then they can wrap any object in a proxy, which means you can mock your heart out.
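A minimal sketch of the in-thread half of that idea, assuming a plain object stands in for the exports a shim module would wrap. `interpose`, `mocks`, and `greet` are hypothetical names, and the loader side that substitutes the shim for the original module is not shown.

```javascript
// Wrap an exports-like object so property reads can be redirected to mocks.
function interpose(target, mocks) {
  return new Proxy(target, {
    get(obj, prop, receiver) {
      // Serve the mock if one is registered, otherwise the real value.
      return mocks.has(prop) ? mocks.get(prop) : Reflect.get(obj, prop, receiver);
    },
  });
}

// Stand-in for the original module's exports.
const original = {
  greet(name) {
    return `hello ${name}`;
  },
};

const mocks = new Map();
const shim = interpose(original, mocks);

const realResult = shim.greet('world');
mocks.set('greet', () => 'mocked');
const mockedResult = shim.greet('world');
mocks.delete('greet');
const restoredResult = shim.greet('world');
```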

Any realistic system needs some runtime support and that runtime support must be able to coordinate between in-thread components and loader components. I believe that adding the ability to transfer objects to the preload code takes care of that.

I would be reluctant to add anything more because what I just described (1) is sufficient for fully controlling code execution but (2) not yet validated by practical experience, i.e., we wouldn't even know where the pain points are. I might add that this rapidly approaches realms territory. I don't have a good sense for why realms are stuck in standards limbo, but I suspect it is for similar reasons. We just don't have enough experience yet.

Deleting Modules Again

I propose another module loader hook to delete modules from the internal cache again.

Experience with the CommonJS module system has shown that having direct access to the module cache enables lots of experimentation and the introduction of features not originally anticipated.
But if I can't delete modules anymore, then all my ability to redirect and rewrite loaded modules is limited by the rate at which my redirecting and rewriting consumes memory vs the available total memory. Once I hit the limit, my application will crash with an out-of-memory error. Worse, I can't reliably plan for when that happens.

Java originally didn't have the ability to garbage-collect classes and it became a limiting factor very quickly. I believe a similar situation is presenting itself here. My current loader implementation does everything I want and need, even without the ability to transfer objects. But it will eventually consume all memory. @giltayar was ready to adopt my approach for his mocking library as well and will be bitten by the same limitation.

The thing is: In our use cases, we dynamically create module URLs to force reloading of modules. To that end, both of us rely on a global counter or epoch that is associated with some of the modules loaded into the application (but not necessarily all). Once the code from modules loaded during previous epochs has finished executing, it will never execute again. In other words, those modules have become pure garbage and should be collected as such. That was trivially implemented in CommonJS. It is lacking in our brave new world. That strongly suggests the addition of another hook. Now the question: Does V8 allow that?
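A sketch of the epoch scheme, under the assumption that the counter is carried in a query parameter; the comment above does not say how the URLs are actually constructed, and `bumpEpoch`, `versionedUrl`, and the `epoch` parameter name are made up for this example.

```javascript
// Global counter shared by everything that wants fresh module instances.
let epoch = 0;

function bumpEpoch() {
  epoch += 1;
  return epoch;
}

// Derive a URL that is unique per epoch, so the module cache treats it as
// a different module and loads it again.
function versionedUrl(specifier) {
  const url = new URL(specifier);
  url.searchParams.set('epoch', String(epoch));
  return url.href;
}

const first = versionedUrl('file:///some/original/url.mjs');
bumpEpoch();
const second = versionedUrl('file:///some/original/url.mjs');
```

Each distinct URL produces a new cache entry, and the entries from earlier epochs are never released, which is the memory growth described here.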

bmeck commented 4 years ago

@apparebit please open a different issue if you want to discuss deleting modules. Currently V8 does not have any lifecycle hooks for module lifetimes, and modules are increasingly moving towards being officially impossible to GC, due to things like how realms are looking at full specifiers, which could be recreated at any point during runtime. GC is only guaranteed to happen when the Realm/v8::Context a module lives in is fully disposed. To my knowledge there is no communications channel that we could provide, as providing the communications channel would keep the Realm alive.
