tc39 / proposal-module-expressions

https://tc39.es/proposal-module-expressions/
MIT License

Cache-by-position or reinit each time #45

Open Jack-Works opened 3 years ago

Jack-Works commented 3 years ago

Possible solution:

Add a unique modifier (prior art: symbol vs. unique symbol in TypeScript) to tell which semantics should be used.

const x = [0,1].map(() => unique module {})
// two different modules; needs opt-in because it's a performance trap
const y = [0,1].map(() => module {}) // the same module
nicolo-ribaudo commented 3 years ago

Since modules don't capture variables in the current scope, you can always hoist the module outside of the callback if you want a single one.
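For example, in the proposal's syntax (a sketch; module blocks aren't runnable in today's engines):

```js
// A block written inline is created on every call of the callback:
const perCall = [0, 1].map(() => module { export const v = 1; });

// Hoisted outside the callback, a single block is shared:
const hoisted = module { export const v = 1; };
const shared = [0, 1].map(() => hoisted);
```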

surma commented 3 years ago

The question came up in the context of the following code sample:

someButton.onclick = () => runInWorker(module { ... });

Currently, each execution creates a new module block, resulting in a new module being added to the module map by import(). The worry here is that this could be a performance footgun, as the module map will grow each time the event handler is executed. The module map is not well understood by developers (they don’t come into contact with it directly), and they might not be aware of the long-term cost of a pattern like this.

If we cached module blocks by source position, evaluating the same module block expression would always return the same module block. More specifically, that would mean:

const arr = [];
for(let i = 0; i < 10; i++) 
  arr[i] = module { /* ... */ }; 

await import(arr[0]) === await import(arr[1]); // true under cache-by-position

While this mitigates the potential performance footgun, I would find this behavior strange and unexpected.

In contrast, the current module block behavior (no caching) is more in line with how function literals or object literals behave:

runInWorker(() => {})
// vs
let h = () => {}
runInWorker(h)

Most developers know that closures (or object literals) can be expensive and need to be manually hoisted out of hot paths. The same logic would apply to module blocks:

const m = module { ... };
onclick = () => runInWorker(m);

Additionally, one of the goals of this proposal is to be a building block for future scheduler-like libraries and platform-level proposals. Let’s pretend such a scheduler exists that implements the Actor Model, where the location of the code execution is abstracted away from the developer.

If we wanted to spawn 10 actors executing the same code, caching by source position could have the unintended side-effect that multiple actors end up sharing state just because they end up on the same thread.

for(let i = 0; i < 10; i++) {
  spawnActor(module { /* ... */ });
}

I talked to @littledan as well, and we are contemplating weakly holding module blocks in the module map, which would allow them to be cleaned up throughout the lifecycle of a document.
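As a rough userland analogy of that weak holding (illustrative only; the real module map is internal to the host, and WeakCache is a made-up helper):

```javascript
// A cache whose entries do not keep their values alive: each value is
// held through a WeakRef, so an otherwise-unreferenced value can be
// collected, after which the entry simply dereferences to undefined.
class WeakCache {
  #map = new Map();
  set(key, value) {
    this.#map.set(key, new WeakRef(value));
  }
  get(key) {
    return this.#map.get(key)?.deref();
  }
}

const cache = new WeakCache();
const block = { source: "/* module text */" };
cache.set("file.js:10:3", block);
console.log(cache.get("file.js:10:3") === block); // true while `block` is referenced
```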

esprehn commented 3 years ago

Would it be possible to make the return value of module {} be the same object based on source position, but have import() create a new, weakly held module in the map each time? That way you can put a caching layer on top if you want, since the module has a stable object identity (like URLs today), even though each call to import() gives you a different result.

const cache = new Map();

function cachedImport(m) {
  let result = cache.get(m);
  if (!result) {
    result = import(m);
    cache.set(m, result);
  }
  return result;
}

for (let i = 0; i < 100; ++i) {
  const m = module {};
  assert(await cachedImport(m) === await cachedImport(m));
  assert(await import(m) !== await import(m));
}

Maybe that's too weird, but I think it's nice not to have to back-derive the "url" of a module block.

jkrems commented 3 years ago

I would compare it to template literals, which also have some notion of cache-by-position. I'm a bit concerned about accidental memory leaks if anything keeps previously initialized module bodies alive.

I'm concerned about innocent looking refactors like:

// Before:
async function getStuff() {
  const stuff = await import('./helper.mjs');
  return stuff.default;
}

// After:
async function getStuff() {
  const stuff = await import(module {
    // This was a one-line helper file, this is so much simpler!
    export default 42;
  });
  return stuff.default;
}

If the return value is retained, this would now keep leaking copies of that module record even though the underlying assumption of the refactor ("import imports once") was fairly reasonable today.

nicolo-ribaudo commented 3 years ago

@jkrems I don't think I understand which option you are supporting :sweat_smile:

ljharb commented 3 years ago

That example to me is a compelling argument for why it must be cached by position; module exports can have identity and that identity can be important.

jkrems commented 3 years ago

Returning a primitive was a poor choice to illustrate the problem. The problem only really appears if the export is, for example, a closure that closes over the module scope (e.g. accesses import.meta).

Jack-Works commented 3 years ago

Possible solution:

Add a unique modifier (prior art: symbol vs. unique symbol in TypeScript) to tell which semantics should be used.

const x = [0,1].map(() => unique module {})
// two different modules; needs opt-in because it's a performance trap
const y = [0,1].map(() => module {}) // the same module

Is anyone interested in my solution to this problem? Cache-by-position by default, with an opt-in unique module for a fresh instance on each evaluation.

leobalter commented 3 years ago
const unique = module { /**/ };
const y = [0,1].map(() => unique);

I don't think there is a need for 2 distinct ways to declare a module and keep it unique when you can store a module in a reusable binding.

ljharb commented 3 years ago

@leobalter that wouldn't make it unique - i.e., y would hold two references to the same module, instead of two distinct modules.

Jack-Works commented 3 years ago
const unique = module { /**/ };
const y = [0,1].map(() => unique);

I don't think there is a need for 2 distinct ways to declare a module and keep it unique when you can store a module in a reusable binding.

The problem here is that the current default behavior is a unique module per evaluation, which might be a memory footgun (initializing a new module every time).

leobalter commented 3 years ago

I'm actually sleep-deprived right now because of the current meeting time, but let me try to rephrase how I see this:

const x = [0,1].map(() => module {/* unique */}); // new module here

const same = module { /**/ };
const y = [0,1].map(() => same); // reuse the same module

And yet it seems like the user's responsibility to control the code, and something better handled by a linter. This happens with other literal/expression values, so the potential memory footgun already exists:

[0,1].map(() => function() { /* a new function for each entry */ });
[0,1].map(() => [ /* a new array for each entry */ ]);
[0,1].map(() => ({ /* a new object for each entry */ }));

A module fragment/block seems like any other literal/expression value.

ljharb commented 3 years ago

I agree that if the behavior is not "cache by position" but instead "uncached", then "unique" is just "make a variable in a higher scope".

However, "cache by position" seems like a reasonable default, per above discussions.

leobalter commented 3 years ago

cache-by-position seems way more confusing, IMO.

function fn() {
  return module {
    let x = 0;
    export default function() {
      x++; // side effects
      return x;
    }
  };
}

const a = fn();
const b = fn();

a === b; // ?

// now think code here evaluating and using the modules from a and b.
ljharb commented 3 years ago

In this case, with caching, a === b would be true - and the fact that the module has side effects is exactly why it's important that its identity be preserved. I think it would be very confusing in practice if every call to fn created a distinct stateful module.

This is somewhat related to the peerDependencies issue in npm module graphs.

acutmore commented 3 years ago

If they were cached by position, would they be cached similarly to tagged templates? I.e., a separate cache for each containing module instance, and would the module object be frozen?

// a.js
export const moduleBlock = module {
  export const v = Math.random();
};
Object.isFrozen(moduleBlock); // true
// b.js
import * as A1 from "a.js?q=1";
import * as A2 from "a.js?q=2";

A1.moduleBlock !== A2.moduleBlock; // true
ljharb commented 3 years ago

@acutmore module namespace objects are already frozen (import * as ns) so I'd expect they'd be frozen regardless, and then yes.

acutmore commented 3 years ago

I meant the module block object, rather than the value you get when importing the module.

EDIT: right now in the draft spec, the object is an ordinary, unfrozen object with ModuleBlock.prototype as its [[Prototype]]

ljharb commented 3 years ago

I suppose as long as the actual module stuff is in an internal slot, it doesn't matter much, but it'd have to be frozen if it were cached by source position.

nayeemrmn commented 3 years ago

cache-by-position seems way more confusing, IMO.

function fn() {
  return module {
    let x = 0;
    export default function() {
      x++; // side effects
      return x;
    }
  };
}

const a = fn();
const b = fn();

a === b; // ?

// now think code here evaluating and using the modules from a and b.

That module factory function resembles something I'd write under the current reinit-on-evaluation semantics, after understanding how it works and utilising it. But it doesn't look like something I'd accidentally write under cache-by-position semantics while expecting the opposite, as far as that example illustrates at least. @jkrems's example https://github.com/tc39/proposal-js-module-blocks/issues/45#issuecomment-811255145 in favour of cache-by-position seems more compelling.

mhofman commented 3 years ago

I think the biggest problem is that with cache by position, there is no way to generate multiple module definitions from the same source (without eval). So while dynamic definition may be a hazard, as pointed out this can usually be handled by linting rules. If there was a way to "clone" module definitions, then I'd be more open to cache by position.

ljharb commented 3 years ago

@mhofman making a new module block that re-exports everything from the old one should suffice?

mhofman commented 3 years ago

That new module block needs a new position in source, that wouldn't be very dynamic?

ljharb commented 3 years ago

ah, i see what you mean, fair point.

kriskowal commented 3 years ago

I’m assuming that a module block expression produces an object. I’m not deeply familiar with the precedent established by template literals, but I would be surprised if the module object were frozen. However, if evaluating a module block always produces the same object, it really must be frozen.

I agree it would be reasonable for every evaluation of a module block to produce a unique object but to retain a cached and immutable representation of the underlying module text in an internal slot. This would allow for transport optimization. Let’s call this option “mutable module objects”.

If I were convinced that “mutable module objects” were untenable, I’d fall back to “immutable module objects”, where every evaluation of a module block expression produces an identical but frozen object. For the Compartments proposal, this would imply that new StaticModuleRecord(moduleText) should also return a frozen object. It would be strange.

nayeemrmn commented 3 years ago

I think the biggest problem is that with cache by position, there is no way to generate multiple module definitions from the same source (without eval). So while dynamic definition may be a hazard, as pointed out this can usually be handled by linting rules. If there was a way to "clone" module definitions, then I'd be more open to cache by position.

Yeah, there was the separate point of which default would be less confusing; cloning is of course attainable either way.

Why not a .clone() method on module block objects? The proposed unique tag seems inflexible: any clones have to be constructed around the declaration site, whereas the concern is likely more relevant at the import site. That leads to very contrived code if there is anything in between, and makes things impossible when receiving module blocks from code you don't control.
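A sketch of what that could look like (hypothetical API: .clone() is not part of the proposal, and runTask/mb are made-up names):

```js
// Freshness controlled at the import site, even for module blocks
// received from code you don't control:
async function runTask(mb, { fresh = false } = {}) {
  const ns = await import(fresh ? mb.clone() : mb);
  return ns.default();
}
```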

Jack-Works commented 3 years ago

I think the biggest problem is that with cache by position, there is no way to generate multiple module definitions from the same source (without eval). So while dynamic definition may be a hazard, as pointed out this can usually be handled by linting rules. If there was a way to "clone" module definitions, then I'd be more open to cache by position.

I think the opt-in unique modifier is enough when you want multiple module definitions. And since it's explicitly opt-in, we can avoid all those performance footguns.

leobalter commented 3 years ago

@nayeemrmn this is interesting, with some caveats.

I see precedent for a module literal expression making new modules, as functions and other literals do; that is where I hang my argument for insisting on new modules, with no side-effect hazards when something like this is used. As @mhofman mentioned, it would be the only way to create multiple modules with the same source text without eval.

In parallel, ModuleBlock.prototype.clone() seems good, but I don't find enough motivation or use cases for it, even less so applied to every module. I might be wrong.

It's hard for me to understand why someone would want the same module in code that evaluates a module expression such as:

const x = [0,1].map(() => module {/* unique */});

Maybe I can understand a function encapsulating some module code without needing to observe anything external, such as:

// some bundled code
async function getData(key) {
  const m = module { /* data code in here */ }

  const data = await import(m);

  return data[key];
}

The code above would benefit greatly from caching-by-position, but there are alternatives to overcome that.

I need to think more about this but I wonder what this would mean for bundling.

ljharb commented 3 years ago

I guess another way to look at it is, class and function already have this hazard, and it's Fine™, so maybe defaulting to no caching is also Fine™?

leobalter commented 3 years ago

@ljharb this is how I see it, but maybe caching-by-position resolves the concern from @domenic on reusing modules with Workers.

nicolo-ribaudo commented 3 years ago

I don't think it solves @domenic's concern, because you can still easily spawn a ton of workers (even if they share the same module object).

acutmore commented 3 years ago

I think the biggest problem is that with cache by position, there is no way to generate multiple module definitions from the same source (without eval).

It would still be possible without direct use of eval, as I showed in https://github.com/tc39/proposal-js-module-blocks/issues/45#issuecomment-951147780, but it does re-evaluate the entire containing module. Not exactly ergonomic, and a possible module cache leak as well.

Do we have any use cases for creating multiple identical module blocks? In the situations I’m imagining it would be more useful to control the freshness at the site where I’m doing the dynamic import.

import(mb, { fresh: true }) // skip cache

Jack-Works commented 3 years ago

I guess another way to look at it is, class and function already have this hazard, and it's Fine™, so maybe defaulting to no caching is also Fine™?

But classes and functions can be GC'd when they're not being referenced. The module is different...?

Maybe an anonymous module with no references can be GC'd too?

nicolo-ribaudo commented 3 years ago

A module can be GC'd when it's not referenced, but only if it's not cached by position.

function getModule() {
  return module {};
}

const ws = new WeakSet();
let x = getModule();

ws.add(x);
x = null;

// after some time...

console.log(ws.has(getModule()));

If it's cached by position, the last call must return true, and thus the module cannot be GC'd.

ljharb commented 3 years ago

it could be GC'd when getModule is GC'd.

surma commented 3 years ago

Let me summarize my current position:

Overall, I am strongly leaning towards not caching by source position.

nicolo-ribaudo commented 3 years ago

Another point in favor of not caching:

function getModules(len) {
  let modules = [];
  for (let i = 0; i < len; i++) {
    modules.push(module {});
  }
  return modules;
}

// vs

function getModules(len) {
  let modules = [];
  let mod = module {};
  for (let i = 0; i < len; i++) {
    modules.push(mod);
  }
  return modules;
}

// vs

let mod = module {};
function getModules(len) {
  let modules = [];
  for (let i = 0; i < len; i++) {
    modules.push(mod);
  }
  return modules;
}
kriskowal commented 3 years ago

There is a fundamental reason that module blocks cannot be cached at all, assuming we eventually have a module loader API and assuming we can communicate with that module loader API in a separate process. For the sake of argument, I assume the Compartment proposal is the module loader API.

Being able to dynamically import a module block implies that the module loader API would support compartment.import(moduleBlock). I am assuming that moduleBlock corresponds in this case to a StaticModuleRecord and that its representation on the wire is just the text of the module and the type tag for a static module record.

This assumption is grounded in the XS Compartment implementation which allows a parent Compartment to communicate static module records to a child Compartment with different module specifiers.

new Compartment(null, {
  "your.js": "my.js"
})

This is important because it allows XS to precompile all of the modules in the program’s working set and reveal them to child compartments where they may be linked according to their import maps relative to the referrer specifiers they were given. XS Precompilation is analogous to bundling, so I expect similar patterns will be relevant in contexts beyond embedded systems.

And, import maps also lend this kind of flexibility. When a static module record transits from one worker to another, and the workers have different import maps, the import specifiers may link to different modules in the receiving module graph than they would have in the sender’s.

So, with invocations like import(module), compartment.import(module), and worker.import(module), we can assume that module is a static module record and that it may have been created from a module block.

There is a problem with this idea though. Since a static module record doesn’t carry the referrer module specifier, there is no vessel to carry import.meta.url or to resolve import specifiers in the module. The proposal as written states that the referrer is inherited, which means it would have to be carried by whatever “module” is, and that couldn’t just be a static module record. There would need to be an envelope. That envelope could carry url, but not a closure like resolve: import.meta is not necessarily communicable between workers.

And, if “module”, is an envelope with a referrer module specifier and a static module record, it would be enticing to assume that the referrer module specifier is a cache key. To be a suitable cache key, it would have to include a content address hash of the text of the block, or transmission of a module block would have to include the whole text of the surrounding module and the referrer would need to contain the position. Thinking in this direction is problematic because there would be an integrity issue: the sender could override the module that would have been loaded by the recipient for the given referrer module specifier.

In short, I’m strongly in favor of not caching module blocks. I’m also in support of module blocks corresponding to static module records. I’m also in support of the receiving compartment governing the referrer module specifier for the block, but that is irreconcilable with my desire for module blocks to correspond to static module records. The only course out of this dilemma that I can see is for the module block to produce an envelope with the static module record and the referrer module specifier, for the initialization of the block to be ephemeral and not have a cache key, but use the referrer module specifier only for the purpose of resolving import specifiers of the module block, using the resolver math or importmap of the receiver.

Jack-Works commented 3 years ago

I talked to @littledan as well and we are contemplating to weakly hold module blocks in the module map, which allows them to be cleaned up throughout the lifecycle of a document.

If there is no way to refer to the module, it isn't actually observable, so the engine can GC it. (This requires module blocks to not have a unique identifier to reference them by, e.g. via import.meta.url.)

I’m not deeply familiar with the precedent established by template literals

For anyone who doesn't know that behavior (including me), the following code is an example of cache-by-position. (Though I don't know why it was designed like this.)

function id(x) {
  return x;
}
function templateObjectFactory() {
  return id`hello world`;
}
let result = templateObjectFactory() === templateObjectFactory();
// true

Overall, I am strongly leaning towards not caching by source position.

Yes, the arguments so far have convinced me that the current behavior is good enough. (It would be better still if anonymous module block instances could be GC'd.) If no one thinks this is a problem, I'll close this issue.

jridgewell commented 3 years ago

I'm not sure why the performance footgun should be the default behavior. If mod { } creates a new module instance every time, how expensive is that going to be in a loop? Especially if we want to encourage mod () => {} usage, where we should be viewing these as throwaway items.

These should absolutely be cache-by-location. If you want the slow, do-all-module-evaluation-again way, mark it with a clone call.

ljharb commented 3 years ago

Creating anything in a loop is an antipattern; I'm not sure we need to worry about that.

jridgewell commented 3 years ago

One of the use cases is to add an event listener that sends work off to a worker thread to compute. A keydown listener isn't a loop, but it's certainly expensive. Most of the use cases I've seen in the module-function thread are all throwaway values that would be created multiple times.

ljharb commented 3 years ago

The way you'd solve that with addEventListener is the same - you'd hoist the function outside the scope so it could be reused.

jridgewell commented 3 years ago

The entire point of the mod-function thread is that the normal syntax is too heavy for inline use. Its core use is to not hoist.

surma commented 3 years ago

If mod { } creates a new module instance every time

I’m pointing out a nit, because I wonder if some other folks might get this distinction wrong: module { ... } creates a new module block, which, in its currently spec'd form, is not much more than a glorified, parsed string of JS code. I don’t think it is expensive enough to qualify as a footgun.

import(moduleBlock) creates an instance of the module, which is arguably more expensive.

Jack-Works commented 3 years ago

When you have a module block, you always want to instantiate it to make it useful - whether via an import() call, worker.addModule, or any other API - so I think these two things are identical.

jridgewell commented 3 years ago

when you have a module block, you always want to instantiate it to make it useful,…, so these two things I think are identical

This is my impression as well. It doesn't matter if module { … } is fast to create as an object; if it's slow to actually instantiate and make usable, then it's slow overall.


To illustrate the way I see this, imagine the following code:

function make() {
  return module { … }
}

const a = make();
const b = make();

It doesn't really matter to me if a === b. But what does matter is that import(a) === import(b), so that reusing a module location can be fast regardless of where it's used (in particular module () => {} being written inline). I also think that sending a and b to a web worker should maintain this, so that import(a) === import(b) on the worker side of the boundary (obviously the worker will have to create its own instantiation of the module, but the module should be cached).

So given import(a) === import(b), should a === b? I think so, because it makes it more obvious that importing them would give the same value. But if we were to treat these as unique objects referencing the same underlying module instance, that wouldn't be terrible.

ljharb commented 3 years ago

You're describing the same problem as creating a new object or function inside make - and the solution is the same: move the declaration outside the function, or use a memoization abstraction.
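A minimal memoization abstraction of the kind being referred to might look like this (a sketch; memoize is a made-up helper, not part of the proposal):

```javascript
// Memoize a factory by an arbitrary key, so repeated calls reuse the
// first created value instead of allocating a fresh one each time.
function memoize(create) {
  const cache = new Map();
  return (key) => {
    if (!cache.has(key)) cache.set(key, create(key));
    return cache.get(key);
  };
}

const getHandler = memoize(() => ({ createdAt: Date.now() }));
console.log(getHandler("click") === getHandler("click")); // true
console.log(getHandler("click") === getHandler("keydown")); // false
```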

esprehn commented 3 years ago

There's a big difference between a module block and other things folks might use in a loop, like a function. Function instances have no state and are very cheap to create; the actual function code is only created once. That's not the same as running all the global initialization within a module when you do import(). The other big difference is that you don't transform a class or function into another thing to use it, while a module does get transformed by way of import().

I think the most important thing here is that you can create a cache within libraries by having some kind of stable identifier for the thing returned by the module block. So like:

fn = () => module {}
fn() !== fn() // this is fine
await import(fn()) !== await import (fn()) // also fine

but somehow libraries need a way to cache passing in the exact same module text so they can make

runTask(module { ... })

not resolve the module state repeatedly.

Similarly, if I can postMessage a module to another thread for evaluation, the other side needs a way to avoid redoing import() by way of some kind of caching. That can't be fixed by loop hoisting, since the ModuleBlock is cloned when it gets structured-cloned.

mhofman commented 3 years ago

Function instances have no state and are very cheap to create, the actual function code is only created once. That's not the same thing as running all the global initialization within a module when you do import().

I'd like to understand this assumption.

Declaring closures in loops captures a lot more state (parent contexts) than a module by itself would. And importing a module declaration doesn't have to be particularly more expensive than calling a function, especially if there are no dependencies, as would be the case for a module function. The code has already been parsed, so it's only the linking that is expensive. And the parsed code can be cached by position independently, as that is an unobservable implementation detail.
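The first point - that closures in loops capture per-iteration state - can be seen in plain JS today:

```javascript
// Each closure created in the loop captures its own copy of `count`,
// so closures are not stateless: every iteration allocates fresh state.
function makeCounters(n) {
  const counters = [];
  for (let i = 0; i < n; i++) {
    let count = 0; // per-iteration binding, captured by the closure below
    counters.push(() => ++count);
  }
  return counters;
}

const [first, second] = makeCounters(2);
first();
first();
console.log(first(), second()); // 3 1 - independent captured state
```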

Similarly if I can postMessage a module to another thread for evaluation, the other side needs a way to avoid doing import() by way of some kine of caching. That can't be fixed by loop hoisting, since the ModuleBlock is cloned when it gets structured cloned.

I believe this is a separate topic, discussed in #58. The question is whether a single module declaration results in a single module namespace object when imported, regardless of the identity of the "pointer" to the module declaration or how it got to the importing realm. This is particularly a problem for realms in different agents.