tc39 / proposal-source-phase-imports

Proposal to enable importing modules at the source phase
https://tc39.es/proposal-source-phase-imports/
MIT License
129 stars 9 forks source link

Implementation feedback: what's the deal with enforcing subclassing via [[ModuleSourceClassName]]? #59

Closed syg closed 5 months ago

syg commented 5 months ago

What's the value of enforcing %AbstractModuleSource% class hierarchy via [[ModuleSourceClassName]] if everything else is actually generic? This seems to be a hairshirt with no benefit unless I'm missing something.

syg commented 5 months ago

cc @guybedford

guybedford commented 5 months ago

This came out of feedback at plenary that the object returned for the source phase should be of an expected type and also branded.

Are you questioning whether the abstract prototype is needed if we just maintain branding?

In terms of prototype benefits, for example, we are currently evaluating in the ESM source phase imports proposal whether to implement imports() and exports() on this object or not as well so depending on how that discussion goes WebAssembly may possibly inherit features from this base.

syg commented 5 months ago

Are you questioning whether the abstract prototype is needed if we just maintain branding?

The opposite, whether we need to maintain branding. Concretely, a user-exposed brand check here for a slot constrains implementations in that synthetic modules that the host might want to add need to now subclass instead of "just" behaving the same. And since there's no actual useful state AFAICT that's subclassed (just a string description), I'm trying to understand why.

What was the rationale from plenary feedback? My notes searching is failing me.

syg commented 5 months ago

Abstractly related, we've also moved to disliking subclassing of builtins, which this is requiring hosts to do.

guybedford commented 5 months ago

It seems you weren't present in the meeting this was discussed at https://github.com/tc39/notes/blob/main/meetings/2023-03/mar-22.md#import-reflection-update.

This work was done based on plenary feedback, see the notes, and the third comment by JHD.

If you are asking for an alternative here, what is it?

guybedford commented 5 months ago

Note that in our current proposal, synthetic modules and other module types without a source phase would throw if they do not have a source phase representation.

We could define the source phase for these other module types, but the intent of the concern I understood was that TC39 should be able to describe properties that this object should have instead of just letting hosts do whatever they feel like.

syg commented 5 months ago

If you are asking for an alternative here, what is it?

Concretely, I'd like the following:

We could define the source phase for these other module types, but the intent of the concern I understood was that TC39 should be able to describe properties that this object should have instead of just letting hosts do whatever they feel like.

That's fine. We put constraints on what behaviors host hooks can have already.

In the current spec the difference between a host hook and requiring an actual subclass is observably equivalent, I think. However, I don't want us to spec ourselves into a corner in the future that requires embedder implementations like blink to implement these as actual subclasses.

@ljharb thoughts on the above alternative to requiring actual subclasses?

guybedford commented 5 months ago

Do not require real subclasses with an observable brand

Are you referring to the internal slot itself here? That was a requirement stated by @ljharb that there be an internal slot, that toStringTag can be used with to provide strong branding.

I don't think we can remove a feature at stage 3 that another delegate sought as a requirement for stage 3?

As far as I can tell that is required to remove the proper subclassing as well? Or is there a simple way to do strong branding without proper subclassing?

It would have been preferable if you had brought this up earlier, I'm unsure if there is time at this point to integrate this feedback, but let's work through the discussion.

syg commented 5 months ago

Are you referring to the internal slot itself here? That was a requirement stated by @ljharb that there be an internal slot, that toStringTag can be used with to provide strong branding.

I am.

I don't think we can remove a feature at stage 3 that another delegate sought as a requirement for stage 3?

Indeed we shouldn't unilaterally remove it. I'm offering stage 3 implementation feedback that the internal slot puts constraints on the implementation that I find undesirable. We'd need to get consensus on a normative change to change the behavior here.

As far as I can tell that is required to remove the proper subclassing as well? Or is there a simple way to do strong branding without proper subclassing?

Not that I know of, the two are equivalent I think.

It would have been preferable if you had brought this up earlier, I'm unsure if there is time at this point to integrate this feedback, but let's work through the discussion.

Well I do too! But that this would be burdensome was unclear to me before I started looking at the implementation more in depth.

guybedford commented 5 months ago

@syg my concern is that if you are arguing against strong branding, that may place us in deadlock with @ljharb's requirement. Can you explain a more clear reason for why you feel this is undesirable? What we do is very similar to how TypedArray behaves, so why do you feel it is okay there and not okay here?

ljharb commented 5 months ago

I don't care about toStringTag specifically, but I definitely insist on anything that has special, hidden behavior to be identifiable with a brand check, just like every other builtin in the language.

This can be implemented via a toStringTag accessor, or a static "isFoo" predicate, or any number of mechanisms, but it's necessary for any builtin that a robust, cross-realm method of identifying it exists.

syg commented 5 months ago

Can you explain a more clear reason for why you feel this is undesirable?

The module system is a coordination point between JS and the the embedder. As such, it is natural for the embedder to provide its own kinds of modules in addition to what 262 provides, like CSS modules, Wasm modules, etc. It is a good goal that the first class representation of all these modules have a common interface and a common set of behaviors.

One way to meet that goal is to require that the embedder's own modules must be literal subclasses by having a set of internal slots, and add a user-exposed predicate to check for those internal slots. I find this undesirable because it constrains and couples the JS VM and the embedder.

To be concrete, say I want to provide a CSSModuleSource or something in blink. V8 knows about the 262 slots, not the slots of web API objects. blink knows about the slots of web API objects, not the slots of the 262 objects. Most web engines, not just blink and V8, have different object representations for JS objects and web objects that run deep. Web engine objects tend to be C++ layout objects, JS engine objects tend to be custom GC-managed things. If we spec an observable internal slot via a predicate, I have to implement that behavior by bridging the two different object representations. The usual way to do the above is to expose hooks instead of putting constraints on the representation, both in concrete implementations and in the spec. The host layering is done via hooks, not providing objects that the host is expected to subclass.

I could implement this as a hook in the concrete implementation and have it be observably the same. But this has some correctness risk down the road. With host hooks, it is clear that here is a point where host can introduce their own things (e.g. modules). But every new method we add that makes use of internal slots obscures that boundary, and I'd have to work out if the existing implementation is still observably equivalent.

Another way to meet the goal is to have host hooks and put constraints on those hooks. I have nothing against a hook that's like "do you behave like a module source?" or constrained like "a host that provides an implementation of hook X must also provide an implementation of Y such that both implementations conform to the following requirements [...]".

What we do is very similar to how TypedArray behaves, so why do you feel it is okay there and not okay here?

What is the TypedArray analogy? Embedders are also not subclassing TypedArrays...

but I definitely insist on anything that has special, hidden behavior to be identifiable with a brand check, just like every other builtin in the language.

@ljharb What's the hidden behavior in this case? And we definitely don't have brand checks for every builtin, but I suppose you mean the subset of builtins with hidden behavior?

guybedford commented 5 months ago

What is the use case for CSSModuleSource, given that ConstructibleStyleSheet already behaves like a source in being a factory and that CSS modules have no dependencies or exports otherwise?

The TypedArray similarity is that in the ECMA262 spec TypedArrays have a [[TypedArrayName]] internal slot that is unique to the subclass, which is quite similar to what we are doing for toStringTag on modules. Why does this not have the same implementation difficulties you describe? Is it not a strong internal slot "brand" then?

I struggle to understand how hooks would solve this situation - the requirement stated was to have a function that can uniquely identify module sources, and that the sources provided to that function cannot be forged. Given an arbitrary JS object, are you suggesting we could define that check through a host hook instead to avoid the internal slot?

If we were to proceed in that way, while retaining the prototype chain on both the instance and the static constructor, would that that resolve your implementation feedback concern?

@ljharb What's the hidden behavior in this case? And we definitely don't have brand checks for every builtin, but I suppose you mean the subset of builtins with hidden behavior?

The hidden behaviour is that these objects have special meaning and may become first-class importable via dynamic import() under esm source phase.

guybedford commented 5 months ago

Note we are also looking to include additional information on the internal slots in the esm source phase proposal, which would retroactively apply to WebAssembly, including a [[Module]] slot reference.

syg commented 5 months ago

What is the use case for CSSModuleSource, given that ConstructibleStyleSheet already behaves like a source in being a factory and that CSS modules have no dependencies or exports otherwise?

That was an arbitrary example. Assume it's just FooSource.

The TypedArray similarity is that in the ECMA262 spec TypedArrays have a [[TypedArrayName]] internal slot that is unique to the subclass, which is quite similar to what we are doing for toStringTag on modules. Why does this not have the same implementation difficulties you describe? Is it not a strong internal slot "brand" then?

The difference is that all the TypedArray subclasses are in 262 (and V8 / SpiderMonkey / JSC), not an embedder (Blink / Gecko / WebKit).

For Wasm this is fine, the JS engines implement the Wasm stuff themselves. For future extensions this worries me.

I struggle to understand how hooks would solve this situation - the requirement stated was to have a function that can uniquely identify module sources, and that the sources provided to that function cannot be forged. Given an arbitrary JS object, are you suggesting we could define that check through a host hook instead to avoid the internal slot?

The difference is mainly that internal slots hides real implementation friction between in web engine and JS engine boundary and it worries me.

If we were to proceed in that way, while retaining the prototype chain on both the instance and the static constructor, would that that resolve your implementation feedback concern?

I am suggesting that since the host is expected to be able to provide their own module sources, we use the existing host hook machinery instead of layering the specs by requiring host specs to subclass. That would resolve my concern, yes.

guybedford commented 5 months ago

Thanks for clarifying, that makes a lot of sense if we want to simplify the host integration and allow platforms to define their own source types, and this is exactly in line with Stage 3 implementation feedback.

In this case, what about something like the following:

Then with ESM Source Phase, if we define dynamic importability, that would be an additional host hook to obtain the concrete module record, instead of being an internal slot. So we'd likely end up with these two main hooks for the source phase.

If that can work for both yourself and @ljharb I can try to get things together on this, please let me know your thoughts. My concern from a progression point of view is to ensure that we are able to somehow communicate through committee any changes have agreement. Moving to Stage 4 does not seem suitable without having the ESM integration finalized, so we may even need another agenda item for this week if possible to bring this up as a discussion topic and note its resolution.

ljharb commented 5 months ago

As long as there's a robust, cross-realm brand check I can use to know if i indeed have a "module source" object of any flavor, my constraints are satisfied, whatever its shape.

I would hope that there would be a similarly robust mechanism to identify which flavor of "module source" it is, but that's outside the scope of TC39, and up to individual hosts.

ljharb commented 5 months ago

And we definitely don't have brand checks for every builtin

The only builtin instance we don't have brand checks for is Error, and that is a mistake from ES6.

(forgot to post this yesterday, sorry)

lucacasonato commented 5 months ago

I propose we do the following (different from @guybedford's proposal), that I think will still satisfy the brand checking constraints from @ljharb, while removing the internal slot to satisfy @syg's feedback regarding "real subclassing" of the abstract module source object.

This keeps the brand checking behaviour without adding a new method, but removes the internal slot as @syg requested.

ljharb commented 5 months ago

Without an internal slot, how can the toString getter know if the receiver is a module source object?

lucacasonato commented 5 months ago

It calls the host hook on the receiver

ljharb commented 5 months ago

so the host hook would basically just be either an identity function or a noop? I’m not sure why that’s an improvement - literally everything else that’s brand checked has an internal slot - but as long as my constraints are met I’ll accept whatever shape achieves it.

lucacasonato commented 5 months ago

The host hook returns the module source record for a given source. You can think of it like a an external getter function for an internal slot.

ljharb commented 5 months ago

Oh sure, but at that point it’s weird not to just store it as a slot, like everything else :-)

lucacasonato commented 5 months ago

The difference is that all the other internal slots that are brand checked are on ecma262 defined objects, not host defined / user objects.

guybedford commented 5 months ago

It turns out it is possible to implement this non-normatively after all. Initially I worked on a normative approach to introduce a new strong branding function in https://github.com/tc39/proposal-source-phase-imports/pull/61, but then @nicolo-ribaudo suggested the same approach could work to implement it non-normatively, so that is now up in https://github.com/tc39/proposal-source-phase-imports/pull/62.

My preference at this point would therefore be to go for the non-normative approach.

I will be bringing this up in plenary as part of our source phase update. There are some related aspects as we were exactly seeking to define an internal slot for the module record in that new proposal, which we now have through the new host hook, so I think it's a good detour after all.

syg commented 5 months ago

It turns out it is possible to implement this non-normatively after all.

Yes, but see what I wrote above:

"I could implement this as a hook in the concrete implementation and have it be observably the same. But this has some correctness risk down the road. With host hooks, it is clear that here is a point where host can introduce their own things (e.g. modules). But every new method we add that makes use of internal slots obscures that boundary, and I'd have to work out if the existing implementation is still observably equivalent."

I really do not want "provide real subclasses of this base class" as a new host layering boundary. We have host hooks.

ljharb commented 5 months ago

The difference is that all the other internal slots that are brand checked are on ecma262 defined objects, not host defined / user objects.

That’s true - these are the first such objects I’m aware of (not user-defined, tho, surely?)

guybedford commented 5 months ago

@syg I understood that comment to be in the framing of there being internal slots - with the new PR there are no longer any internal slots in use.

The design we end up with is that we have a host object that corresponds to an underlying module record, so we maintain this with the following requirements:

  1. The object returned for the source from a concrete module record must have the AbstractModuleSource.prototype in its chain
  2. This new host hook must be implemented in turn for that object to translate back from this host object to its concrete module record.

No additional hooks, no additional slots - just two hooks: one to get the host object, and one to get from the host object back to its underlying concrete record.

As far as I understand it this meets your feedback, but please let me know if I am still misunderstanding your concern.

The above does seem to align with your next comment though:

Another way to meet the goal is to have host hooks and put constraints on those hooks. I have nothing against a hook that's like "do you behave like a module source?" or constrained like "a host that provides an implementation of hook X must also provide an implementation of Y such that both implementations conform to the following requirements [...]".

syg commented 5 months ago

As far as I understand it this meets your feedback, but please let me know if I am still misunderstanding your concern.

That sounds pretty good to me.

guybedford commented 5 months ago

Thank you for the feedback here, landed in https://github.com/tc39/proposal-source-phase-imports/issues/62.