tc39 / proposal-shadowrealm

ECMAScript Proposal, specs, and reference implementation for Realms
https://tc39.es/proposal-shadowrealm/

Which properties should HTML add to Realms' global objects? #284

Open littledan opened 3 years ago

littledan commented 3 years ago

HostInitializeUserRealm says,

It is not expected that this hook would add properties to the Realm's global object.

We should consider making this a requirement, not a suggestion, to ensure that the hook is not used in different ways in different environments in a way that could hurt interoperability.

This hook must not add properties to the Realm's global object.

Thanks to @annevk for pointing out that cross-environment guarantees would be useful here.

(Since we're just talking about tweaking wording on something the spec already says, I think the exact choice of wording can be iterated on post-Stage 3.)

leobalter commented 3 years ago

Limiting the hook, now named HostInitializeSyntheticRealm, seems fine to me.

Please just be aware that the abstract operation SetDefaultGlobalBindings allows additional globals into the Realm, and we should extend this discussion if necessary.

littledan commented 3 years ago

This was the biggest concrete actionable piece of feedback that I heard from @syg and @codehag in our discussion at the November 2020 TC39 meeting: that we instead should have Web APIs defined on Realms. We can make some Web APIs available and others not, using WebIDL's [Exposed] extended attribute (if we add a bit of plumbing for Realms). I'd like to discuss this within the people working on Realms, so we can later form a proposal to HTML.

kriskowal commented 3 years ago

I believe this concern requires two solutions.

There are web APIs like TextEncoder that should be available in a JavaScript language Realm, where the web has gone ahead of the language standard as an expedient. The solution here is likely the same as with TypedArrays: these things should graduate to 262 and be incorporated in any Realm. Of course, nothing except self-restraint precludes vendors from including them in Realm ahead of a blessing from TC39.

There are web APIs like document and fetch that should be available only in web realms, which could even be called WebRealm to make their capabilities-at-birth clear. While that API should rhyme with Realm, it clearly need not and should not come from 262. And such a thing could be shimmed off-web using Realm modules.

Jack-Works commented 3 years ago

Agree with the idea of WebRealm. The Web can define a subclass of Realm and add those Web APIs in its initialization stage (not the host hook stage).

Jack-Works commented 3 years ago

I have a question. If we don't allow adding properties to the Realm's global object, developers will copy them over manually. Does new Realm() implement the Window interface in WebIDL? If not, developers will need to "bind" the this of those Web APIs, or they will encounter Uncaught TypeError: Illegal invocation, as when you call alert.call({}) today.
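To make the "Illegal invocation" failure concrete, here is a minimal sketch that runs in any modern JS engine. It assumes that a class private-field brand check (ES2022 `#brand in obj`) is a fair stand-in for the receiver check that WebIDL operations perform; FakeHostGlobal, greet and realmGlobal are hypothetical names, and no actual Realm is involved.

```javascript
// Hypothetical stand-in for a host global such as Window. WebIDL operations check
// that `this` is a genuine platform object; a private-field brand check mimics that.
class FakeHostGlobal {
  #brand = true;

  greet(name) {
    // `#brand in this` is true only for objects constructed by this class.
    if (!(#brand in this)) {
      throw new TypeError("Illegal invocation");
    }
    return `hello, ${name}`;
  }
}

const hostGlobal = new FakeHostGlobal();

// Plays the role of the new Realm's global object.
const realmGlobal = {};

// Copying the bare method: calling it on realmGlobal fails the brand check.
realmGlobal.greet = hostGlobal.greet;

// Binding `this` to the original receiver makes the copied method usable.
realmGlobal.boundGreet = hostGlobal.greet.bind(hostGlobal);

let copiedError = null;
try {
  realmGlobal.greet("realm");
} catch (error) {
  copiedError = error;
}

console.log(copiedError instanceof TypeError); // true
console.log(realmGlobal.boundGreet("realm")); // hello, realm
```

Binding works, but every copied method then stays tied to the outer global, which is part of why copying by hand is unattractive.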

littledan commented 3 years ago

new Realm().globalThis would implement a new WebIDL interface--not Window, not WorkerGlobalScope, but something analogous. Then, different interfaces will be [Exposed] on it. We could separate Realm and WebRealm and make two of these interfaces, but I'm not really sure where the dividing line would be exactly... I don't think document should be in either, but I wonder if we might want fetch in both.

Jack-Works commented 3 years ago

I think we should not allow the host to add new properties in the host hook. Hosts should make a subclass of Realm and add platform APIs in the constructor.

Realm itself should only have things that are defined in the ECMAScript specification.

codehag commented 3 years ago

@Jack-Works this would mean that developers need to know that streams, setTimeout and atob are not JS APIs, even though many developers consider them part of JavaScript. This is why many of us advocate for not splitting JS into "web" and "JS": it's worse for developers. Why should users have to be aware of the details of specification bodies?

I agree that not all web APIs should be exposed by default (setTimeout, for example), but limiting this to only what is in ECMA-262 would be an arbitrary decision from the perspective of web devs.

I agree that WebRealm might be interesting. However, when it comes to which APIs go where, I don't think we should have this hard limit. I think this will be a discussion in the coming months, and I appreciate Dan getting that started.

Jack-Works commented 3 years ago

@codehag if we add APIs to Realms, one of its original motivations is broken.

For example, if I want to emulate a Node environment on the web with a Realm, I need to clean up all those Web APIs; I don't have a clean environment to start from. Things will be just a little easier than today (using an iframe) because there are no unforgeables on it. But if we split them, I can have a clean environment and not have to worry about removing unwanted APIs.

And with this approach, ordinary developers will use WebRealm (or HTMLRealm, or whatever it is named), and if there is a special need (for example, emulating other hosts), I can still have a clean Realm to work with.

codehag commented 3 years ago

Which environment do you have in mind? Is it in node? Or do you have another host in mind?

annevk commented 3 years ago

I think the main problem is that JavaScript itself doesn't separate language from library too much either. Why exclude atob(), but not encodeURI()? Would it also not be a problem that anything new added to JavaScript would end up in there? E.g., if setTimeout() were somehow moved. How is that "clean"?

Jack-Works commented 3 years ago

Maybe I didn't represent my idea well. Let me try again.

For example, suppose I want to make a "Node.js" emulator web app. I need to create a new Realm. If browsers add Web APIs to the created realm, I need to do extra work: make a list of ES APIs and remove everything else. If the browser does not add APIs to it, I can start shimming the Node environment without making that list.

And for other uses, like running plugins, developers can use WebRealm instead.

This way, Realm is a lower-level API for, say, tooling authors who want to emulate some other host environment, and the host versions of Realm (WebRealm, NodeRealm, ...) are useful tools for creating a new global object.
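The "extra work" described above (keep the ES APIs, remove everything else) might look roughly like this sketch. It is illustrative only: ES_GLOBAL_NAMES is a hypothetical and deliberately incomplete allowlist, stripNonLanguageGlobals is a made-up helper, and a plain object stands in for a Realm's global object.

```javascript
// Hypothetical, deliberately incomplete allowlist. A real one would list every
// global name defined by the ECMAScript edition being emulated.
const ES_GLOBAL_NAMES = new Set([
  "globalThis", "Infinity", "NaN", "undefined",
  "Object", "Function", "Array", "String", "Number", "Boolean", "Symbol", "BigInt",
  "Promise", "Map", "Set", "WeakMap", "WeakSet", "Proxy", "Reflect", "JSON", "Math",
  "Date", "RegExp", "Error", "TypeError", "RangeError",
  "parseInt", "parseFloat", "isNaN", "isFinite",
  "encodeURI", "encodeURIComponent", "decodeURI", "decodeURIComponent",
]);

// Deletes every own property of `global` that is not on the allowlist and reports
// which deletions succeeded and which were blocked by non-configurable properties.
function stripNonLanguageGlobals(global, allowed = ES_GLOBAL_NAMES) {
  const removed = [];
  const stuck = [];
  for (const key of Reflect.ownKeys(global)) {
    if (typeof key === "string" && allowed.has(key)) continue;
    // Reflect.deleteProperty returns false (instead of throwing) for non-configurable properties.
    if (Reflect.deleteProperty(global, key)) {
      removed.push(key);
    } else {
      stuck.push(key);
    }
  }
  return { removed, stuck };
}

// A plain object stands in for a Realm's global object.
const fakeGlobal = { Array, JSON, TextEncoder: class TextEncoder {}, setTimeout() {} };
Object.defineProperty(fakeGlobal, "document", { value: {}, configurable: false });

console.log(stripNonLanguageGlobals(fakeGlobal));
// { removed: [ 'TextEncoder', 'setTimeout' ], stuck: [ 'document' ] }
```

Using an allowlist rather than a denylist means host globals added later are removed without updating the list, while anything reported as stuck (non-configurable, like the web's unforgeables) is exactly what such a cleanup cannot handle.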

Jack-Works commented 3 years ago

@annevk What I mean by a "clean" state is not about API design (why encodeURI but not atob), but about using the API: I want a minimal API set that I can be sure every possible engine has, regardless of what else a given host includes in its API set.

littledan commented 3 years ago

I think there is something real here: as arbitrary as it is, the JavaScript standard does, in practice, form a common base among JS environments. Adoption of Web APIs is ongoing but much patchier, and there are greater interop issues in practice for Web APIs than for JS APIs.

You could think of this interop difference as a historical artifact, but it generally follows from how things are structured: JS engines try to implement the JS standard, web browsers (and environments that try to be compatible with them) implement web standards, and Web JS engines are often retargeted to run outside the Web (meaning there is more code sharing in practice). So it's far from a coincidence, and it explains why there is demand for an API at this level.


At the same time, I'm very much sympathetic towards the idea that we should make Realms on the Web more full-featured, reducing sharp edges for developers, and including certain Web APIs there. I wanted to note a few concrete options, besides the JS/Web line, that could make sense for Realms.

One possible line which @annevk mentioned in #whatwg is: APIs which have to do with parsing/string processing are included. So Intl, atob and TextEncoder are in, but then maybe setTimeout, EventTarget and ReadableStream are out. (I'm not sure where WebCrypto falls here...) A lot of people expect setTimeout in JS universally, but that is actually an example of an API with various interop issues in practice, and scheduling is complicated (and might be the kind of thing you want to manipulate somehow in a Realm anyway). I wonder if this line would be intelligible and usable by JS developers in general.

Another possible line that @annevk mentioned would be to keep the set of globals to an absolute minimum. So we would even exclude many JS globals, and just include things which are needed to run JS syntax (like Array.prototype). While this idea makes sense to me in theory, it sounds a bit difficult to work with; it might require the evaluation of large polyfills for methods which create JS objects in the right Realm, slowing startup time.

A third possible line is to include everything which doesn't do "I/O". So HTMLElement, console, localStorage, fetch and postMessage are omitted, but setTimeout, EventTarget, WebCrypto, etc. are all present, in addition to atob and TextEncoder. I see this "all but I/O" set as the "maximalist" option.

Jack-Works commented 3 years ago

Ordinary developers will learn to use WebRealm, which is full-featured, and those who need it can use Realm. There is no conflict.

This hook must not add properties to the Realm's global object.

We should add this.

leobalter commented 3 years ago

At the same time, I'm very much sympathetic towards the idea that we should make Realms on the Web more full-featured, reducing sharp edges for developers, and including certain Web APIs there.

I second @littledan here. My pain point is not shipping Web APIs in general but mostly dealing with the unforgeables. Those are the main blockers for some of the virtualization we have within our goals.

My preference is to unblock this proposal and settle on the strategy that works best for the HTML integration. That said, I find it more interesting to pick something at one of the edges regarding globals: either minimalist (ES primordials) or a good number of values we can add to reduce confusion for users.

If we go with anything non-minimalist, I can sync with my team, including @caridy, to list anything else to avoid beyond unforgeables.

Jack-Works commented 3 years ago

mostly dealing with the unforgeables. Those are the main blockers for some of the virtualization we have within our goals.

Yes, unforgeables are the main blocker for virtualization. But unknown extra properties added by the host are also a problem for virtualization (even though they can be deleted without difficulty).

littledan commented 3 years ago

One possible line which @annevk mentioned in #whatwg is: APIs which have to do with parsing/string processing are included. So Intl, atob and TextEncoder are in, but then maybe setTimeout, EventTarget and ReadableStream are out. (I'm not sure where WebCrypto falls here...) A lot of people expect setTimeout in JS universally, but that is actually an example of an API with various interop issues in practice, and scheduling is complicated (and might be the kind of thing you want to manipulate somehow in a Realm anyway). I wonder if this line would be intelligible and usable by JS developers in general.

Can I do a temperature check/call for emoji reacts on this option? What do people think of "string parsing" as the possible line? (We might include queueMicrotask here as well, as it's analogous to Promise operations.)
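For anyone unsure why queueMicrotask would sit next to Promise operations: in a host that provides it, a queueMicrotask callback goes onto the same microtask queue that promise reactions use, in FIFO order. A small sketch, assuming a host such as Node.js or a browser (setTimeout is used only to wait until the queue has drained; microtaskOrder is a hypothetical helper):

```javascript
// Records the order in which queued callbacks run, showing that queueMicrotask
// schedules work on the same microtask queue that promise reactions use.
function microtaskOrder() {
  const order = [];
  queueMicrotask(() => order.push("queueMicrotask callback"));
  Promise.resolve().then(() => order.push("promise reaction"));
  order.push("synchronous code");
  // Resolve on a later macrotask, after the microtask queue has drained.
  return new Promise((resolve) => setTimeout(() => resolve(order), 0));
}

microtaskOrder().then((order) => console.log(order));
// [ 'synchronous code', 'queueMicrotask callback', 'promise reaction' ]
```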

Jack-Works commented 3 years ago

I still want "This hook must not add properties to the Realm's global object." to be added, for the reasons I have presented. Sorry for repeating, but it's important to me.

caridy commented 3 years ago

I still want "This hook must not add properties to the Realm's global object." to be added, for the reasons I have presented. Sorry for repeating, but it's important to me.

@Jack-Works in the past we used the init hook, which was a callback invoked by the super constructor to provide a hook into the list of descriptors to be installed on the global object. That was removed a while ago for various reasons; see the closed issues. Now, what you're asking, if I understand correctly, is very similar: a way to create a new realm with a global object without any global properties defined, giving the developer the option to populate it at will. Is that it? If yes, the next question is: where would the descriptors, if any, be provided for you to populate the global properties? And which descriptors would you need? And how is that different from deleting what you don't need, assuming everything is configurable?

Jack-Works commented 3 years ago

@caridy yes, I want to forbid adding anything in the host hook.

A host can provide a subclass and then define new properties in the constructor steps.

Developers can do this too, or they can directly manipulate the object returned by the Realm's globalThis.

littledan commented 3 years ago

Yeah, I don't see why we need any more hooks here. You can just add properties after the Realm constructor returns. And if we want to permit hosts to add properties, they can do it in HostInitializeSyntheticRealm, not a separate host hook.

Jack-Works commented 3 years ago
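
// With the stricter wording, a fresh Realm carries no host-added globals
assert(new Realm().globalThis.TextEncoder === undefined)

// A platform subclass (or a developer) adds APIs in its own constructor instead
class WebRealm extends Realm {
    constructor() {
        super()
        // 'text-encoder' stands for some TextEncoder implementation; the module name is illustrative
        this.globalThis.TextEncoder = require('text-encoder')
        // ...
    }
}
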
littledan commented 3 years ago

We discussed this issue in an SES call recently. Some major points of agreement among participants there were:

Nobody present in the meeting raised concerns with @annevk's suggestion of just including text-processing-like things, so I think that would be good to move ahead with as a starting point.

It doesn't sound like anyone in the Realm champion group will have time to articulate the HTML/WebIDL-side changes to specify all of the details here by January, but if someone is interested in working on this, please let me know; I would be happy to mentor you.

Jack-Works commented 3 years ago

Nobody raised concerns with @annevk's suggestion of just including text-processing-like things, so I think that would be good to move ahead with as a starting point.

I did raise concerns about emulation and tooling. Didn't I make my arguments clear? You can see them in the thread above. 🤔

littledan commented 3 years ago

Sorry, I mean, no meeting attendees... You made yourself clear above. Edited the above comment. My opinion is, if we can find a set of things which don't have authority and make sense generally, then it's reasonable to go this way. I should speak for myself rather than everybody.

Jack-Works commented 3 years ago

I'm not against the idea of adding some useful tools to it. But why not add them to the host-defined subclass? There might be some tooling authors who really want a truly clean environment, without anything that is not in the language itself.

Maybe I can provide a use case:

Sometimes I write libraries that depend on zero host APIs (even ones shared across multiple platforms), to make sure they can run on any ES engine. I need to run the tests in a clean environment to make sure that I (or any contributor) didn't use any of those APIs.

If the host does not add anything else to Realms, I can use Realms for that job. But if TextEncoder is added and I accidentally use it, the tests cannot detect that I used a host-specific API.
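One way to approximate this kind of test today, outside the proposal, is Node.js's built-in vm module, which can evaluate code in a fresh context that lacks Node's host globals. This is a sketch under assumptions: runInCleanContext and librarySource are hypothetical names, and a vm context is not a Realm in the sense of this proposal (for example, it still has whatever the engine itself installs).

```javascript
const vm = require("node:vm");

// Hypothetical library source under test. It is meant to be host-independent,
// but it accidentally relies on the host-provided TextEncoder.
const librarySource = `
  function byteLength(text) {
    return new TextEncoder().encode(text).length;
  }
  byteLength("abc");
`;

// Evaluates `source` in a fresh context whose global object has the engine's
// built-ins but not Node's own host APIs such as setTimeout or TextEncoder.
function runInCleanContext(source) {
  const context = vm.createContext({});
  try {
    return { ok: true, value: vm.runInContext(source, context) };
  } catch (error) {
    // The error object comes from the other context, so `instanceof ReferenceError`
    // against this context's constructor would be false; compare by name instead.
    return { ok: false, errorName: error.name, message: error.message };
  }
}

const hostCheck = runInCleanContext(librarySource);
console.log(hostCheck.ok, hostCheck.errorName); // false ReferenceError
console.log(runInCleanContext(`[1, 2, 3].map((n) => n * 2).join(",")`).value); // 2,4,6
```

The cross-context instanceof caveat in the catch block is itself a small example of why identity across realms matters for this kind of tooling.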

littledan commented 3 years ago

I think it's important that, if we add other globals, we document what they are so that they can be supported on a wide set of JS engines compatibly. That's the idea behind the (mothballed) js-shared-interfaces project.

Jack-Works commented 3 years ago

I think it's important that, if we add other globals, we document what they are so that they can be supported on a wide set of JS engines compatibly. That's the idea behind the (mothballed) js-shared-interfaces project.

Yes, and splitting the clean version from the "enhanced" version is important to me.

leobalter commented 3 years ago

@Jack-Works I believe a fully clean version is not quite possible, as we already add built-ins. A cleanup process is still required for those built-ins.

I also believe that letting the hook add new properties to the global is consistent with the fact that ECMAScript currently allows extensions to the language.

While I understand the ideal is going with the lesser amount of APIs, there is another ideal to make this compatible with different platform realities.

The trade-off of allowing properties would be some potential extra cleanup in userland, when necessary. If we work through a subset, we can deal with that.

@littledan: Nobody present in the meeting raised concerns with @annevk's suggestion of just including text-processing-like things, so I think that would be good to move ahead with as a starting point.

This seems like a fair direction, and I hope we can settle on one rather than just raising blockers.

leobalter commented 3 years ago

I just received new information about possibly having an API set smaller than the current ES built-ins. If that becomes acceptable for the Web Platform, it might also address the clean-state arguments raised by @Jack-Works.

@annevk @littledan Do you have a list of APIs that stand out as ones we could omit?

I'll sync this with @caridy and see if I can offer a built-in subset for synthetic realms.


@littledan mentioned this here and I missed it. https://github.com/tc39/proposal-realms/issues/284#issuecomment-729251580

annevk commented 3 years ago

Well, Intl, Math(?), and Date come to mind. It's hard to find a discerning principle, but dropping library-like functionality seems like a good start. An extreme would be to only expose what the parser needs to exist.

leobalter commented 3 years ago

I've been trying to put together a list; it is still missing some global properties.

I have some thoughts on this, but I need to summarize them properly another time. For now, I want to see the possible options ahead.

KilianKilmister commented 3 years ago

I want to really stress how important I think it is to have this clean realm without any additional APIs (preferably without Annex B as well, but that's a different topic) so developers have this blank canvas to use if they need it.

I very much agree with @Jack-Works

I'm not against the idea of adding some useful tools to it. But why not add them to the host-defined subclass? There might be some tooling authors who really want a truly clean environment, without anything that is not in the language itself.

I don't see any argument against having a host-defined or some spec-defined subclass of that clean base which has extra functionality, so that this Base Realm is still there for developers to choose from.

One use case for Realms that are as clean as possible would be tooling scripts limited to a single API or a small set of APIs, allowing for predictable and consistent execution, not through the goodwill of the developer but enforced by lack of access. I would love to design Web Worker-like global scopes for use with CLI scripts or as better plugin systems. This would make adoption of these tools much easier and require less boilerplate from the user.

I rarely write code for the web, so this is from the Node.js perspective, where anything like that is currently pretty much impossible (at least from a deployment standpoint), as work on vm.Module has pretty much stalled, waiting to see how Realms progress.

Jamesernator commented 3 years ago

An alternative mechanism to having subclass realms would be the ability to simply request the host to create objects for it. This would allow Realms to easily include anything they want.

For example:

const childRealm = new Realm();
// Ask the host to add a local copy to childRealm using childRealm's intrinsics
childRealm.attachHostAPI('Worker');
childRealm.attachHostAPI('console');

// Ask the host to add all host APIs it can to the child realm
childRealm.attachAllHostAPIs();

Another alternative would be to expose all host APIs through builtin modules, this way such APIs could be attained through importing them in the child realm. The importHook machinery in the compartments proposal would allow for denying/allowing access to host APIs as necessary within the child realm.

For example:

const childRealm = new Realm();
// Import a host module into the child realm and attach it
const childRealmConsole = await childRealm.import("web:console");
childRealm.globalThis.console = childRealmConsole.namespace;

The downside of this approach compared to the first though is that every API needs to be put into a builtin module, whereas the previous one simply refers to them by the already existing global names.
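The first alternative can be approximated in userland to see what it would mean for property attributes. In this sketch, attachHostAPI and attachAllHostAPIs mirror the names suggested above but are hypothetical helpers, hostApiFactories is a made-up registry, and a plain object stands in for childRealm.globalThis; a real host would build each API from the child realm's own intrinsics rather than closing over the outer console.

```javascript
// Hypothetical registry of host API factories. Each call returns a fresh object,
// so two target globals never share identity; these plain stand-ins only show the shape.
const hostApiFactories = {
  console: () => ({ log: (...args) => console.log(...args) }),
  reportError: () => (error) => console.error(error),
};

// Userland approximation of the suggested childRealm.attachHostAPI(name).
function attachHostAPI(targetGlobal, name, factories = hostApiFactories) {
  if (!Object.hasOwn(factories, name)) {
    throw new TypeError(`Unknown host API: ${name}`);
  }
  // Same attributes ECMAScript gives its own global functions: writable and
  // configurable but not enumerable, so the realm stays customizable.
  Object.defineProperty(targetGlobal, name, {
    value: factories[name](),
    writable: true,
    enumerable: false,
    configurable: true,
  });
}

// Userland approximation of the suggested childRealm.attachAllHostAPIs().
function attachAllHostAPIs(targetGlobal, factories = hostApiFactories) {
  for (const name of Object.keys(factories)) {
    attachHostAPI(targetGlobal, name, factories);
  }
}

// A plain object stands in for childRealm.globalThis.
const childGlobal = {};
attachHostAPI(childGlobal, "console");
console.log(Object.keys(childGlobal)); // [] (the property is non-enumerable)
console.log(typeof childGlobal.console.log); // function
attachAllHostAPIs(childGlobal);
console.log("reportError" in childGlobal); // true
```

Because every attached property stays configurable, code that wants a stricter realm can still delete or replace what was attached.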

syg commented 3 years ago

Can someone catch me up on what the current thinking is here? It's still an important thing to figure out before Stage 3.

leobalter commented 3 years ago

@syg I'm strongly in favor of keeping each new realm limited to the global names from ECMAScript, a natural separation already defined in a spec that is common among JS engines.

We considered a subset of ECMAScript APIs, but that didn't turn out to be productive, and it isn't worth defining such a separation within ECMAScript itself.

From my perspective, there is no value in adding extra host-defined names to the global when Realms are used for virtualization, one of their most common use cases. If anything is added, it will require extra customization of the global for virtualized environments, adding userland complexity.

That said, I believe we can be flexible here for the web platform while integrating with HTML. Following @littledan's comment, I believe the following can be added to HostInitializeSyntheticRealm:

  • There shouldn't be any non-configurable properties, so that Realms can be customized
  • Authority should be avoided. Ideally, the only capabilities exposed would be what JS itself already exposes (loading modules, getting the time, queueing promise resolutions, etc)

HostInitializeSyntheticRealm still does not disallow adding names. When the integration happens, if something is added, we can adjust the host-defined text to reflect that it may occur, limited to the aspects listed above.
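Checking a host against the first guideline could look like the following audit. It is a sketch under assumptions: the ECMAScript specification itself makes three global value properties non-configurable (Infinity, NaN and undefined), so those are treated as expected; findHostNonConfigurable is a hypothetical helper; and location in the usage example only stands in for an unforgeable such as window.location.

```javascript
// ECMAScript defines these global value properties as non-configurable, so they are
// expected on every global. Anything else that cannot be reconfigured was added by
// the host and would block customizing the realm.
const SPEC_NON_CONFIGURABLE = new Set(["Infinity", "NaN", "undefined"]);

// Returns the own keys (strings and symbols) of `global` that are non-configurable
// but not required by ECMAScript.
function findHostNonConfigurable(global) {
  const descriptors = Object.getOwnPropertyDescriptors(global);
  return Reflect.ownKeys(descriptors).filter(
    (key) => !descriptors[key].configurable && !SPEC_NON_CONFIGURABLE.has(key),
  );
}

// A plain object stands in for a realm's global object.
const exampleGlobal = {};
Object.defineProperties(exampleGlobal, {
  undefined: { value: undefined, writable: false, enumerable: false, configurable: false },
  TextEncoder: { value: class TextEncoder {}, writable: true, enumerable: false, configurable: true },
  location: { value: {}, writable: false, enumerable: false, configurable: false },
});

console.log(findHostNonConfigurable(exampleGlobal)); // [ 'location' ]
```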

domenic commented 3 years ago

IMO anything which includes encodeURIComponent but excludes URL is actively harmful, since the former has bad correctness properties which the latter fixes.

ljharb commented 3 years ago

That seems like a good argument to add URL to ecmascript then?

domenic commented 3 years ago

No.

ljharb commented 3 years ago

I'm not sure how to square those positions. There are many ECMAScript engines which don't implement the HTML standard, so it seems like either including encodeURIComponent but excluding URL isn't actually harmful, or the harm exists and should be avoided by upstreaming URL.

syg commented 3 years ago

Okay, I'm fine with having a host hook here for embedders to add things that are sensible. The spec should give guidelines as to what APIs it thinks ought to be considered reasonable.

Virtualization is not an important enough use case for the web platform to trade off ergonomics and risk confusion for web devs, who by and large (borne out by actual MDN survey data here!) do not understand the separation between the specs. More to the point, they really shouldn't need to. So I must push back against the notion that shipping our standards body org chart just because it's reflected in engines is a sensible thing: it may be easy, but it'll be at a large disservice to our constituency.

erights commented 3 years ago

I'm not sure how to square those positions. There are many ECMAScript engines which don't implement the HTML standard, so it seems like either including encodeURIComponent but excluding URL isn't actually harmful, or the harm exists and should be avoided by upstreaming URL.

EcmaScript depends on the Unicode standard by citing it as of a given version, but without duplicating it or claiming jurisdiction over the definition of Unicode itself.

In like manner, for standards like URL or TextDecoder that are defined by other standards bodies, we could decide to have EcmaScript standardize the global variable of that name, including it in EcmaScript, citing specific versions of those other specs as providing the definition of the object found at that variable name. Then these global variables are part of EcmaScript. Their values are also part of EcmaScript, by citation of given versions of external specs.

leobalter commented 3 years ago

The current spec does not prevent URL from being added in a later HTML integration. The hook does not recommend what may be added, and my strong request is to avoid non-configurable properties on the global.

erights commented 3 years ago

The current spec does not prevent URL being added in a further html integration.

I suggest that we not do any html integration per se. We are simply cherry picking some objects defined by other standards and including them in the EcmaScript standard by citation. They must meet the powerlessness criteria @littledan explains above, must be relevant enough to JS in a host independent manner, e.g., not be about JS in browsers, and must reach tc39 consensus to be included in EcmaScript. Then these are just new standard globals.

The hook does not recommend what may be added

I propose this instead of any host hook. new Realm should add all the EcmaScript-defined globals, including, we hope, URL, TextEncoder, and TextDecoder, once we have consensus to include these in standard EcmaScript.

and my strong request is to avoid non configurable properties to the global.

Yes. I consider this a requirement.

Jack-Works commented 3 years ago

to tradeoff ergonomics and possible confusion for web devs, who by and large (borne out by actual MDN survey data here!) do not understand the separation between the specs. More to the point, they really shouldn't need to

Then the idea of WebRealm extends Realm from above still applies. We can provide an ergonomic version of Realm by default, but the clean version should be available.

syg commented 3 years ago

I am against the subclassing idea, it also strikes me too much of "shipping our standards body org chart".

ljharb commented 3 years ago

That many people may not know the differences between environments does not mean it’s not critical for those that do to be able to differentiate - it just speaks to what the defaults should be.

Jack-Works commented 3 years ago

This is an advanced API, I think it's OK to require some prior knowledge before using the API.

it also strikes me too much of "shipping our standards body org chart".

I didn't call it WHATWGRealm or W3CRealm🤣. It's named after the platform. I don't see anything wrong with WebRealm extends Realm, NodeRealm extends Realm, or Deno.Realm extends Realm, though; and if each platform has a non-conflicting name with its own set of additional global properties, it will be easier to specify what developers expect.

Jamesernator commented 3 years ago

This is an advanced API, I think it's OK to require some prior knowledge before using the API.

it also strikes me too much of "shipping our standards body org chart".

I didn't call it WHATWGRealm or W3CRealm🤣. It's named after the platform. I don't see anything wrong with WebRealm extends Realm, NodeRealm extends Realm, or Deno.Realm extends Realm, though; and if each platform has a non-conflicting name with its own set of additional global properties, it will be easier to specify what developers expect.

The other option is to have a way to specify/add host APIs to a Realm. I don't even necessarily think it would be a bad thing to add all such APIs by default and have the clean realm as an opt-out. The important thing is that we can't ship realms and then decide later that host APIs should be added by default, breaking previously "safe" realms.

The simplest way would be to provide a list of host APIs to the Realm constructor:

// Hypothetical option: add only the named host APIs
const realm = new Realm({
  addHostAPIs: ["TextDecoder", "URL"],
});

// Safe realm, no host APIs added
const safeRealm = new Realm({ addHostAPIs: [] });

// We'd want to expose the list of supported host APIs so we can also do blacklisting/whitelisting as wanted
const noWorkerRealm = new Realm({ addHostAPIs: Realm.SUPPORTED_HOST_APIS.filter(a => a !== "Worker") });

// Alternatively we could have named sets of APIs, but if this were the case we'd need to
// tag every single host API with a set of tags, which may or may not be feasible,
// especially given many host APIs can change over time, adding potentially unsafe things
const taggedRealm = new Realm({ addHostAPIs: ["safe", "deterministic"] });

Alternatively, if we want the defaults to be more permissive, such that we could even add new stuff to default Realms that is potentially unsafe for more secure use cases like SES, we could have a separate factory that creates a realm that removes all "unsafe" things, both present and future:

// Creates a Realm that has no host APIs exposed, additionally if we decide to
// add any new things to Realms with unsafe defaults, Realms returned from `Realm.safe` would
// always remove the feature or provide a safe default instead
//
// Examples of things that could be added in future are things like new hooks,
// new global APIs, builtin modules, etc etc
const realm = Realm.safe();