w3ctag / design-reviews

W3C specs and API reviews

Realms API ECMAScript Proposal #542

Closed leobalter closed 3 years ago

leobalter commented 4 years ago

Saluton TAG!

I'm requesting a TAG review of TC39's Realms API.

Realms are a distinct global environment, with its own global object containing its own intrinsics and built-ins. The Realms proposal provides a new mechanism to execute JavaScript code within the context of a new global object and set of JavaScript built-ins. The Realm constructor creates this kind of a new global object.

Further details:

You should also know that...

We'd prefer the TAG provide feedback as (please delete all but the desired option):



littledan commented 4 years ago

This work is being funded by: Salesforce, Agoric

Note that Bloomberg has been funding my (Daniel Ehrenberg, Igalia) work in TC39 generally.

leobalter commented 4 years ago

I updated the post to reflect this, @littledan please let me know if this is fine.

littledan commented 4 years ago

Note, I think this makes sense to classify as a "specification review" rather than an "early review", as we have full semantics and a specification with semantics proposed, modulo editorial issues in HTML.

I hope that we can discuss this proposal at an upcoming TC39 meeting. Getting a TAG review would be very useful prior to that discussion. There's one TC39 meeting starting September 21st, and another starting November 16th. Please let us know if there's anything we can do to provide information or context to help the TAG.

domenic commented 4 years ago

I don't think it's accurate to classify the issues with HTML as "editorial"; they cut to the heart of the proposal, and as to whether this proposal is something we'd want to welcome on the web at all, and if so, how it would integrate in a way that deeply cross-cuts fundamental pieces of the web architecture around code-loading, security (CSP etc.), and global objects.

littledan commented 4 years ago

We have tried to make clear what is being proposed, and I don't know of any ambiguity about these details. Either way, I'd love to hear more from the TAG and others about these design concerns, c.f. https://github.com/tc39/proposal-realms/issues/238. I have no strong opinion on this issue's labels.

kenchris commented 4 years ago

@plinss @hadleybeeman and I looked at this in the TAG breakout today.

We are generally happy to see this happen, but would like a bit more clarification, like for instance what is exposed to the Realm object? (see below)

Realms differ from same-origin iframes by omitting Web APIs such as the DOM.

So do Realms only have the ECMAScript APIs available?

Doesn't this mean that most libraries won't work unless you add their dependencies manually, like realm.globalThis.fetch = fetch? We could see people using this even to isolate WebAssembly code, though that requires adding the methods needed for that.

We are also a bit afraid that regular developers will have a hard time understanding all these concepts (realms, globals, this) and how they relate to each other, such as what a realm really is, especially since the top-level realm (the one with window === globalThis) cannot be accessed as a Realm object.

Maybe for consistency's sake it would make sense to have an accessor to expose it as a realm, though currently the only things exposed are globalThis and import, but we assume that could be extended in the future.

The explainer talks about Compartments (the link returns 404: https://github.com/tc39/proposal-realms/blob/main), but it would be nice to have a quick intro to that work and how all of this relates.

leobalter commented 4 years ago

@plinss @hadleybeeman and I looked at this in the TAG breakout today.

Thanks!

We are generally happy to see this happen, but would like a bit more clarification, like for instance what is exposed to the Realm object? (see below)

Realms differ from same-origin iframes by omitting Web APIs such as the DOM.

So do Realms only have the ECMAScript APIs available?

Yes! It only exposes a new copy of the built-ins from ECMAScript, but it allows extensions defined by each host.

// Proposal:

Realm ()

...
11. Perform ? SetDefaultGlobalBindings(O.[[Realm]]).
...
ECMAScript

SetDefaultGlobalBindings ( realmRec )

1. Let global be realmRec.[[GlobalObject]].
2. For each property of the Global Object specified in clause 18, do
...

18 The Global Object

...
- may have host defined properties in addition to the properties defined in this specification. This may include a property whose value is the global object itself.

Doesn't this mean that most libraries won't work unless you add their dependencies manually, like realm.globalThis.fetch = fetch? We could see people using this even to isolate WebAssembly code, though that requires adding the methods needed for that.

Absolutely, this is equivalent to what happens with the Node VM today, as low-level API prior art. As a developer you need to set up the environment to execute code.

Ideally the Realms would arrive in a clean state, allowing tailoring for what needs to be added. This contrasts with tailoring over unforgeables, e.g. window.top, window.location, etc.

Considering all the trade-offs, the clean state seems the best option, in our opinion. It allows tailoring for multiple purposes and covers more use cases.
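
As a rough illustration of the "clean state plus tailoring" idea, here is a minimal sketch that uses Node's vm module (the prior art mentioned above) as a stand-in, since the Realm API itself is not assumed to be available. The variable names are illustrative only.

```javascript
const vm = require('node:vm');

// A fresh context gets its own copy of the ECMAScript built-ins,
// but none of the host's additions such as timers.
const context = vm.createContext({});
const probe = 'typeof Array + "," + typeof setTimeout';
const beforeSetup = vm.runInContext(probe, context);

// The code that sets the environment up installs what the guest code needs.
context.setTimeout = setTimeout;
const afterSetup = vm.runInContext(probe, context);

console.log(beforeSetup); // "function,undefined"
console.log(afterSetup);  // "function,function"
```

With a Realm, the analogous step would be an assignment on realm.globalThis, as in the realm.globalThis.fetch = fetch example quoted above.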


We are also a bit afraid that regular developers will have a hard time understanding all these concepts (realms, globals, this) and how they relate to each other, such as what a realm really is, especially since the top-level realm (the one with window === globalThis) cannot be accessed as a Realm object.

Executed code doesn't need to know it's in a realm; this is designed to be a concern for those setting the realm up. Ideally, code executed in a realm would run seamlessly. There is prior art for this (iframes, Workers, node.vm).

Maybe for consistency's sake it would make sense to have an accessor to expose it as a realm, though currently the only things exposed are globalThis and import, but we assume that could be extended in the future.

The initial Realms proposal had more content and more ways to access things. We tried to build an MVP and hope we can explore expansions of the API in the future.

The explainer talks about Compartments (the link returns 404: https://github.com/tc39/proposal-realms/blob/main), but it would be nice to have a quick intro to that work and how all of this relates.

Thanks for catching that! The correct link is here: https://github.com/tc39/proposal-compartments.

Compartments is a more complex API that offers tailoring over aspects beyond the global APIs, with modifications to internal structure such as the module graph. The Realms API just offers immediate access to what is already specified in ECMAScript, which is already structured to distinguish references from different realms.

I'm looking forward to continuing this conversation. Thanks for the feedback!

littledan commented 4 years ago

One thing to understand here is that Realms are generally intended to be a sort of metaprogramming construct, which would be used by frameworks and libraries to build emulated JS environments for developers. I understand the feedback that this concept may be difficult for JS developers to understand; probably an introduction in the explainer to show how multiple globals in JS already work would help make this document more accessible. Either way, it is an underlying primitive in the platform.
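
For readers unfamiliar with how multiple globals already behave, a small sketch may help. It uses Node's vm module as a stand-in for a second global environment (an assumption made for runnability; same-origin iframes behave similarly in browsers).

```javascript
const vm = require('node:vm');

// An array created by code running in a different global environment.
const foreignArray = vm.runInNewContext('[1, 2, 3]');

// Each global has its own Array constructor, so identity-based checks fail...
const passesInstanceof = foreignArray instanceof Array;
// ...while realm-independent checks still succeed.
const passesIsArray = Array.isArray(foreignArray);

console.log(passesInstanceof, passesIsArray); // false true
```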

Maybe for consistency's sake it would make sense to have an accessor to expose it as a realm, though currently the only things exposed are globalThis and import, but we assume that could be extended in the future.

Trying to understand this comment--what would this accessor be? What does "it" refer to--are you suggesting making an accessor named window? Just to explain the design of this API, the idea is that everything in the global object hangs off of globalThis. I don't understand the function of adding synonyms; the API without an alias seems learnable to me.

Does the TAG have any thoughts about https://github.com/tc39/proposal-realms/issues/238 ?

leobalter commented 4 years ago

I copied my answers above to the current explainer.

I'd be happy to add more topics there or expand on anything; please just let me know.

leobalter commented 4 years ago

Pinging @torgo and @plinss, Hi!

I'd appreciate it a lot if we could have a follow-up before the next TC39 meeting on November 15th 2020. Is there any chance to fit this into the schedule?

I know we have TPAC coming up and I understand all the time constraints, so I understand if there are time limits and constraints here.

Thanks!

littledan commented 3 years ago

Note that this proposal is on the TC39 agenda to be discussed next week for Stage 3. Further TAG feedback would be welcome, especially if you're available to review before the meeting. I've also iterated on the HTML integration proposal at https://github.com/whatwg/html/pull/5339 to try to resolve the issues that @domenic mentioned.

kenchris commented 3 years ago

@leobalter

may have host defined properties in addition to the properties defined in this specification.

Could you give any examples of what host extensions you expect hosts to add? I assume a host here could be something like node.js or the browser.

dbaron commented 3 years ago

@domenic I'd be curious if there's a summary somewhere of your current set of concerns with the proposal. Seems like you had a clear list of concerns back in March, but it's not clear how many of them are current. Is it still the last one there that's the major issue?

atanassov commented 3 years ago

@leobalter or @littledan what are the new communication channels that will be possible with this new capability? Having read the explainer and security questionnaire, I couldn't find a clear answer on whether there will be fewer or more compared to what is available today. For example, could some nested realm that happens to be cross-origin leak/get information from the top-level document that's not possible today? Also, if I create a realm in one scope with mutation observers etc. and pass it to another realm, could that become leaky? Again, I'm sure this is probably already answered somewhere, but it wasn't obvious or easy to understand. Any pointers appreciated.

domenic commented 3 years ago

@domenic I'd be curious if there's a summary somewhere of your current set of concerns with the proposal. Seems like you had a clear list of concerns back in March, but it's not clear how many of them are current. Is it still the last one there that's the major issue?

Thanks for the ping. Currently my concerns, in order of largest to smallest, are:

  1. Realms encourage buggy and insecure application architecture.

    Realms allow code to run in a "sandbox", but that sandbox is insecure. I mean this in the sense that it has no Spectre protections, or protections against the various arbitrary-write memory safety bugs that every browser continues to exhibit on a frequent basis. Many people (e.g. on the realms issue tracker) have the impression that realms can be used for cases like running non-audited third-party plugin code in the same process as user data, which is a bad idea. To the extent realms enable such folly, they should not be added to the platform.

    And even if you're just looking for integrity protections (of the sort given by weak maps/private fields/closures), not security ones, it's extremely hard to use realms to gain those protections in a non-buggy way. (Example: https://github.com/tc39/proposal-realms/issues/277. More prominent example: https://www.figma.com/blog/an-update-on-plugin-security/.) Experts can successfully achieve integrity in this way, by using complicated systems that go by names like "SES" and "near-membranes". But experts can already achieve these protections with realms polyfills and similar technologies. We should not incorporate something that is a footgun-by-default into the platform, because it allows those experts to ship a little less code; those experts should instead just continue doing what they're doing currently.

  2. The availability of realms nudges applications away from isolated architectures.

    The overarching trend of web architecture has been toward more isolation where possible:

    • Process-level isolation via site isolation, origin isolation, and cross-origin isolation (necessary for security)
    • Thread-level isolation via workers and worklets (very helpful for performance)
    • Isolation of even same-origin iframes from each other via disallowdocumentaccess
    • Ensuring that new features like portals or prerendering are always isolated.

    Indeed, if we had designed the web from scratch, we would have made synchronous cross-realm access impossible. See @annevk's comment in https://github.com/tc39/proposal-realms/issues/238#issuecomment-597005198 for more.

    To the extent that application developers use realms in preference to workers/worklets/isolated iframes/etc., they are moving the web in the wrong direction. And to the extent that browser vendors invest in realms technology instead of improving the former technologies to meet more use cases, they are doing the same.

    I am especially worried about this in the sense of realms being an attractive footgun, e.g. some authors believe (https://github.com/tc39/proposal-realms/issues/219#issuecomment-652549073) that realms will give parallelism, despite that not being the case.

  3. Realms segregate "JS" and the "web platform".

    Realms segregate the "primordials", which are globals from the JS spec such as Array, Map, Promise, and encodeURIComponent, from web platform globals such as Document, URL, TextEncoder, AbortController, and setTimeout. For the first time, they directly expose this difference to web developers, by allowing them to create new realms with only JS-spec primordials.

    This division is something we've tried hard to avoid exposing to web developers, as part of the idea that the web platform is unified, and not segregated by which standard body does the work.

  4. Realms encourage code injection

    Realms are fundamentally designed around encouraging code injection via realm.eval(). In an ideal web, eval and eval-like structures would not be present, so introducing a major new code evaluation vector as the primary entry point into an API is not great.

    Realms also allow loading of scripts from URLs, but this brings us to our last point...

  5. Realms bring significant technical and specification complexity. (Last concern because of priority of constituencies.)

    Realms change the very oldest of web platform/JavaScript engine integration points, namely the realm and global object relationship. Many web platform behaviors key off of realms. See e.g. some discussion in https://github.com/whatwg/html/pull/5339#issuecomment-723272584.

    The majority of these are fixable, e.g. https://github.com/whatwg/html/pull/6137 looks to be in the right direction of being less intrusive. (Yet, HTML is only one of many web specs that would need updating.)

    However a major point of contention remains around the integration with the web platform's module system (tc39/proposal-realms#261). In current implementations the module map is tied to "real" realms, which come with associated security principals, fetch clients, HTTP cache partitions, etc. These "synthetic" realms want to create their own module maps, which will require significant rearchitecting, in both spec and implementation.

    Finally, I don't know whether the very-high specification complexity of realms will also transfer over into a very-high implementation complexity. However, I suspect that the implementation changes will be more security-sensitive, given how often realms are used in security decisions in the Chromium codebase at least.

caridy commented 3 years ago

Hey @domenic, thanks for restating your concerns. I will try to do my best to address them:

  1. Realms encourage buggy and insecure application architecture.

We (the champions) have been very clear about this for a long time, Realms are not a security boundary. If you want a security boundary, you go async, where Realms can be complementary if you decide to slice and dice the evaluation of code inside that process. We have addressed this concern to make sure that when we use the term "3rd party code" we say "trusted code".

This proposal is simply trying to formalize something that is available across all platforms that implement the language, although in many of them it is extremely difficult. Let me list them here yet one more time:

In each of them, it is harder and harder to achieve the same, for no particular reason. This proposal attempts to normalize this across the board, and fix the drawbacks of using the same domain iframe. For these reasons, I will disregard this concern as subjective.

  2. The availability of realms nudges applications away from isolated architectures.

As we have debated in the past, going async is not an option for an architecture that attempts to provide any virtualization between trusted sides in a language that is primarily sync (a good example here is the Google AMP DOM virtualization project). As I mentioned above, we see Realms as a complement to the architectures that you listed above.

  3. Realms segregate "JS" and the "web platform".

This is a solid concern; it is not our place at TC39 to dictate what the web platform should or should not do. My personal opinion is that this ship sailed a long time ago with the surge of Node.js and the respective native platforms exposing V8 and JSCore to developers. The de facto distribution model for JS code via npm already highlights this issue extensively, and developers, and more importantly the tools available to developers, have helped to mitigate this to some extent.

  4. Realms encourage code injection

This is not accurate: you do not have to enable eval to use Realms, nor will it encourage you to do so. How does a program (running on the web) evaluate another piece of code? What are the available mechanisms to do so? You have script injection (considered legacy at this point), dynamic import, and eval. In a Realm, you will have a subset of that: dynamic import (via realmObj.import()) or eval (via realmObj.globalThis.eval()). Basically, this proposal will not require developers to change their application configuration to evaluate code; it will sit on top of the existing application settings.

  5. Realms bring significant technical and specification complexity. (Last concern because of priority of constituencies.)

I'm not an expert on this subject, but this has been extensively debated by other folks, and as far as I can tell, they believe this is not as complex as you think it is. I will let others counter this argument.

littledan commented 3 years ago

To clarify about hosts adding properties to Realms' global objects: this is currently not planned for HTML or Node.js. The specification recommends against adding properties, and actually we're considering prohibiting it, in https://github.com/tc39/proposal-realms/issues/284. Instead, Realms contain only the JavaScript built-ins, but you can add more properties from JavaScript code.

domenic commented 3 years ago

We (the champions) have been very clear about this for a long time, Realms are not a security boundary.

You can try to be clear about it, but it's not working. E.g. there is a separate proposal, titled "Secure ECMAScript", which uses realms as the basis of its "security". Or there are people trying to use realms for security boundaries, and getting burned, as seen in e.g. https://www.figma.com/blog/an-update-on-plugin-security/ . If a feature encourages writing insecure code, you can't just say "but we told you not to write insecure code" and use that as justification for adding it to the platform anyway.

That is why I think that people who want integrity via multiple globals should continue to use the power tools that are available in their environments, and should not get support from this footgun-laden API being baked into the platform.

In each of them, it is harder and harder to achieve the same, for no particular reason.

I strongly disagree with this. It is "hard" (e.g., environment-specific) for very good reasons, which I've listed above.

For these reasons, I will disregard this concern as subjective.

This concern is my strongest one, and certainly not subjective. Adding something which encourages buggy and insecure code to the language---not just the V8 API, or the Node.js vm module power-toolset, but the language itself---is a big deal, and has serious impacts on web architecture, which is this group's remit.

jcc10 commented 3 years ago

Adding something which encourages buggy and insecure code to the language---not just the V8 API, or the Node.js vm module power-toolset, but the language itself---is a big deal, and has serious impacts on web architecture, which is this group's remit.

I would like to make the counterpoint that at the least it would be unified "buggy and insecure code" that could be built upon, and it is an improvement over the current hacks to achieve the same goal, which people are failing at anyway.

Now I am not an expert on this proposal (and could someone tag me with a correction if I am wrong), but it seems to achieve similar results to what is described in this post. As can be seen in the post, it is a mess that most people (including myself) don't actually understand. Does it make an eval that is more secure? Yes. But if only 1% of programmers use it, it is useless.

It doesn't matter if Realms is only partially secure if people are using an iframe (or just a normal eval) anyway. Using this at least makes it clear what the code is doing and why it works the way it does. As I understand it, part of this group's remit is to improve readability. The solution may be as simple as renaming the stupid thing to make it sound less like a sandbox.

Realms allow code to run in a "sandbox", but that sandbox is insecure. I mean this in the sense that it has no Spectre protections, or protections against the various arbitrary-write memory safety bugs that every browser continues to exhibit on a frequent basis. Many people (e.g. on the realms issue tracker) have the impression that realms can be used for cases like running non-audited third-party plugin code in the same process as user data, which is a bad idea. To the extent realms enable such folly, they should not be added to the platform.

If I understand correctly, using realms in a web-worker would solve many of these issues (oh, and you can't have i-frames in web workers, but you could have a realm in a web worker). Additionally, there are varying levels of bad idea, there is "running plugins in a bank app" all the way down to "Modding a single player web game".

leobalter commented 3 years ago

FWIW, I'm currently attending the TC39 meetings from a distant timezone, and I plan to come back here tomorrow to answer any question not yet addressed and/or comment on the current topics.

littledan commented 3 years ago

@atanassov

@leobalter or @littledan what are the new communication channels that will be possible with this new capability? Having read the explainer and security questionnaire, I couldn't find a clear answer on whether there will be fewer or more compared to what is available today. For example, could some nested realm that happens to be cross-origin leak/get information from the top-level document that's not possible today? Also, if I create a realm in one scope with mutation observers etc. and pass it to another realm, could that become leaky? Again, I'm sure this is probably already answered somewhere, but it wasn't obvious or easy to understand. Any pointers appreciated.

Realms do not create new communication channels. A cross-origin Realm is not very useful--it acts generally like any other sort of cross-origin object (you can't call functions) and does not expose new information from documents. This isn't really documented anywhere since it's not a meaningful use case, and falls out of the rest of the semantics.

I'm having trouble understanding the MutationObserver issue. A Realm will keep its parent document alive, but I'm not sure what you mean by "in one scope with mutation observers". What is the leak you're concerned about?

littledan commented 3 years ago

You can try to be clear about it, but it's not working. E.g. there is a separate proposal, titled "Secure ECMAScript", which uses realms as the basis of its "security". Or there are people trying to use realms for security boundaries, and getting burned, as seen in e.g. https://www.figma.com/blog/an-update-on-plugin-security/ . If a feature encourages writing insecure code, you can't just say "but we told you not to write insecure code" and use that as justification for adding it to the platform anyway.

If we refrained from adding features to JavaScript because people made "security" claims about them, we wouldn't have Promises or private fields. People also make security claims about older features like lexical scope/closures. I think this whole security perception issue is really a matter of developer education. The champion group has expressed openness to renaming the proposal, if anyone has ideas for a way to more clearly explain the concept.

littledan commented 3 years ago

  3. Realms segregate "JS" and the "web platform"

I've started a discussion about which Web APIs to expose from Realms at https://github.com/tc39/proposal-realms/issues/284#issuecomment-728964261 . Ultimately, this can be another place where we decide which subset of interfaces are exposed, like Window vs Worker. The intention is to have some preliminary discussion here, and then once we have a concrete idea, move to proposing this in HTML.

domenic commented 3 years ago

Hi TAG,

I wanted to let you know I've posted a proposed modification to the realms proposal at https://github.com/tc39/proposal-realms/issues/289 which addresses some, but not all, of my concerns. As I said there,

I'm optimistic that this proposal removes the most dangerous feature of realms, which is that they advertise themselves as an encapsulation mechanism, but it is extremely easy to shoot oneself in the foot and break encapsulation. This encapsulated-by-default proposal would bring realms onto the same footing as other encapsulation proposals such as trusted types or private fields, and thus make it more congruent with web platform goals.

There still remains a danger with people over-using realms when they need security or performance isolation, beyond just encapsulation. This still weighs heavily on me, and its conflict with the direction the web is going (per https://github.com/tc39/proposal-realms/issues/238) makes me still prefer not providing a realms API at all, in order to avoid such abuse. But I recognize there are cases where synchronous access to another computation environment is valuable, and I think if we curtailed the footgun-by-default nature of realms by prohibiting direct cross-realm object access, I could make peace with the proposal.
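
To make the "easy to break encapsulation" point concrete, here is a minimal sketch of how a single shared object can expose the outer global. It uses Node's vm module as a stand-in for direct cross-realm object access (an assumption for runnability; the name helper is purely illustrative).

```javascript
const vm = require('node:vm');

// Inside the new context, the host's `process` global is not visible.
const direct = vm.runInNewContext('typeof process', {});

// But once an ordinary outer object is shared, guest code can walk
// helper.constructor (outer Object) -> .constructor (outer Function)
// and compile code that runs against the outer global.
const viaSharedObject = vm.runInNewContext(
  'helper.constructor.constructor("return typeof process")()',
  { helper: {} }
);

console.log(direct, viaSharedObject); // undefined object
```

Prohibiting direct cross-realm object access, as proposed above, removes this particular path because the guest never receives an outer object to walk.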

The Chromium project would be interested in TAG's take on how to weigh these three alternatives, of the current realms proposal, my proposed middle ground of isolated version with sync message passing, and my preferred version of no realms API.

Thanks for your time!

leobalter commented 3 years ago

I'd like to request TAG to hold on the discussion, please! We are still discussing @domenic's proposed modification to consider how to make it work for our use cases or possible incompatibilities, if they exist. I'm glad Domenic's document gives us some room to explore, but we need more time to discuss it internally.

You might make better use of your time if we eventually migrate to the given modification, presenting a path to compatibility. Otherwise, we should document the identified dealbreakers for the proposal.

leobalter commented 3 years ago

Hello!

It's been a while, but we finally built on top of @domenic's proposal and adjusted our Realms proposal so that it avoids direct access to objects while enabling a Callable Boundary between realms.

This Callable Boundary API enables implementation of membranes frameworks as a good layering for virtualization. We have a proof of concept here.
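
A minimal sketch of the callable boundary idea, under the assumption that only primitives and wrapped callables may cross. The crossBoundary helper below is hypothetical and is not the proposal's spec text; it only illustrates the shape of the rule.

```javascript
// Hypothetical helper: decide what may cross the boundary.
function crossBoundary(value) {
  if (typeof value === 'function') {
    // Callables are replaced by a wrapper that filters arguments and results.
    return (...args) => crossBoundary(value(...args.map(crossBoundary)));
  }
  if (value === null || typeof value !== 'object') {
    return value; // primitives pass through unchanged
  }
  throw new TypeError('objects cannot cross the boundary');
}

const add = crossBoundary((a, b) => a + b);
const sum = add(2, 3);

let rejected = false;
try {
  crossBoundary({ secret: 1 });
} catch (err) {
  rejected = err instanceof TypeError;
}

console.log(sum, rejected); // 5 true
```

A membrane framework would layer on top of such a rule by exchanging functions that stand in for objects, rather than the objects themselves.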

We've been iterating on this updated API within weekly public SES meetings, where it received very positive feedback. We also presented it to the TC39 Plenary, where we received generally positive feedback (meeting notes pending publication).

The slides are here and I intend to record a presentation for future usage. The new rendered spec text is here. The link for the explainer remains the same, but it's now updated to match the new API.

I'd like to present it again to TC39 on May 25th, requesting advancement to Stage 3. For that, I don't plan to make many essential changes to the API.

There are some open points I'm interested to discuss and I'd appreciate feedback:

Challenge: Global names

The Realm global should only include a new set of intrinsics listed in ECMAScript as the global names.

After analysis, I'm strongly arguing against shortening this list. Doing so would mean shortening the list of global names described in ECMAScript. FWIW, there is already strong pushback from TC39 against making this special subsetting in ECMAScript.

The other way around would be introducing more names to the global list, to be defined by the host. I don't think that's useful, as any flexibility would end up requiring most customization for virtualization setups to verify and replace or remove host-defined APIs. Environment virtualization is among the main use cases for Realms, and IMO there isn't a specific line between the ECMAScript intrinsics and the names of a Window proxy object. The ES intrinsics set remains the common point between all hosts, including those that are not part of the Web Platform.

Challenge: Separate Module Graph

Each new Realm should have a separate module graph. For the same reason the API will not provide any access to non-primitive values, we believe it's very important to not leak global values through realms using modules.

A shared module graph would potentially introduce a place to leak identity discontinuity we are trying to avoid, as the Realms essentially have a separate set of globals.

Naming (Bikeshed, mostly)

plinss commented 3 years ago

@kenchris @LeaVerou and I looked at this during a breakout this week and we have some questions (I have some follow-on questions I'll post in a separate comment):

  1. Our understanding of importValue is that the second argument names an exported value from the imported module, and it's that exported entity that is returned. Is this correct?

    a) If so, how would one get the default export of a module?
    b) Presumably if one wants multiple exports from a module, they'd need to call importValue once for each export. Hopefully the module is only actually imported once?
    c) Have you considered a mechanism where multiple exports can be retrieved in one call? e.g. const [foo, bar] = realm.importValues('module.js', ['foo', 'bar']);

  2. The mechanism for how values are passed back and forth seems unclear from our reading. There are mentions of transferables, but presumably non-transferable objects are copied in each direction?

    a) If objects are copied, this leads to more questions: are they individual copies for every invocation? e.g.: const doSomething = realm.importValue('module.js', 'doSomething'); doSomething(someObject, someObject); Does the doSomething function get two different copies of someObject?
    b) If the doSomething() function above returns some object that's state local to the Realm, does each return create a new copy of that object?

We didn't have the time to delve into compartments properly, but what's the functional difference between a Realm and a Compartment under the new proposal?

plinss commented 3 years ago

Follow on personal questions about state management and crossing Realm boundaries.

I understand that if importValue returns a callable then the callable gets wrapped in a function that marshals the arguments/return value across the Realm boundary, but does that get recursive? e.g. if when invoked, the callable returns another callable, does that get wrapped too? Does the secondary callable have access to other state in the Realm? How does this work with closures?

Here's an example, consider the following (off the cuff, untested) code being imported into a Realm:

export function doSomething() {
  const x = { y: 0 };
  return function() { x.y = x.y + 1; return x.y; }
}

I then do:

const doSomething = await realm.importValue('module.js', 'doSomething');
const increment = doSomething();
console.log(increment(), increment());

What gets logged? '1 1', '1 2', or ?

plinss commented 3 years ago

Also, I've had a use-case for Realms for several years now and I'm not sure the current approach still allows me to do what I want to do...

A while back I designed an extension to Home Documents for HTTP APIs that adds JS function bindings to HTTP API endpoints. The idea is that a browser can load the document describing the HTTP API, auto-generate a class that implements the API as JS methods, and return an instance of that class ( a 'remote object'). The consumer of the API just sees a regular JS object, whose implementation comes from the server (each method returns a promise that does a Fetch under the hood as defined by the Home Document). The Home Document can also carry internal state exposed as properties on the remote object (in addition to private state), and that state can be manipulated by the HTTP API by returning a JSON-Patch. I also built a polyfill that implements this (this repo is a bit out of date, I've been using a more recent version in production code, but it gets the idea across): https://github.com/plinss/remote-web-objects (the live-demo is no longer online)

One feature I wanted to add is to allow the Home Document to also carry raw JS code that implements synchronous methods run entirely client side. That code should be restricted to interact with the remote object and nothing else. I was planning on being able to create a Realm that's scoped to the remote object. Doing this would require the ability to share that object's internal state between the code running in the realm and the Realm's parent. It's not clear if this kind of thing can still be built with the current proposal.

leobalter commented 3 years ago

Thank you so much for the review, @kenchris, @LeaVerou, and @plinss!

1.a) If so, how would one get the default export of a module?

await importValue('./file.js', 'default');

1.b) Presumably if one wants multiple exports from a module, they'd need to call importValue once for each export. Hopefully the module is only actually imported once?

This question has come up in other channels. The first instinct is to drive everything directly from the Incubator Realm into the Child Realm, but module loading can also be controlled by a module in between.

The example below uses ./inside-code.js as this control module to load many bindings from the test-runner module.

// ./inside-code.js
import { start, getTapReport } from 'test-runner';
import './test-file.js';
export default function(cb) {
  start()
    .then(getTapReport)
    .then(report => cb(report.toString()));
}
// ./main.js
const r = new Realm();
const log = console.log.bind(console);

const runTests = await r.importValue('./inside-code.js', 'default');

runTests(log);

It's good to note that, anyway, consecutive await importValue calls with the same specifier would reuse the values cached in the module graph.

1.c) Have you considered a mechanism where multiple exports can be retrieved in one call? e.g. const [foo, bar] = realm.importValues('module.js', ['foo', 'bar']);

Yes! Without a full usage picture, that was our initial intuition. However, it demands more implementation detail than I was looking for in importValue, which is meant to be low-level, and it's also possible in user land:

// ./main.js
const r = new Realm();

async function importValues(realm, specifier, bindingList) {
  return Promise.all([].map.call(bindingList, bindingName => realm.importValue(specifier, bindingName)));
}

const [ padLeft, padRight ] = await importValues(r, './str-tools.js', ['padLeft', 'padRight']);

I'll answer the next questions in follow up comments here.

leobalter commented 3 years ago
  1. The mechanism for how values are passed back and forth seems unclear from our reading. There are mentions of transferrables, but presumably non-transferrable objects are copied in each direction?

Only primitive values are fully transferable. Any attempt to transfer non-callable objects will cause an abrupt completion (a thrown exception). Callable objects are internally wrapped into a new exotic callable, called a "Wrapped Function Exotic Object".

That means, if the importValue or the result of evaluate returns a callable object (functions, proxied functions, arrow functions, etc), this callable object is wrapped into this new exotic. The same happens if a wrapped function returns any callable object.

This means that if I have incubator Realm A and child Realm B, and inside A I run B.evaluate('x => x * 2'), this evaluation will create a new Wrapped Function Exotic Object in Realm A.

When Realm A calls this new exotic object, it synchronously calls the evaluated arrow function from B, captures the return value, and returns it in Realm A.

// Realm A
const B = new Realm();

// Realm B creates an arrow function and returns it
// fn is a Wrapped Exotic object that has an internal [[Wrapped]] containing the arrow function
const fn = B.evaluate('x => x * 2');

// This call will internally call fn.[[Wrapped]](3). The result is a primitive, return it.
fn(3); // 6

There is no function unwrapping in user code, and this is a hard requirement to avoid leaking identities.

This means each time I evaluate something that returns the same callable, I always receive a new Wrapped Function Exotic Object.

// Realm A
const B = new Realm();

B.evaluate('globalThis.fn = x => x * 2');

B.evaluate('fn === fn'); // true

const wrapped = B.evaluate('fn');
const wrappedAgain = B.evaluate('fn');

console.log(wrapped === wrappedAgain); // false

wrapped(3); // 6
wrappedAgain(3); // 6, they are both callables from Realm A connected to the same function in Realm B

This also means we wrap functions the other way around, with no identification:

// Realm A
const B = new Realm();

B.evaluate('globalThis.fn = x => x * 2');

const wrapped = B.evaluate('fn');

const compare = B.evaluate('callable => callable === fn');
compare(wrapped); // false

const verify = B.evaluate('callable => callable(7)');
verify(wrapped); // 14

In the example above, compare sends wrapped to Realm B, but wrapped is once again wrapped as an internal of a new Wrapped Function Exotic Object inside Realm B. It is not unwrapped back into fn.

The verify function sends wrapped to be executed in B and captures its value. Realm B receives it under the name callable and calls it with the argument 7, returning the result back to Realm A's verify.
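
The rules described in this comment can be modelled in plain JavaScript. The following is only a rough user-land sketch, not the proposal's implementation: wrapValue is a hypothetical helper invented for illustration, both "realms" are the same global here, and the conversion of thrown errors into TypeErrors is left out.

```javascript
// User-land model of the callable boundary (NOT the real Realm API).
// `wrapValue` is a hypothetical helper that mimics how a value crosses realms.
function wrapValue(value) {
  const isObject =
    (typeof value === 'object' && value !== null) || typeof value === 'function';
  if (!isObject) {
    return value; // primitives cross unchanged
  }
  if (typeof value !== 'function') {
    throw new TypeError('non-callable objects cannot cross the boundary');
  }
  // Each crossing creates a fresh wrapper; arguments and the
  // return value are wrapped again on their way through.
  return (...args) => wrapValue(value(...args.map(arg => wrapValue(arg))));
}

const fn = x => x * 2;
const wrapped = wrapValue(fn);
const wrappedAgain = wrapValue(fn);

console.log(wrapped(3));               // 6
console.log(wrapped === wrappedAgain); // false: no identity is preserved

// Sending `wrapped` back wraps it once more, so it never compares equal to fn.
const compare = wrapValue(callable => callable === fn);
console.log(compare(wrapped));         // false
```

In this model, a function that returns a plain object, such as wrapValue(() => ({}))(), throws a TypeError, which mirrors the hard stop for non-callable objects.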

leobalter commented 3 years ago

Also, I've had a use-case for Realms for several years now and I'm not sure the current approach still allows me to do what I want to do...

A while back I designed an extension to Home Documents for HTTP APIs that adds JS function bindings to HTTP API endpoints. The idea is that a browser can load the document describing the HTTP API, auto-generate a class that implements the API as JS methods, and return an instance of that class ( a 'remote object'). The consumer of the API just sees a regular JS object, whose implementation comes from the server (each method returns a promise that does a Fetch under the hood as defined by the Home Document). The Home Document can also carry internal state exposed as properties on the remote object (in addition to private state), and that state can be manipulated by the HTTP API by returning a JSON-Patch. I also built a polyfill that implements this (this repo is a bit out of date, I've been using a more recent version in production code, but it gets the idea across): https://github.com/plinss/remote-web-objects (the live-demo is no longer online)

One feature I wanted to add is to allow the Home Document to also carry raw JS code that implements synchronous methods run entirely client side. That code should be restricted to interact with the remote object and nothing else. I was planning on being able to create a Realm that's scoped to the remote object. Doing this would require the ability to share that object's internal state between the code running in the realm and the Realm's parent. It's not clear if this kind of thing can still be built with the current proposal.

You're not alone. This functionality reflects what we've been trying to push forward for so long until we had to find an alternative with this current callable boundary API that still resolves most use cases, but - as you point out - not all of them.

It is hard for me to tell you that you could try using a membrane framework that gives a better sense of object identities and injection, because I feel this has a very steep learning curve and a high cost for initial implementation. Membrane systems work for us on top of this callable boundary Realms API, as they do for many other orgs and projects already using membranes.

Unfortunately, we faced pushback - as you can even find in this thread - about giving object access across realms, although object access already exists today in the web platform through iframes. I believe @gwhitworth and @caridy might want to say more about this.

caridy commented 3 years ago

1.b) Presumably if one wants multiple exports from a module, they'd need to call importValue once for each export. Hopefully the module is only actually imported once?

Yes, this is analogous to import, which means you can call that method multiple times for the same specifier, and you get the same module. And yes, you can use Promise.all, etc. to try to construct the object in the incubator realm that contains access to various exported values.

  1. The mechanism for how values are passed back and forth seems unclear from our reading. There are mentions of transferrables, but presumably non-transferrable objects are copied in each direction?

As @leobalter mentioned, it is a hard stop, throwing.

a) If objects are copied, this leads to more questions: are they individual copies for every invocation? e.g.: const doSomething = realm.importValue('module.js', 'doSomething'); doSomething(someObject, someObject); Does the doSomething function get two different copies of someObject?

Yes, you get two different exotic objects, both bound to someObject on the other side.

b) If the doSomething() function above returns some object that's state local to the Realm, does each return create a new copy of that object?

You can't return an object; that will throw. But if you return another callable, then yes, every time you return that ref, the other side gets a new exotic object. Basically there are no identity-preserving semantics here; that can be built in user land with a fancy membrane. Another good example is when you pass a function to the other side and the other side calls you back with the same reference it received: you get a new wrapped exotic object. Double wrapping can occur at any given time, though implementers will be able to optimize this to avoid going over multiple jumps to evaluate the target function.

Here's an example, consider the following (off the cuff, untested) code being imported into a Realm: What gets logged? '1 1', '1 2', or ?

It will log 1 2. The exotic object that wraps doSomething is invoked once, and the wrapped function it returns is invoked twice, which means the underlying closure runs twice. The closure and everything else work the same as if you were accessing doSomething across realms in an iframe scenario.
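
As a rough illustration of why the counter keeps its state, here is a plain JavaScript sketch. It assumes a hypothetical wrap helper standing in for the wrapped exotic object: it only forwards calls and re-wraps returned callables, and it runs in a single global rather than across real realms.

```javascript
// Hypothetical stand-in for a wrapped function: it only forwards the call
// and wraps any callable that comes back.
const wrap = target => (...args) => {
  const result = target(...args);
  return typeof result === 'function' ? wrap(result) : result;
};

// Code that would live inside the child realm.
function doSomething() {
  const x = { y: 0 };
  return function () { x.y = x.y + 1; return x.y; };
}

const wrappedDoSomething = wrap(doSomething);
const increment = wrappedDoSomething();    // doSomething runs once
const logged = [increment(), increment()]; // the same closure runs twice

console.log(...logged); // 1 2
```

The object x never crosses the boundary; only the primitive x.y does, so the state stays with the closure that owns it.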

plinss commented 3 years ago

Thanks for all the (quick!) feedback. I'm going to have to think this through some more.

One more quick question: what if the callable returned from a realm is a constructor? e.g. what happens if I have:

// module.js
export function Foo(one) {
    this.one = one;
    this.log = function(thing) {
        console.log(this.one, thing);
    };
}

and I do:

const Foo = await realm.importValue('module.js', 'Foo');
const f = new Foo(1);
f.log(2);

leobalter commented 3 years ago

In this example, new Foo(1) will throw a TypeError because it would return an object. The API code just checks for an existing [[Call]] internal and that means the Realms can still receive a constructor, but won't be able to fully use it, like Array as in:

const rArray = realm.evaluate('Array');
try {
  rArray(); // would return a new Array from the other realm
} catch(e) {
  e.constructor === TypeError;
}

This happens because the low level code abstraction just observes the existence of a [[Call]] internal in the given object, and it does not create distinction for special cases.

There is a curiosity in this case of using new Foo(1). Here, Foo is a wrapped exotic in the main realm and it doesn't have a [[Construct]] internal. The new expression will throw a TypeError before even internally accessing the wrapped function, because the exotic is not a valid constructor.
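
The same behaviour can be observed today with any callable that lacks [[Construct]]. The sketch below uses an arrow function as a stand-in for the wrapped exotic (an assumption made for illustration, since arrow functions also have [[Call]] but no [[Construct]]); the calls counter is a hypothetical addition to show that the target is never reached.

```javascript
let calls = 0;

// Target function that would live in the other realm.
function Foo(one) {
  calls += 1;
  this.one = one;
}

// Stand-in for the wrapped exotic: callable, but not constructible.
const wrappedFoo = (...args) => Foo(...args);

let error;
try {
  new wrappedFoo(1);
} catch (e) {
  error = e;
}

console.log(error instanceof TypeError); // true
console.log(calls);                      // 0: Foo was never invoked
```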

leobalter commented 3 years ago

To add to that curiosity: if you call this specific Foo without new, it would fail at this.one = one, because this is module code (strict mode always on) and so this is undefined.

const Foo = await realm.importValue('module.js', 'Foo');
try {
  Foo();
} catch(e) {
  e.constructor === TypeError;
}

plinss commented 3 years ago

Thanks, I suspected it would fail, I was mostly curious how...

LeaVerou commented 3 years ago

@leobalter Thank you for the quick responses! How does one import named (or default!) exports with different names than the ones they were exported with? That's super common with modules. Also, how can someone do the equivalent of import * as foo from ... with this? Needing to repeat the import URL/specifier for the extremely common case of importing more than 1 export is ...not ideal. I think the importValue() API needs some work…

leobalter commented 3 years ago

Hi @LeaVerou!

How does one import named (or default!) exports with different names than the ones they were exported with?

In this case you just import the value contained from a binding name.

const someDifferentName = await r.importValue('./file.js', 'originalName');

// roughly equivalent to:
import { originalName as someDifferentName } from './file.js';

Also, how can someone do the equivalent of import * as foo from ... with this? Needing to repeat the import URL/specifier for the extremely common case of importing more than 1 export is ...not ideal. I think the importValue() API needs some work…

As pointed out in the last example in this comment, user land code would need to serialize multiple names.

I fully understand the concern and that's something we considered. Although, there are some differences for operations between realms.

First, for import * as foo from 'module', this would need to import a module namespace. In this current callable boundary API, we can't have access to objects cross realms. So we would need to set internal code to clone an object structure for this module namespace, and this would need to go through some overkill internal checkups.

For most of the Realms use cases, relating to controlled execution of a code inside a child realm, you can rely on communication channels which can translate to a few functions.

The other ideal, which I believe matches your concerns, would be direct access to the imported module namespace. That was actually part of the previous and "original" Realms proposal, where we had a Realm.prototype.import('specifierString'). Unfortunately, this is a constraint for implementers and we couldn't move the proposal ahead this way.

caridy commented 3 years ago

Just to make sure we get this point across: the goal is not to allow you to do all the things we do with dynamic import; the goal is to provide a low-level API that allows you to implement such behavior in user land. A good example of this was discussed at the last TC39 meeting: the enumerability of the exported names, how a program might want to iterate over them, and how you could do the same with this Realm API. Our answer to those questions is always the same:

You will be able to achieve that with a little bit of preparation work if you know what you're importing inside the realm, in principle you can do that by creating a wrapping module that does the work, but in the future, proposals like module blocks will provide a lot of flexibility to describe this behavior inline.
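
One possible shape for such a wrapping module is sketched below. Everything in it is hypothetical: strTools stands in for the namespace the wrapper would obtain with import * as, and listExports, getExport, and importAll are made-up names. The sketch runs in a single global and only shows that a namespace-like object can be rebuilt from callables that exchange primitives.

```javascript
// Stand-in for the namespace of the module being wrapped (hypothetical exports).
const strTools = {
  padLeft: (s, n) => String(s).padStart(n),
  padRight: (s, n) => String(s).padEnd(n),
};

// What the wrapping module inside the realm could export: callables only,
// taking and returning primitives or other callables.
const listExports = () => Object.keys(strTools).join(',');
const getExport = name => strTools[name];

// Incubator side: rebuild a namespace-like object from the two callables.
function importAll(listExportsFn, getExportFn) {
  const ns = {};
  for (const name of listExportsFn().split(',')) {
    ns[name] = getExportFn(name);
  }
  return ns;
}

const tools = importAll(listExports, getExport);
console.log(Object.keys(tools).join(',')); // padLeft,padRight
console.log(tools.padRight('7', 3) + '|'); // 7  |
```

With the real API, the two callables would be obtained through importValue, and getExport would only work for exports that are primitives or callables.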

hober commented 3 years ago

Hi all,

Our understanding is that TC39 is contemplating taking up an alternative proposal, in which the only method of cross-Realm communication is via callables. We'd like to review whichever alternative wins out. Please do come back to us when a decision has been reached & we'll take another look. We hope that whichever alternative wins out, the result is easy for developers to work with.

ljharb commented 3 years ago

@hober it would likely help us choose an alternative to get the TAG's opinions on one versus the other.