tc39 / proposal-shadowrealm

ECMAScript Proposal, specs, and reference implementation for Realms
https://tc39.es/proposal-shadowrealm/
1.43k stars 67 forks source link

[explainer] removing the section about censoring stack frames #384

Closed caridy closed 1 year ago

caridy commented 1 year ago

This PR is the result of the consensus achieved during the Dic 1st, 2022 meeting. We are removing the normative change about stack censoring spec'd on https://github.com/tc39/proposal-shadowrealm/pull/372

mhofman commented 1 year ago

FYI @bathos, you had expressed needing this feature, but the committee, mostly the implementers, were unconvinced by the champions use cases. If you have a compelling use case, please share it if you can, but at this point it looks like any kind of stack censoring will not pass.

caridy commented 1 year ago

@michaelficarra recommendation was to try to add the censoring mechanism to the other proposal that is focused on stack. https://github.com/tc39/proposal-error-stacks, which I plan to do. @syg was not opposed to the censoring, he was opposed to the use-case for ShadowRealm because regardless of the stack censoring, ShadowRealm doesn't provide confidentially guarantees. At least that's my understanding of his position.

bathos commented 1 year ago

ShadowRealm doesn't provide confidentially guarantees

What it provides is the usual earlier-evaluated-code-may-have-authority “guarantee” that any other virtualizable built-in API surface does, which is what matters for my case.

If you have a compelling use case, please share it if you can

I am able to continue using detached iframes and/or generators that “atomize” MOP operations which could invoke user code into yielded representations of instructions to execute at the “surface” call site to achieve the behavior (i.e. always unwind the execution context stack before [UC]), so if I’m alone in needing it, I can understand the rationale; it is possible to achieve another way. In the future, the function implementation hiding proposal could provide a lighter solution, but at the cost of losing the ideal “real implementation layer errors get full stack traces” behavior that the latter of those two existing solutions provides today.

michaelficarra commented 1 year ago

In my January 2019 presentation to TC39, I outlined my solution for censoring in error stacks.

This solution would have a registry for functions which should be omitted from stack traces. My suggestion for you during the meeting was just that, if such a facility already existed, it would be easy (and probably less controversial) for you to integrate with that registry.

mhofman commented 1 year ago

I am able to continue using detached iframes

The goal of this proposal was to remove the need for detached iframe approaches, which as you surely know has plenty of limitations (e.g. impossible to remove some globals)

ShadowRealm doesn't provide confidentially guarantees

What it provides is the usual earlier-evaluated-code-may-have-authority “guarantee” that any other virtualizable built-in API surface does, which is what matters for my case.

I believe some delegates are also not convinced by generic virtualizability use cases, so having a more specific examples may be valuable.

In particular, the creator of the ShadowRealm is still in position to synchronously evaluate code inside the newly created ShadowRealm instance to setup the realm as needed, including further virtualize the ShadowRealm constructor to make any virtualization undeniable. By that reasoning it's able to use implementation specific logic to censor stack traces in each realm, and reconstruct full stack traces when crossing the boundary as appropriate for the application. That however requires to put a "callable membrane" around the callable boundary created by the bare ShadowRealm API, which is extremely heavy handed.

In my January 2019 presentation to TC39, I outlined my solution for censoring in error stacks.

This solution would have a registry for functions which should be omitted from stack traces. My suggestion for you during the meeting was just that, if such a facility already existed, it would be easy (and probably less controversial) for you to integrate with that registry.

the function implementation hiding proposal could provide a lighter solution, but at the cost of losing the ideal “real implementation layer errors get full stack traces” behavior that the latter of those two existing solutions provides today.

I am unconvinced that it would be easy to integrate with such a registry, because of the constraints of the callable boundary. How would you register a function which only exists in another realm? Also the use case is usually to have a function censored in another realm, but visible in the stacks of errors created in the function's own realm. Besides doing the manual stitching work described above requiring a membrane, I don't see how you could reconstruct full stack traces.

caridy commented 1 year ago

I am unconvinced that it would be easy to integrate with such a registry

@mhofman the fact that stack is going to be a getter of Error.prototype is sufficient for virtualization regardless of the rest of the censoring process defined by @michaelficarra's proposal.

mhofman commented 1 year ago

Not disputing that, but having a standardized %get Error.prototype.stack% only changes the "use implementation specific logic to censor stack traces" part of my comment. It would do nothing to simplify how complicated it is to create a ShadowRealm that observably behaves like the root realm of another host.

bathos commented 1 year ago

I believe some delegates are also not convinced by generic virtualizability use cases, so having a more specific examples may be valuable.

I figured, but unfortunately the thingie I’ve been working on for the last few years is itself concerned with virtualization & implementing “native” Web IDL APIs in ES (to the extent possible) in a very generic sense (i.e. it’s concerned with enabling those things). It doesn’t map to use cases more specific than “bathos is compelled by ancient spirits & fascination to build things that may or may not happen to accidentally end up being useful” (🥲) so I don’t think I’d be the one to change any minds on this.

bathos commented 1 year ago

That however requires to put a "callable membrane" around the callable boundary created by the bare ShadowRealm API, which is extremely heavy handed.

An update, FWIW: I experimented with approaches that still leveraged ShadowRealm which attempted manual redaction via preparation of the host realm before other code is introduced to it. Both Chrome and FF concatenate raw function names within their stack strings without escaping the characters that would otherwise delimit them, so a structured grammar seemingly can’t exist for either of those stack “languages” — not sure about WebKit though. In Chrome, the CallSite API makes frames distinguishable without parsing, but it seems like provenance can’t be reliably determined from it (or at least it won’t be after a recent bug fix lands — it was leaking objects across the call boundary). Even if reliable parsing were possible, virtualizing %Error% (and in V8, CallSite too) in each object-reachable non-ShadowRealm realm is pretty tricky, especially in V8. To leverage prepareStackTrace without it itself being observable that you’ve done so, you end up needing to virtualize %Error% itself — and it turned out the Error.stackTraceLimit property exhibits unusual behaviors which are only explicable in terms of custom object internal methods (i.e. %Error% is actually an exotic object in V8!*) so reproducting it is only possible with a PEO. Unfortunately, PEO values are distinguishable from ordinary objects via the CallSite API as well as several standardized platform APIs. You can virtualize %Proxy% and the methods that “reveal” PEOs to get around that, but, well, you get the idea — the chain of consequences just keeps going.

So right now, it’s looking like it’s true that the yield-from-generators-at-each-[UC] approach (i.e., unwinding the execution context stack itself when the stack would otherwise be observable; a kind of call boundary, but implemented in ES code) might really be the only viable solution. I may be overlooking some neat one-weird-trick solution hidden out there, though, so if anyone knows of a way ShadowRealm could remain usable for encapsulating the internals of “native” APIs without observable differences from native native APIs after this change, I’d love to hear about it.


* Tangential, but given “Built-in function objects must have the ordinary object behaviour specified in 10.1”, are V8’s exotic %Error% OIM behaviors non-conformant?

mhofman commented 1 year ago

Interesting. I know @erights went through a lot of trouble to tame errors and their stack, especially in v8. We do replace the global Error constructor, but I believe we do so to hide these unspecified powers / features. We are careful to keep using the original %Error.prototype% as that is undeniable (e.g. through try {null.error} catch (e) { console.log(getPrototypeOf(getPrototypeOf(e)) === Error.prototype) }).

I was thinking that adding a wrapper of your own around all boundary crossing would allow you to more easily identity where your own realm stops in stack traces. Hence the membrane like suggestion. That wrapper's name could be dynamically generated and closely held to limit confusions caused by guest code.

I am curious how generalizable your generator like approach is though. Wouldn't guest code running inside the shadow realm be required to similarly support suspension so that calls made that end up going across the callable boundary can unwind the stack?

I'll admit all this is suppositions I'm making. I believe mostly @caridy has experience with virtualizing a host based on ShadowRealm.

bathos commented 1 year ago

Wouldn't guest code running inside the shadow realm be required to similarly support suspension so that calls made that end up going across the callable boundary can unwind the stack?

Yep — but that’s not “guest code” here, at least not in the same sense that subsequently evaluated code outside of it would be. It’s the implementation layer backing exposed Web IDL constructs. Code with access to only those external constructs should not be able to determine what language they were implemented in.

mhofman commented 1 year ago

Ah right I suppose this works because WebIDL APIs always call user code on a clean stack? Aka you won't have a case of user code calling your API which goes back to the root realm for its implementation, and synchronously needs to call back into user code, which would reveal the intermediate frames.

bathos commented 1 year ago

The stack includes the “built-in” function that the consumer has called. Reentry is possible, so there can be more than one of these. However those surface functions drive a generator, allowing internals and [UC] to be interleaved without exposing the former to the latter, ))<>(( style. Browsers have been inconsistent about the circumstances where their own built-in functions get included in stacks, unfortunately, but they’re in agreement that the stacks don’t include traces of the implementation layer.

caridy commented 1 year ago

Ok, I can merge this PR, so we can move forward based on the review from you both. This is not really controversial since it is just removing text about the censoring.