postMessage requires synchronous possibly-cross-process access

bzbarsky commented 6 years ago

Step 2 of https://html.spec.whatwg.org/multipage/web-messaging.html#dom-window-postmessage requires synchronous possibly-cross-process access, as far as I can see.

In practice, it's not clear to me what UAs do here. For example, if a navigation matures while the postmessage task is queued (probably testable by doing postMessage from unload at least in the same-process case) does the message event fire on the old Window, the new Window, or neither? It looks to me like the spec requires it to fire on the old Window; not sure whether that firing would be observable (and hence whether this is black-box distinguishable from "neither").

@annevk @domenic

domenic commented 6 years ago

I find the way this issue is phrased to be confusing, as a window's Realm is immutable, so we could just move step 2 to be before step 7.4 and get the exact same behavior without tripping up any "synchronous access to other processes' objects" alarms.

Is there a deeper issue here?

bzbarsky commented 6 years ago

Well, the sync access to Window is a problem too, fundamentally... What you have available sync is a WindowProxy; that's really it.

bzbarsky commented 6 years ago

Or more precisely... There may be a Window available sync (to make the WebIDL branding checks work, though that stuff is actually pretty underdefined/broken with WindowProxy), but it's not the same Window as the place where the event will get dispatched; it just forwards things along to the right place. And it either does not have a Realm or has a different one...

domenic commented 6 years ago

I guess both step 1 and step 2 could both move into the queued task. (I guess, modified so that we save a reference to the WindowProxy outside the queued task, and then get its inner Window inside the queued task.)

But it sounds like you are hinting at a more general problem where every one of the CrossOriginProperties need to have their algorithms rewritten to be more obvious that they're not going to cause problems across process boundaries. I'm not sure how we'd do that exactly.

bzbarsky commented 6 years ago

Yeah, I'm not quite sure what the right way to do it is, either. Not least because I've just started seriously thinking about it on the implementation side. It's worth talking to engineers from browsers which have cross-process things in the same unit of related browsing contexts (which Gecko does not yet) and seeing how they view this stuff.

annevk commented 6 years ago

To really solve this there are many things that would need to be changed. We'd have to actually allocate an event loop per agent properly, allocate multiple WindowProxy objects, define some kind of "proxy WindowProxy" variant for the WindowProxy objects that are in a different agent from their corresponding global, etc.

annevk commented 6 years ago

(I mentioned that elsewhere quite a while ago and also mentioned that to the Google folks who didn't think cross-process <iframe> warranted any specification changes, but not much interest in addressing it ever materialized.)

gterzian commented 5 years ago

Well, the sync access to Window is a problem too, fundamentally... What you have available sync is a WindowProxy; that's really it.

If I understand it correctly, some "data", like the Realm of the target window, is immutable and hence could be "easily" shared across process.

And "queuing a task" on the event-loop corresponding to the similar-origin window agent of the target window, in theory doesn't require having sync access, since queuing a task is an async operation?

It looks to me like the spec requires it to fire on the old Window; not sure whether that firing would be observable (and hence whether this is black-box distinguishable from "neither")

If the document of the "old window" is not fully active due to a navigation that occurred after the message was posted but before the task was queued, the task could still execute later if the document where to become fully active again(which would make the task runnable).

So I would say the spec could indeed take into account that "queuing a task" across process could involve some delay, however I think the spec in general doesn't really depend on the exact timing of when a "enqueued task" is actually processed by the event-loop when the enqueuing happens from parallel steps(otherwise any parallel step enqueuing a task would have to first "stop the event-loop", enqueue the task, and then let it continue).

From that perspective, the task enqueued via cross-origin messaging is not that different from say, the task that fetch enqueues to an event-loop from parallel steps.

It's worth talking to engineers from browsers which have cross-process things in the same unit of related browsing contexts (which Gecko does not yet) and seeing how they view this stuff.

In Servo there is a DissimilarOriginWindow, which owns an ipc-channel to a central component(the constellation), which can re-route the message to the appropriate event-loop(via another ipc-channel).

bzbarsky commented 5 years ago

I really wish I could link to snapshots of the spec, because the spec here has definitely changed since I filed the bug, and now I have to try to recover what the problems were, and whether they're still present...

If I understand it correctly, some "data", like the Realm of the target window, is immutable

The Realm identity is immutable, but the Realm itself contains and points to mutable state, right? So you can't share the Realm itself across anything, just some sort of identifier.

since queuing a task is an async operation

That's an interesting question. Tasks queued to the same task queue are required, by current spec, to run in the order they are queued. That means that queueing must be at least order-preserving in various ways, if not outright sync. Running a queued task is an async operation, of course.

It's possible that there is not an actual problem here, by the way. It's just not obvious, and there's a great deal of leeway for implementors to do things differently from each other when the spec conceptual model doesn't match what's really going on....

annevk commented 5 years ago

From browsing https://html.spec.whatwg.org/commit-snapshots/ the snapshot you want (for May 14, there's none for May 15) is https://html.spec.whatwg.org/commit-snapshots/90a60b2a0dc740b8b0093b07ca0a41e70ba8d83a/ I think.

You might also enjoy https://github.com/whatwg/html/blob/master/README.md#blame to find the commit hash you're looking for if you don't know the date (although git log --grep=... is often quicker if you know a couple of things).

gterzian commented 5 years ago

The Realm identity is immutable, but the Realm itself contains and points to mutable state, right? So you can't share the Realm itself across anything, just some sort of identifier.

Agree. I'm actually not very familiar with the "Realm" concept yet, but the spec seems to avoid "manipulating it's state", and seems to use it in this algorithm as a kind of identifier(targetRealm used as an argument to StructuredDeserializeWithTransfer).

Tasks queued to the same task queue are required, by current spec, to run in the order they are queued.

I agree that from a given set of parallel or event-loop steps, the ordering is assumed to be preserved, for example the various tasks enqueued to convey the lifetime of a fetch response, should indeed arrive at an event-loop in the order they were queued by a specific parallel fetch.

Ordering should also be preserved if a task on the event-loop ends up queuing several tasks.

I also think that from the principle of serialisability-of-script-execution, an event-loop doesn't have a concept of "when a task was enqueued by parallel steps(or another event-loop)", only that for a given set of parallel steps, any tasks enqueued should retain their ordering.

I might be biased in the light of Servo's implementation, and I think this fits nicely with the concept of an unbounded "multiple-producer-single-consumer" channel, with non-blocking send operations, where ordering of messages is preserved per producer, but not across producers. With the consumer, an event-loop, not being concerned with synchronizing between producers, but still wanting to see messages per producer in the order in which the producer sent them.

I've tried to write things down with regards to task-sources and queues and how they might be defined more precisely, although I can't say it's very actionable: https://github.com/whatwg/html/issues/4615

For example if we were to define the DOM manipulation task-queue as local to a specific event-loop, one could indeed reason about an absolute ordering of all tasks, since they're all enqueued from sequential steps running on the event-loop.

But for example for the networking task-queue, which is used (I think) almost exlusively to enqueue tasks back on an event-loop from parallel fetch steps, I don't think one can reason about the relative ordering of tasks between different parallel fetches, and it also doesn't matter since it's really just about having consistent ordering of tasks per parallel fetch, and the event-loop doesn't care about the order of tasks across independent fetches.

If we needed one such "shared/ordered parallel queue" with consistent ordering of queuing across parallel algorithms, I think it would require a special definition, probably involving a blocking "enqueue" operation where the parallel step(s) would block until the task had been dequeued by the consumer.

(Compare the current parallel-queue, which also seems to imply an undounded mpsc type of queue, and incidentally is only used for the shared-worker-manager, of which there is only one per user-agent, and there doesn't seem to be a form of synchronization assumed between various event-loops enqueuing steps on the queue, although ordering is probably assumed to be preserved from the point of view of one event-loop enqueuing several algorithms from the same "thread", say through repeated calls to SharedWorker(scriptURL, options)).

gterzian commented 5 years ago

Finally, on a more concrete note, we could perhaps define a cross-origin window not as an actual window with a postMessage method, but rather as some sort of wrapper containing enough immutable data to identify the corresponding "window", and with an implicit MessagePort object used for the underlying messaging, as suggested by @KiChjang

Alternatively, the postMessage implementation on Window is ad-hoc and the spec in fact defines that the entire mechanism should really be managed through a MessagePort object as a field on Window. When we do get to the point where we can use MessagePorts, perhaps this issue would become obsolete. https://github.com/servo/servo/issues/22512#issuecomment-500100215

This approach would be I think similar to workers, which

Worker objects act as if they had an implicit MessagePort associated with them. This port is part of a channel that is set up when the worker is created, but it is not exposed. This object must never be garbage collected before the Worker object.

The postMessage(message, transfer) and postMessage(message, options) methods on Worker objects act as if, when invoked, they immediately invoked the respective postMessage(message, transfer) and postMessage(message, options) on the port, with the same arguments, and returned the same return value.

(For dedicated workers, see https://html.spec.whatwg.org/multipage/workers.html#dedicated-workers-and-the-worker-interface:messageport)

Perhaps an implicit MessagePort on cross-origin window(wrappers) would be a more consistent way of clarifying the spec @bzbarsky ?

bzbarsky commented 5 years ago

What would be the actual behavior of window.postMessage.call(crossOriginWindow) in that setup and why?

gterzian commented 5 years ago

What would be the actual behavior of window.postMessage.call(crossOriginWindow) in that setup and why?

We could do something along the lines of:

As part of creating-a-new-browsing-context, we create two ports, one associated with the browsing-context, and the other with the window, and entangle them.
When a document is unloaded, we close the port of the window(making whether messages sent after that are to be received or not an explicit "no").
When a window is created as part of initialise-the-document-object, we disentangle the browsing contexts port from it's "old" window, and create a new port for the new window, entangling it with the one of the browsing context.
When a document is loaded, we enable the port of the window.
We further specify that the postMessage on a window(proxy?) should be treated as calling the same method on the port of the window's browsing context.
We also specify that an implicit onmessage handler on a window's port, hidden from script, should re-target all events to the window itself.

I think at this point a browsing context should always have a port, and if it has a window associated with it whose document is fully-active, that port should be entangled with the window's port(by then enabled). In theory you should then have a messaging mechanism that works for both same- and cross-origin use cases.

Then window.postMessage.call(crossOriginWindow) would mean "call postMessage on the port of the browsing context corresponding to crossOriginWindow, which will then enqueue a message on the " port message queue" corresponding to the port of the actual window running on a different origin".

You could still lose messages due to a navigation, and that would be treated with explicitly as part of document unloading.

gterzian commented 4 years ago

Ok so I think there might be a few actionable items here.

First of all, we can move Step 8.2 "Let origin be the serialization of incumbentSettings's origin." to a step prior to queuing the task(happening at Step 8).

I think that's important, since the "origin" of the incumbent setting object from where postMessage is called could change between queuing the task, and running the task(the step 8 sub-steps).

Currently, the spec effectively says to take the origin of the incumbent, when the task runs. That seems like an opportunity for some sort of cross-site attack, since you could queue the task, and then navigate to another page, and then the queued task would masquerade as coming from the origin that would be the result of the navigation.

I'm pretty sure UA must serialize the origin prior to step 8, and then "enclose" it within the queued task to use it at step 8.2. So we can move step 8.2 to before step 8 to make that explicit(this is similar to https://github.com/whatwg/html/issues/1371).

On the other question of the "realm"(it's a bit hard to remember what is exactly referred to above, the step numbering changed).

I think one can queue a task to a specific realm, even across process, since that only requires having a reference to something allowing you to queue a task, not the actual realm.

In the algorithm, the realm is also only used at step 8.4(inside the queued task), to do StructuredDeserializeWithTransfer(serializeWithTransferResult, targetRealm).

So in theory step 1 doesn't require an actual handle to the realm, rather some sort of identifier, in a similar vein to targetWindow, that is sufficiently precise to know on which event-loop to queue the task, and to later use from within the task. When the task runs, you presumable have access to the target realm, since the task would be running in that context.

In terms of the target window navigating away while the task has been queued, I think from the point of unload the task has no chance of running anymore(unless one would argue that the new document would have replaced the old one in the same BC, and therefore should receive the message). If that is too late, we could explicitly cancel all tasks on the "posted message queue" as part of aborting the document, like is currently done with tasks related to fetch(see step 2 of https://html.spec.whatwg.org/multipage/#abort-a-document).

So I don't think the spec requires cross-process synchronous access, since the variables use could in practice be replaced with identifiers as opposed to actual references to objects, and later when the variable is used to actually perform an action on an object, it seems to be from a context from which that object would indeed be readily available.

(The use of "Let source be the WindowProxy object corresponding to incumbentSettings's global object (a Window object)." at step 8.3, so within the task, could also be seen as using an identifier passed along in the task, to then locally instantiate some sort of proxy, perhaps a cross-process one, to that window proxy. That is indeed a bit hairy and I think discussed elsewhere as part of the "proxy to a windowproxy" discussion, see for example https://github.com/whatwg/html/issues/3727#issuecomment-393442625)

domenic commented 4 years ago

That seems like an opportunity for some sort of cross-site attack, since you could queue the task, and then navigate to another page, and then the queued task would masquerade as coming from the origin that would be the result of the navigation.

I don't believe it's possible for environment settings object's origins to change. Certainly not as a result of navigation.

bzbarsky commented 4 years ago

I was about to say the same: the origin of a settings object can't change.

I think one can queue a task to a specific realm, even across process

Sure, if you have an identifier for it. But when you are calling postMessage, what realm is it you are targeting, exactly, and how do you figure that out, if you want to talk about targeting realms? The answer is that it's not really well-defined in the no-sync-access case, as far as I can tell.

If the intent is to "deserialize into the Realm that was in the browsing context at the point in time when the postMessage call happened", then you have a sync-access problem no matter what, because the "which identifier is in there" update isn't sync either, right? (Shared-memory tricks aside, and if the specification wants to require those, we need to be very explicit about it.)

I should note that if we presuppose that targetWindow is actually a Window, as the spec says, we don't need the targetRealm thing at all; we can get it from targetWindow at any point. But determining the targetWindow is itself an issue. (And yes, you can do the identifier thing, but fundamentally that doesn't quite seem to solve the problem this is trying to address by grabbing the reference at message send time, not message reception time... or rather it has failure modes in the opposite direction, where we could end up targeting a "stale" window because the new identifier hasn't propagated to us yet, as opposed to targeting a "new instead of old" window if we just determine the window at the point when we plan to fire the event.)

I think from the point of unload the task has no chance of running anymore

Why not? Nothing I see in the spec precludes it from doing so.

The fundamental question we need to ask ourselves is why the target selection is the way it is and whether the goals of that can be accomplished sanely in a multiprocess world. Because an alternative behavior would be to immediately jump to talking about browsing contexts and posting a task to a browsing context, then selecting whatever Window is current when the task runs. But it's not clear that the spec has a good concept of tasks attached to browsing contexts, not Window instances.

gterzian commented 4 years ago

I was about to say the same: the origin of a settings object can't change.

Thanks for pointing that out. Can the settings object on the other hand go away, if the window does? It could still make sense to do the origin serialization before queuing the task, and then use the serialized origin inside it, instead of requiring access to the settings object of the sender from within the task.

I should note that if we presuppose that targetWindow is actually a Window, as the spec says, we don't need the targetRealm thing at all; we can get it from targetWindow at any point. But determining the targetWindow is itself an issue. (And yes, you can do the identifier thing, but fundamentally that doesn't quite seem to solve the problem this is trying to address by grabbing the reference at message send time, not message reception time... or rather it has failure modes in the opposite direction, where we could end up targeting a "stale" window because the new identifier hasn't propagated to us yet, as opposed to targeting a "new instead of old" window if we just determine the window at the point when we plan to fire the event.)

Ah ok I see what you mean. So actually what is happening in Servo, is that the identifier used to route the message, is a double key of the identifiers for a BC and a document.

And the "cross agent-cluster proxy to a window" is instantiated for a specific BC/document combo, it's not something that is synced with the currently active document of that BC.

For example in Servo we check if the document/BC combo can receive the message twice:

At the constellation, which routes messages between various agent-clusters, see https://github.com/servo/servo/blob/c78f0d74490d310f6342f7c7af74cde4c5cfbcc8/components/constellation/constellation.rs#L3941
At the point of receipt in the agent-cluster, see https://github.com/servo/servo/blob/c78f0d74490d310f6342f7c7af74cde4c5cfbcc8/components/script/script_thread.rs#L2490

The spec doesn't seem to require giving feedback to the caller of postMessage as to whether the task was successfully queued, or run.

Why not? Nothing I see in the spec precludes it from doing so.

You're right, unloading in itself doesn't clear the task-queues. It could be specified to clear out the "posted message queue", or that could be done earlier when the document is aborted? When a navigation response is actually handled, then the tasks for the old document cannot run anymore, right? I'm referring to https://html.spec.whatwg.org/multipage/#navigate-html

bzbarsky commented 4 years ago

then the caller is either in the same event-loop and has sync access to the window, or the caller is in another (window)event-loop, which by definition means another agent-cluster

At what point in time? When the postMessage call is happening? At that point in time, one of those two things is true, yes.

Dedicated workers can post messages back to their window, but we're specifically worrying about Window's postMessage here.

However, I assume the "proxy" to which you have access has been instantiated for a specific window

Nope. The "proxy" has lifetime spanning multiple windows. It does have something that can identify the current window it's proxying, obviously.

It could also contain an identifier for a specific document in that window

Apart from initial about:blank, there is only one document per window, and documents never really enter into this picture: all the work here happens on windows, without reference to documents.

Then whether that window is still running

Define "still running"?

or the document still in the session history

Note that documents can be in the session history (e.g. in the non-discarded but navigated-away-from) state and not subject to receiving messages (because navigated away from).

The spec doesn't seem to require giving feedback to the caller of postMessage as to whether the task was successfully queued, or run.

That's correct.

but the message is for a given document

That's presumably buggy with initial about:blank?

And the "cross agent-cluster proxy to a window" is instantiated for a specific BC/document combo

Conceptually, in the spec, there is only one WindowProxy per BC.

When a navigation response is actually handled, then the tasks for the old document cannot run anymore, right?

That is an interesting question, with the spec and implementations being all over the place on the details....

gterzian commented 4 years ago

Ok so actually what I wrote is incorrect, in Servo, the "cross agent-cluster proxy to a windowproxy" is linked to a specific browsing context.

Then, in the "constellation", which is like a unique router in the UA, that state of the "currently active document in the session history of that BC" is stored. So when the "proxy to a proxy" does a postMessage, it goes through the constellation, which then routes it to the actual document that is the "currently active one in the session history" for that BC.

So basically the only "identifier" stored in the "proxy to a windowproxy", is that of a BC, then the routing to a given window/document is done based on the state of the session history for that BC, at the point of routing, not at the point of sending.

Then when a message is received in the target agent-cluster, we check again in case the window would have been closed already(which can happen due to messages ahead in the queue versus the one routing the postMessage call).

Note that the constellation manages navigation and session history, so basically the outcome of the question "to which window is this message going to get routed" is made there, as is any navigation/changing of the session history.

So essentially the "constellation" in Servo acts a a parallel queue with regards to those workflows, which allows for synchronization across agent-clusters(not without some hairiness I admit).

As you might have noticed already, the parallel-queue is my favorite, and in my opinion most underappreciated (since only used for shared-worker managers and joint session-history), concept in the spec.

So I think if we spec instantiating "proxy to a window proxy" precisely, and link those to an actual window proxy in another agent-cluster via an identifier, and then we would need to serialize the routing of postMessage with changes to the session history of that window proxy via a single parallel-queue, to formalize to which "window" the message would end-up being routed to.

bzbarsky commented 4 years ago

then the routing to a given window/document is done based on the state of the session history for that BC, at the point of routing, not at the point of sending

OK, but does that match what is currently specced? If it doesn't determine the target Window at the time of sending, I don't think it does. Which is precisely what this issue is about.

gterzian commented 4 years ago

Yes I think you're right, so the spec says "target window", in practice in Servo at least, it's rather "target window proxy(via a proxy to it)".

But by the way does "window" not usually means "window proxy"?

gterzian commented 4 years ago

By the way the spec does have an interesting note:

When posting a message to a Window of a browsing context that has just been navigated to a new Document is likely to result in the message not receiving its intended recipient: the scripts in the target browsing context have to have had time to set up listeners for the messages. Thus, for instance, in situations where a message is to be sent to the Window of newly created child iframe, authors are advised to have the child Document post a message to their parent announcing their readiness to receive messages, and for the parent to wait for this message before beginning posting messages. (https://html.spec.whatwg.org/multipage/web-messaging.html#posting-messages, the "note")

annevk commented 4 years ago

(I think it really depends on who you talk to what "window" (lowercase) means. Window typically refers to the global object though and not the global this object (i.e., WindowProxy).)

whatwg / html

postMessage requires synchronous possibly-cross-process access #3691