Closed isonmad closed 10 months ago
It doesn't really work with the MessageChannel model as discussed in that email. Do you have a specific proposal that doesn't expose GC?
Reading over the thread and the spec again, that particular email's claim, that there's ambiguity about what a port's 'owner' is, is actually wrong. The spec does define which settings object owns the port, and sharing a reference doesn't change it. The pass-port-around case is already basically broken, it looks like.
What about an event that fires:
I'll dupe https://www.w3.org/Bugs/Public/show_bug.cgi?id=28813 against this.
@isonmad the problem with that is that rather than exposing GC, you end up creating a memory leak by keeping the objects alive for the duration of the global. I think service workers ended up doing this without much thought about it.
That was in the thread too, and it turned out there was already a leak of MessagePorts, at least in Chrome in 2013, and the spec language was written around that fact that devs should be careful regardless and not depend on GC.
But is there a technical reason the MessagePort object itself actually has to be kept alive in its entirety in implementations of onclose? It seems like there just needs to be an event in Context A, where the onclose handler was set, that fires whenever the other end of the port, Context B, goes away. Which context 'Context B' refers to can change within the lifetime of the MessagePort as it's passed around, but at the point where the MessagePort in B has no more references to it, and the leak might occur, can't it just be GC'd and replaced with a simpler, equivalent internal event listener, where A just subscribes to and listens for B's demise? There must be some centralized tracking of processes/contexts, right?
I'd like to see this revived now that WeakRef is available, and in light of similar APIs which do offer the 'close' event. It seems like this event would be the only way to know when the other end of the channel goes away, especially for unexpected failures such as crashes or OOMs.
(My interest specifically comes from using MessagePorts in Electron as a general-purpose IPC tool)
I use LockManager to shim web-worker onclose.
/// worker: hold a uniquely-named lock for as long as this worker is alive
const lockReqId = 'process-live-' + Date.now() + Math.random();
navigator.locks.request(lockReqId, () => new Promise(() => {})); // never resolves, so the lock is held until the worker dies
postMessage(lockReqId);

/// master: a request for the same lock is only granted once the worker's
/// hold on it is released, i.e. once the worker has terminated
worker.addEventListener('message', me => {
  if (typeof me.data === 'string' && me.data.startsWith('process-live-')) {
    navigator.locks.request(me.data, () => {
      worker.dispatchEvent(new CloseEvent('close'));
    });
  }
});
I use LockManager to shim web-worker onclose.
I'd like to revisit this; in Firefox we're implementing Web Locks, and it seems like, absent a MessagePort close event or a newly introduced disconnect event on SharedWorkerGlobalScope, the above comment's Web Locks shim will end up becoming a necessity.
I too re-read the original mailing list thread from the start that was linked to in the top comment in this issue and it seemed like the primary discussion points were:
With Web Locks (which is moving to WebApps WG) exposing same-storage-key agent termination and ServiceWorkers exposing same-storage-key client enumeration via matchAll, there already exist same-origin mechanisms that surface lifecycle information.
I think the main risk with the "close" event proposal here is that it could potentially expose cross-origin life-cycles in an adversarial manner if not constrained. Specifically, if WindowProxy.postMessage(..., "*", ...) is used to transfer a MessagePort into a different origin, would this allow the sender of the message to monitor when that agent terminates with no opt-in by the recipient agent? Presumably this could be addressed by requiring the (cross-origin) recipient to take some affirmative action indicating it wants to use the MessagePort by adding an event listener on it or invoking postMessage on it or something like that.
@asutherland I don't think that would expose recipient agent lifetime without explicit action on part of the recipient. The sender would not be able to distinguish between "the other end of the message channel was dropped due to GC" and "the recipient agent was terminated". If the recipient agent does not register any event listeners on the port it receives, and does not save a reference to it, then the port will be garbage collected and the close event would be emitted on the sender side, revealing no information about the lifecycle of the recipient.
@nornagon I agree that if we generate the close event on GC then we side-step the problem. I was unclear about which specific proposal I was discussing, which was the proposal comment in this thread at https://github.com/whatwg/html/issues/1766#issuecomment-247507345. I read that proposal as explicitly not letting it be known when GC happens but instead when the global is terminated, which means that the message port post message steps (which explicitly deserialize, and thus run the transfer-receiving steps, regardless of whether there are any listeners) would inherently need to tie the MessagePort to the global in a way that goes out of its way to ignore GC. @annevk pointed out issues related to this in his comment at https://github.com/whatwg/html/issues/1766#issuecomment-266814601.
@domenic @annevk Do either of you have a feeling about what the current state of things is in terms of concerns about exposing GC? I know in Firefox that the existence of Web Locks and the (ServiceWorkers) Clients API means that it is conceptually possible to detect when Firefox GC's a Worker, for example.
I think there are two kinds of collection we need to differentiate: an object within an agent, and an agent itself.
As has been expressed in the email threads, I would be worried about exposing the collection of objects inside an agent (MessagePort) to another agent. Even with the existence of WeakRef and FinalizationRegistry, the capability to observe an object's lifecycle requires direct access to these APIs and to the object to observe. As such, tying the observable lifecycle of a remote MessagePort to program actions or to the remote agent's overall lifecycle seems more appropriate.
From what I understand, automatically tying the observable remote MessagePort lifecycle to the receiving agent creates its own problems. As @asutherland mentions, the solution is to tie the observation capability to explicit action of the program when receiving a MessagePort.
More specifically, a sending agent should not be informed of a remote MessagePort disconnection unless the program running in the receiving agent starts the port (registers a message listener). At that point the sending agent would receive a close notification when either the receiving agent terminates, or when the program explicitly closes the received port.
That is different from what @nornagon suggests: a sending agent cannot be made aware of the disconnection of the MessagePort if the receiving program ignores the received port.
@domenic @annevk Do either of you have a feeling about what the current state of things is in terms of concerns about exposing GC?
I think since the introduction of WeakRefs the boundary has been broken a good deal, and anything which is equivalent to WeakRefs in power is allowable. It's still not something you want to do lightly, as minimizing the number of ways in which an application can be GC-dependent is important. But if it's useful (and I think this instance qualifies; we've been hearing this request for over a decade now) then it's probably reasonable to do.
It sounds like we have a similar precedent with Web Locks and ServiceWorker for whole-agent GC, so I would extend the same reasoning to that case.
For the "close" event itself, CloseEvent is already a thing and is defined in the web-sockets chapter of HTML. Would we just reuse that?
It has 3 attributes (wasClean, code, and reason), quoting from here. The defaults per CloseEventInit are false, 0, and "". wasClean seems like it could potentially be reused usefully here with appropriate semantics, but the others presumably would be left as their defaults?
I don't think we should use CloseEvent for this, as it seems rather WebSocket-specific. We can just use Event.
@domenic I'd also like to add my two cents here. For a client, I am building a micro-frontend application where users can dynamically add and remove additional apps hosted in iframes loaded from various origins. We're connecting to those third-party browsing contexts via channel messaging, which works perfectly fine.
However, we must do some cleanup work after an iframe is removed, when the site was unloaded, has crashed, etc. Unfortunately, due to the lack of an onclose event, we currently have to poll all connected client apps to determine whether they are still connected. This heartbeat mechanism is quite inefficient, though, and the detection of a closed client can be significantly delayed.
For our use case, it would also be okay to opt-in to exposing lifecycle events to third-party browsing contexts (as @asutherland mentioned) as the client apps are under our control. For us, it would also be fine if the event would be dispatched sometime after the message port has actually been closed.
All in all, it would be really cool to have this event. :)
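For reference, the heartbeat workaround described in the comment above can be sketched roughly as follows. This is an illustrative sketch only: watchPeer, the PING/PONG strings, and the interval defaults are all invented names, not anything from a real library.

```javascript
// Rough sketch of the heartbeat/polling workaround: the watcher pings the
// peer periodically over a MessagePort and declares it dead after a silence
// longer than timeoutMs. All names and defaults are invented for illustration.
function watchPeer(port, onDead, { intervalMs = 1000, timeoutMs = 3000 } = {}) {
  let lastPong = Date.now();
  port.addEventListener('message', (e) => {
    if (e.data === 'PONG') lastPong = Date.now();
  });
  port.start();
  const timer = setInterval(() => {
    if (Date.now() - lastPong > timeoutMs) {
      clearInterval(timer);
      onDead(); // peer presumed gone; detection is delayed by up to timeoutMs
    } else {
      port.postMessage('PING');
    }
  }, intervalMs);
  return () => clearInterval(timer); // cancel handle
}

// The peer side just echoes:
// port.onmessage = (e) => { if (e.data === 'PING') port.postMessage('PONG'); };
```

This makes the complaint concrete: detection latency is bounded below by the polling parameters, and every live connection costs periodic wakeups, which a close event would eliminate.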
When we discussed this at TPAC in 2022 others did share the concern around running JS on shutdown so it might still be better to only do this at deterministic times for now. I.e., when the specification dictates shutdown.
@annevk Can you clarify what you mean here? I understand the central decision here is about when to notify that a close has occurred for an opted-in MessagePort (by starting it after it's been deserialized in the given global) for situations other than the obvious invocation of close. Those times are:
Specifically, which of the two above is compatible with the TPAC 2022 discussion? If the answer is none, how do the concerns deal with what Web Locks has already enabled? Thanks!
I think it indeed tied back to @isonmad's original proposal of embracing the memory leak in order to not expose GC. I also realize now that the script wouldn't run in the agent being shut down, as presumably we'd only dispatch the close event on the other side, which helps. (Though what if both sides go down?)
I'm not sure what the model with Web Locks is and whether that can still change. It does seem like that would indeed end up exposing similar things.
(The discussion at TPAC was quite brief and I somewhat regret bringing it up now, but on the other hand I learned a few things due to your reply...)
...the other side, which helps. (Though what if both sides go down?)
I'd say in that particular case, we could probably delegate the handling to the service tailored exactly for such cases: NOOP, since, taking two contexts out of a total of (max) two, there is a noticeable lack of, well, anything.
Jokes aside, if it's the agent (subscriber, consumer, context, whatever) that is shutting down, handling of any sort as a concern falls to that agent.
I've written an explainer that covers the problem, the background, a proposal and a discussion of alternatives.
TL;DR the proposal is to send a close event as soon as the ports become disentangled (explicit close(), crash, destruction of the JS context, unreferenced port gets GCed). This makes no new information available vs the status quo, but it does make the information available in a more timely manner (less dependency on GC). The justification for the choice vs alternatives is in the doc. If there's a reason why that's problematic or if you think I've missed some important part of the discussion, please let me know. In particular, @asutherland, you raised the issue of exposing navigation. This is already exposed, but this would make it timely. Did you have a specific attack in mind or was it the usual avoidance of unnecessary cross-origin exposure?
The explainer proposal is great, thank you. I'm on board with implementing this in Firefox.
Re: exposing navigation, the below is just me restating why this proposal works in relation to things I've previously said; I'm not raising any new issues:
My concern was a more abstract one of creating non-trivial new side-channels, but that was only a concern if we were intentionally tying the MessagePort to the global in an intentional memory leak to avoid exposing GC in a way that would let navigation be detected in the future without an opt-in from the recipient global. In particular, with that capability, being able to "ship" the MessagePort outside the agent cluster would be a new capability beyond what having access to the WindowProxy would mean (or other similar browsing context hierarchy shenanigans).
This proposal does not create a non-trivial new side-channel because an ignored MessagePort postMessaged to a WindowProxy that doesn't care will be subject to GC. As the proposal explains in great detail (and consistent with @domenic's comment), this potential exposure of GC timing is fine because WeakRefs already expose GC. As the explainer also points out, there is a trivial new side-channel where, because the MessagePort will currently become entangled with the global[1], a close event could be delivered as a result of a very rapid navigation or something like that, but that's a very short timeline and we would expect the WindowProxy to still be available on that timeline, etc.
1: Per spec, at least, the deserialization in step 7.3 of the message port post message steps is unconditional. And I believe in Firefox we unconditionally do this, but I think I remember seeing a bug filed at one point that suggested that other engines may lazily perform the deserialization so that it only happens on demand.
Edit: Skip this comment, I was confused about when "source" was set, but @rakina's comment below cleared things up for me and you can probably skip my comment explaining my confusion as well.
A related change we could make when introducing the close event, which came up while talking with @smaug----, concerns 9.4.5 Ports and garbage collection, which normatively says:
When a MessagePort object o is entangled, user agents must either act as if o's entangled MessagePort object has a strong reference to o, or as if o's relevant global object has a strong reference to o.
Without having done the historical investigation, my impression of this text is that it was an effort to prevent GC/CC being observable. But given the changes here, it would seem to make sense to either remove this text or to alter it to make it clear that a strong reference should be conditioned on both 1) being entangled and 2) having "message" or "messageerror" listeners which could obtain a new strong reference to the MessagePort via the source property on the events. (Note that the explainer does explicitly cover GC but doesn't mention relaxing this limitation on GC that I'm aware of.)
Note that while the streams spec uses MessagePorts for transferred streams, its hookup explicitly adds the above event listeners and explicitly disentangles the ports without removing the event listeners, so the above proposed change shouldn't affect streams, just manually created MessageChannels/MessagePorts.
I would love to see this implemented. My case is that I have a shared web worker which may have more than one window connecting to it.
Because there is no close event in the case where one of the windows goes away, I have to use a combination of WeakRef (on the worker side) and polling, alongside explicit window close events in the cases where I can get those on the window side, so that I can try to guess when a window is no longer available.
To combat the worker going under, I have added a polling timeout on the window side (in the case the timeout fires, I try to poll the worker one last time before giving up). I have also added an explicit worker close event to the worker in the case an event happens to be processed during worker close, i.e. the closing flag is true (although I'm not sure in what situation that can happen outside of the last connection dropping). With these combined I should be able to know if the window needs to reconnect to the shared web worker.
All in all this is quite a lot of code just to see if the connection is still alive. Also, because of polling, there is quite a lot of extra processing involved (I have to poll quite often to ensure that I can detect a missing window in a reasonable timeframe).
In the case I'm implementing, I'm using the shared web worker as a communication platform between the windows. It is used as a secure side channel alongside window-to-window communication, as the window-to-window communication is not secure by design (at least in my case, as I'm routing most of the messages through third parties). Because the shared web worker knows all the connecting windows, it is imperative that it can know which windows are still available, to prevent unnecessary wait time (for the user) and/or unwanted behaviour (in the case where we think a window had gone under but it was just bogged down) when multiple windows are affected.
FYI, Chromium has started implementing the proposal from the explainer. Please let us know if there are still unresolved questions that haven't been answered in the explainer etc. We'll be sending out a spec PR soon too, but I have a question about @asutherland's comment above about GCs:
A related change we can make if introducing the close event that came up while talking with @smaug---- was that 9.4.5 Ports and garbage collection normatively says:
When a MessagePort object o is entangled, user agents must either act as if o's entangled MessagePort object has a strong reference to o, or as if o's relevant global object has a strong reference to o.
Without having done the historical investigation, my impression of this text is that it was an effort to prevent GC/CC being observable. But given the changes here, it would seem to make sense to either remove this text or to alter it to make it clear that a strong reference should be conditioned on both 1) being entangled and 2) having "message" or "messageerror" listeners which could obtain a new strong reference to the MessagePort via the source property on the events. (Note that the explainer does explicitly cover GC but doesn't mention relaxing this limitation on GC that I'm aware of.)
Can you elaborate more on why we would need this part to change? From the note in the spec, this seems to be there to allow the message event listener for the port to continue to run even when there are no references to the port itself. Is the intent behind changing that to allow the port to be GCed more often, or something else? How does that relate to the close event?
Also, even if we need to change it, I don't think point #2 is relevant, since the source property isn't actually set on message or messageerror events. From the spec, only connect events on shared workers have the property set for MessagePorts. What would be a reasonable alternative if we need to change the text?
(Thanks!)
Can you elaborate more on why we would need this part to change? From the note in the spec, this seems to be there to allow the message event listener for the port to continue to run even when there are no references to the port itself. Is the intent behind changing that to allow the port to be GCed more often, or something else? How does that relate to the close event? Also, even if we need to change it, I don't think point #2 is relevant, since the source property isn't actually set on message or messageerror events. From the spec, only connect events on shared workers have the property set for MessagePorts. What would be a reasonable alternative if we need to change the text?
Aha, I definitely had gotten confused and thought in the general MessagePort case that we would set the source. Thank you for correcting my misunderstanding! I believe this was a combination of 1) the similar but unrelated ServiceWorker.postMessage behavior, and 2) some extra flexibility in Gecko plumbing code that supports passing the MessagePort as a source, but we definitely do not pass it in this case.
And then I think this clears up my confusion about 9.4.5 Ports and Garbage Collection that you link. The bit about how, if we have a global gA and a global gB with entangled ports pA and pB in gA and gB respectively, it's saying either...
...makes more sense to me now as I don't need to worry about the potential to create new strong references, so GCing a port and thereby automatically closing it is much more straightforward.
I'll edit my comment above to indicate it should be skipped to save at least future me headaches :)
Thanks again!
Unfortunately Chrome's plan to implement this has been blocked by our security review. Their response is available here; quoted below:
About the feature itself, this reveals the GC activity of the process holding the other end of the pipe. It does so, even if this document isn't willing to process incoming messages from unknown origins. Here is an example: https://message-port-close-learning.glitch.me/
GC events might correlate with various events. For instance, it will leak navigations. Those could be used to learn which users are connected to some websites.
We could ship this feature if there was some opt-in from the cross-origin website before sending the "close" event. You can potentially consider one of:
- Requiring the MessagePort to be "used" before sending a close event. That is to say, using one of: MessagePort.start(), MessagePort.postMessage(), or registering a MessagePort.onmessage / MessagePort.onmessageerror handler.
- A new dedicated API to register a close event "beacon".
Unfortunately the argument that this information is already leaked was not persuasive.
There's a demo here (source). On Chrome and Safari this demo shows that the info is already leaked. On FF it seems that GC of MessagePorts was avoided to prevent this information leaking.
I would rather not make sending the close event conditional on some kind of opt-in because it leaves a bunch of cases un-covered but it might be the best we can do.
A challenge with an opt-in is that a frame may navigate or be unloaded before it has had a chance to call the API on the MessagePort instance that enables sending of the close event to the other end of the connection. In Hypothesis's case the MessagePort pair is created in a parent frame, and the two ports are sent to different child frames which subsequently communicate.
Assuming the act of posting a message opts in, the frame that listens for the close event could work around this by waiting for either an initial message or a timeout to expire, before it considers the MessagePort properly connected (and thus expected to send a close event).
How about opt-in but same-origin is automatically opted in? Because ports can be infinitely re-shipped this might be most sanely modeled as a new flag "has been shipped cross-origin". The flag gets set when entangling based on the environment, so once a port goes across origins it's now opt-in even if it's subsequently re-shipped to its original origin.
@fergald @robertknight @asutherland can one of you please ensure this is tracked in a new issue? Also, if it's not resolved in some timely manner (say a couple of weeks) we should revert the specification change lest it accidentally gets implemented as-is.
I've filed #10201 Let's move the conversation over there.
A work-around, if you're able to re-jigger your use-case to use Streams instead of MessagePort:
let { readable, writable } = new TransformStream(); // fundamental stream pair

// Hack to detect when the fundamental stream pair is closed: workaround for
// https://github.com/whatwg/html/issues/1766
const canary_transform = new TransformStream();
const canary = readable.pipeTo(canary_transform.writable); // settles when the pipe closes or errors
readable = canary_transform.readable;

TARGET.postMessage({ replyStream: writable }, { transfer: [writable] });
canary.finally(CLEANUP_HOOK); // runs once the pipe has ended, however that happened
USE_STREAM(readable);
@James-E-A I guess the intention here is that canary's finally will be called when TARGET is closed, but I had a quick go at that and I couldn't get it to work. Do you have a working demo somewhere?
@fergald Sure, here's a quick demo: https://webpages.uah.edu/~je0029/tmp/demo_stream_canary/
The stream-pair workaround as described above is found on line 33 of helper.html; it prints “Stream cleanup handler called.” to the debug console in a finally handler on the canary transform that's injected before transferring out the stream pair.
I found this to trigger (at least on Firefox) regardless of whether the stream was ended from the writable end (my test transfers that to its parent frame, and I confirmed it works even when that parent frame is on a different origin, such as http://localhost) or from the readable end (my test transfers that to a service worker).
Actually, that's my mistake. Further testing shows that this does not actually work to detect a transfer target being ungracefully killed; it only works if the target is able to either close() or abort()/error() the stream. (However, using an onunload hook to ensure that the stream's killed seems to work reasonably well, if TARGET is co-operative.)
But that requires that the unload event is fired, which may not always be the case...
There's currently no reliable way to detect when a MessagePort becomes disentangled, whether due to purposefully calling .close(), or the tab being unexpectedly killed.
An onclose event was proposed and discussed several years ago, but it looks like the discussion petered out after that.
Any hope of reviving it?