Add event "browsingContext.crashed" to indicate a crash of the web content process

whimboo commented 5 months ago

It would be good to inform the client about crashes of web content processes. In CDP there is the following event:

https://chromedevtools.github.io/devtools-protocol/tot/Target/#event-targetCrashed

The event should contain the usual BrowsingContextInfo field as payload.

Note: I don't think that we can / have to do the same for browser crashes given that we most likely won't be able to send out this event.

gsnedders commented 1 month ago

css-meeting-bot commented 1 month ago

The Browser Testing and Tools Working Group just discussed Add event "browsingContext.crashed" to indicate a crash of the web content process.

The full IRC log of that discussion

<jgraham> Topic: Add event "browsingContext.crashed" to indicate a crash of the web content process
<jgraham> github: https://github.com/w3c/webdriver-bidi/issues/723
<simonstewart> scribe+
<simonstewart> jgraham: this could have been uncontroversial.
<simonstewart> jgraham: in some browsers, it is possible for a content process to crash and that's a recoverable error from the browser's perspective. From some testing scenarios it's helpful to know that this has happened, so you can stop targeting commands at the handle.
<simonstewart> jgraham: the proposal is that for those browsers where this applies, we fire an event
<simonstewart> jgraham: it might even be a flag on "destroyed"
<jgraham> q+
<jgraham> q-
<simonstewart> gsnedders: we should go further than this, and let browsers report crashes of other processes (eg. the network process) because we still want to know about it
<simonstewart> gsnedders: with most bidi implementations, if the UI process dies we're done for, but if other processes die, we want to be able to pass that back to the local end, especially in a testing situation.
<sadym> q+
<jgraham> q+
<simonstewart> gsnedders: this is because it turns out that tests can do something else if it knows what's no longer working
<simonstewart> gsnedders: I care about this from a browser testing point of view, but perhaps web testers don't care so much
<simonstewart> ack next
<simonstewart> sadym: when you say "other processes" what does that mean in terms of webdriver bidi?
<simonstewart> gsnedders: gives confused look
<simonstewart> sadym: if the http process fails, from the bidi perspective, what do you expect to be reported? The browsing context? Something else?
<simonstewart> gsnedders: I'd not be surprised if that differed between browsers and OSs. If we took the example of a browser that took its output from OpenGL where everything is composited in the GPU, in that case it would be universal for the whole thing, whereas, if you were using Metal, you could have one GPU process per renderer.
<simonstewart> gsnedders: we don't want to constrain particular browser architectures
<simonstewart> gsnedders: a crash that is inherently tied to a particular browsing context <something, something, something>
<simonstewart> gsnedders: may want to do something at the session level, or perhaps at the window level.
<jgraham> ack next
<sadym> q+
<simonstewart> jgraham: yes
<simonstewart> jgraham: I feel that there are two use cases here. Knowing a process has crashed is helpful to implementors, as that requires us to take some action. Knowing that some other recoverable crash has occurred is another one
<simonstewart> q+
<simonstewart> jgraham: I feel that for general crashes, we can't constrain the space of processes. eg. use an informative process name.
<simonstewart> jgraham: otoh, the concept of a render process feels pretty universal
<simonstewart> jgraham: so we could consider a something in the browser module, and perhaps something else in another module
<simonstewart> jgraham: <lists optional values we might return>
<simonstewart> jgraham: two events, basically. One for the browser, and one for the other ones
<simonstewart> ack next
<simonstewart> sadym: so it's either non-recoverable crash of a particular browsing context, or it's a debug message? Essentially it's not that different from other debug messages.
<simonstewart> jgraham: could go to the log module for that reason
<simonstewart> gsnedders: my gut reaction is that some things are more recoverable than others.
<simonstewart> gsnedders: if a Service Worker process crashes, it should never really be observable because the spec says that can always be restarted
<simonstewart> jgraham: if we put it in the log module, it would be natural to have a log level associated with it. eg. ERROR for non-recoverable errors.
<simonstewart> gsnedders: WARNING for things that aren't meant to be observable.
<simonstewart> jgraham: we don't have great log filtering in the spec
<gsnedders> s/gsnedders: WARNING/jgraham: WARNING/
<simonstewart> sadym: subscribing to log events can be quite expensive
<simonstewart> gsnedders: hence the filtering
<sadym> q+
<gsnedders> q?
<jgraham> q+
<jgraham> ack simonstewart
<simonstewart> simonstewart: if we could associate the browsing context with the log messages, we could capture the original intent of the crashing window, but we can also fulfill gsnedders goals too
<jgraham> ack next
<simonstewart> sadym: what we want to do is to subscribe to a subset of log events, so I know that I no longer need to wait for events from a page
<simonstewart> sadym: eg. a WPT test which crashes the page, but it never comes and we'd have to wait forever
<simonstewart> sadym: a crash event makes sense, but it'd be hard to specify correctly.
<simonstewart> jgraham: I think I agree, a dedicated event for context crashes makes sense. It's something that clients should subscribe to early and handle. I think at some point a way of better filtering log messages on the remote end would be helpful (eg. these are the events, types, and levels we want to be concerned about)
<simonstewart> jgraham: I've been putting that off, but it's likely a command on the log module which can be executed before the subscribe method
<simonstewart> Action: gsnedders File an issue for the log event

whimboo commented 1 month ago

In case of Firefox I want to add that at the moment we take down the browser if a tab crashes. I would be really happy to change that when a client can successfully handle those crashes.

At the same time we should make sure that test harnesses can check for existing minidump files and as well correlate those with the received event from the BiDi client. Maybe we would have to add a field to the event that points to the specific minidump files created for this crash.

gsnedders commented 1 month ago

Maybe we would have to add a field to the event that points to the specific minidump files created for this crash.

I feel like having vendor extensions on this event would be exceptionally useful — we don't seem to have at all specified whether or not events (or other messages) can contain extension fields (nor what "matching" means at all, #102), because one can easily imagine wanting to send PID when possible, maybe some other metadata about the process, maybe a crashlog, maybe a link to a crash file, etc. etc. etc.

But what we want to return seems very likely to vary between implementations.

w3c / webdriver-bidi

Add event "browsingContext.crashed" to indicate a crash of the web content process #723