Specify a network request interception feature

A common use case for automation is to take a request that would be made e.g. to a third party API and provide a mock response for that reques, or to take a request that requires HTTP authentication and automatically provide that authentication. This is a feature that's exposed in Puppeteer and is in the works for Selenium.

The existing CDP API is the Fetch domain. Something similar has been added to WebKit. Gecko has similar features with a different API exposed to extensions.

Given that the CDP-ish model is already widely implemented in browsers and used in clients, it makes sense to spec something close to that. Practically this means:

A means to enable request interception for a specific set of contexts (or realms; serviceworkers for example seem like an important use case here) and with a URL filter that's allows only intercepting a subset of responses
An event that's produced when a request matches the filter with details about the request, including an id that can be used to continue the request.
A command to fulfill the request, suppliying the response headers and body.
A (possibly seperate) command to cause a network error
Some way to get the response that would have been returned by the network request so that it can be modified before use.

Auth seems to be handled somewhat seperately, with the following:

An event that's fired when a request results in a HTTP auth challenge
A command to provide the authentication response

The Browser Testing and Tools Working Group just discussed Network interception.

The full IRC log of that discussion

<AutomatedTester> Topic: Network interception
<simonstewart> Is there a link to the issue?
<AutomatedTester> jgraham: One of the common use cases for bidi from Selenium and we've seen it in puppeteer is the ability to see when a specific request happens and do a specific case
<AutomatedTester> ... such as basic auth
<AutomatedTester> ... or the othre request is network requests where we can say if X happens that we mock those network requests
<foolip> q+
<cb> q+
<AutomatedTester> ... I think this part of the "value add" feature sets
<AutomatedTester> ... I've been looking around at this and CDP has the Fetch domain
<AutomatedTester> ... [describes how CDP does this case]
<AutomatedTester> ... looking at WebKit it looks very similar
<AutomatedTester> ... Gecko has a very model in webextensions
<foolip> https://github.com/w3c/webdriver-bidi/issues/66
<simonstewart> q+
<AutomatedTester> Github topic: https://github.com/w3c/webdriver-bidi/issues/66
<AutomatedTester> ... so we have precendent in this area
<jgraham> a/very model/slightly different model/
<jgraham> s/very model/slightly different model/
<AutomatedTester> q?
<AutomatedTester> ack foolip
<jgraham> RRSAgent: make minutes v2
<RRSAgent> I have made the request to generate https://www.w3.org/2020/10/28-webdriver-minutes.html jgraham
<drousso> q+
<drousso> lol oops i used the wrong command :P
<AutomatedTester> foolip: is the model to both replace response without doing the response or can we edit and pass it back
<AutomatedTester> ... that's how I think CDP does this
<AutomatedTester> jgraham: I think there is a getResponse type command that you get it and then can edit it
<AutomatedTester> ... both seem like meaningful use cases
<AutomatedTester> q?
<AutomatedTester> ack cb
<AutomatedTester> cb: is the scope of this feature to capture all web resources?
<AutomatedTester> jgraham: I believe that the scope should be to all resource fetches
<brwalder> q+
<AutomatedTester> ... there is a question that I don't know the answer to which is how would service workers handle this
<AutomatedTester> q?
<AutomatedTester> ack simonstewart
<AutomatedTester> simonstewart: there are a lot of implementations for how intercepting of auth since there are cases like webauthn
<AutomatedTester> ... puppeteer doesn't seem to have a handle on this yet
<AutomatedTester> ... are we going to make it flexible enough
<AutomatedTester> jgraham: yes, those are important cases
<AutomatedTester> ... I am not sure they should be part of this API. I think they have already added webdriver features and they are best placed to handle this
<AutomatedTester> ... I've just looked and they have a VirtualAuthenticator for testing
<AutomatedTester> simonstewart: Ive seen that in the selenium code base
<AutomatedTester> q?
<AutomatedTester> ack drousso
<AutomatedTester> drousso: the reason we have added the support for things like this purely for webinspector, not for automation but can be used there
<AutomatedTester> foolip: can you replace request headers as you see fit?
<AutomatedTester> drousso: yes, and there are some pass throughs that means if people dont have send all the items
<AutomatedTester> q?
<AutomatedTester> ack brwalder
<AutomatedTester> brwalder: re: service workers. In CDP this is set at the network layer and would not work for service workers as that's different
<foolip> q+
<simonstewart> q?
<AutomatedTester> jgraham: that makes sense. I did have a question around filtering of this. How would we limit this to browsing context/realm?
<shengfa> There is also recent code merge in ChromeDriver for webauth(https://chromium-review.googlesource.com/c/chromium/src/+/2488183) with reference links in commit message.
<AutomatedTester> q?
<AutomatedTester> ack foolip
<AutomatedTester> foolip: a question for brwalder : Does this bypass the cache?
<AutomatedTester> brwalder: Enabling the fetch domain disables the cache
<AutomatedTester> simonstewart: that's not true you need to do it manually
<AutomatedTester> drousso: webkit turns it off
<AutomatedTester> foolip: so we need to be smart here when designing it
<AutomatedTester> simonstewart: I would like to discuss this more indepth
<AutomatedTester> ... I imagine people have in their minds what puppeteer do in the fetch domain
<AutomatedTester> ... are we wanting to have a similar model for the future
<AutomatedTester> ... I don't know how webkit do this
<AutomatedTester> drousso: I dont know CDP but will explain webkit
<AutomatedTester> .... webkit does something similar that it takes a pattern for an interception and at what stage
<AutomatedTester> ... this is only request or response currently
<brwalder> q+
<AutomatedTester> ... the engine will notice and emit an event to the front end
<AutomatedTester> ... requestContinue - carry on as normal
<AutomatedTester> ... [describes 3 other events]
<AutomatedTester> ... All of this par tof the network domain and not a separate domain
<AutomatedTester> jgraham: I don't think we should have a spralling mass of modukes
<AutomatedTester> ack brwalder
<AutomatedTester> brwalder: that sounds more or less like CDP Fetch
<AutomatedTester> ... and has similar workflow
<AutomatedTester> ... and if you want the cache disabled you have to do it manually
<AutomatedTester> ... and the same for service workers
<AutomatedTester> q?
<AutomatedTester> simonstewart: So my understanding is that there will be a command that takes a regex and then emits events and you can manipulate it as you see fit
<AutomatedTester> ... and when you get an authentication request you will intercept and insert credentials or we can deny it
<AutomatedTester> jgraham: I agree
<AutomatedTester> simonstewart: and one thing more that we implicitly disable the cache
<AutomatedTester> drousso: both chrome and webkit disable the cache if requested
<AutomatedTester> ... but when we are going to override anything we make sure that automatically disable the cache instead
<AutomatedTester> jgraham: I think we need to be clear in the spec and it seems webkit are doing the right thing, especially for automation
<AutomatedTester> ... I cant see a use case for the CDP way other thank creating flaky tests
<jgraham> s/doing the right thing/have the correct cache-disabling behaviour/
<jgraham> RRSAgent: make logs public
<RRSAgent> I have made the request, jgraham
<jgraham> RRSAgent: make minutes v2
<RRSAgent> I have made the request to generate https://www.w3.org/2020/10/28-webdriver-minutes.html jgraham
<AutomatedTester> Resolution: Create a command that takes a regex and then emits events and users can manipulate requests and responses based on the events sent. When an authentication request is intercepted we can insert credentials or deny them

The Browser Testing and Tools Working Group just discussed Network request interception.

The full IRC log of that discussion

<AutomatedTester> topic: Network request interception
<AutomatedTester> github: https://github.com/w3c/webdriver-bidi/issues/66
<AutomatedTester> jgraham (IRC): This is a more speculative item
<AutomatedTester> ... and I want to start work on the design so we can get it a little done over the next year
<AutomatedTester> ... we have talked about network logging to get some info
<AutomatedTester> ... but sometimes people want to intercept, modify and carry that on
<AutomatedTester> ... people want to intercept a request and send back a mocked response
<AutomatedTester> ... so I think this is a feature we should have
<AutomatedTester> q+
<AutomatedTester> ... so one thing here is that network interception can put a lot of pressure on the client to handle each
<sadym> q+
<AutomatedTester> ... and since some people test this locally and that's fine but we do have selenium users have this over the internet
<AutomatedTester> ... the proposal is there is an event for browsing context/realms and with some form of filter
<AutomatedTester> ... we have network events
<AutomatedTester> ... do we want separate events or use the logging and then return
<AutomatedTester> q?
<jgraham> AutomatedTester: My experience is that people using this feature in Selenium are confused about how this should/will work. For e2e testing this can be complex compared to lower level testing. Need to make it easier to use.
<jgraham> ... To the point about this being deployed over the internet, we can be talking about hundreds of requests that need to be intercepted. So prefiltering of events is a requirement.
<DevinRousso> q+
<jgraham> ... I'm concerned that this could cause a problematic number of events for Selenium in the cloud providers
<AutomatedTester> ack next
<AutomatedTester> ack next
<AutomatedTester> sadym (IRC): first question is add some kind of binding to the realm
<AutomatedTester> ... and try have this done in the browser and then return
<sadym> https://chromedevtools.github.io/devtools-protocol/tot/Network/#type-RequestPattern
<AutomatedTester> ... and then AutomatedTester_ (IRC) do you think that URL, resource type and interception stage is enough/
<jgraham> AutomatedTester: Yes, that seems fine. To the point about doing this on the remote end, if we can figure it out I'd prefer that kind of solution. Pushing everything onto the client is a problem if things are running in parallel; can end up with a huge number of simultaneous requests. Server side interception would fix that. But the filters are a good start.
<AutomatedTester> ack next
<AutomatedTester> Devin Rousso: within web inspector we have filters and it can even be at filtering on request/response
<AutomatedTester> ... webinspector doesnt have the resouirce type filtering
<AutomatedTester> sadym (IRC): it's useful for changing cats to dogs on a page
<AutomatedTester> jgraham (IRC): I can see resource filtering for all images then go for there
<AutomatedTester> Devin Rousso: in web inspector we only allow filtering on EITHER request or response...
<AutomatedTester> ... our reason is if you control one or the other then it's easier
<AutomatedTester> ... we still send out the requests in case the server requires it
<jgraham> q?
<AutomatedTester> ... it saves us trouble
<AutomatedTester> jgraham (IRC): web extensions has a similar aPI here
<AutomatedTester> ... does your api have overlap with web extensions
<AutomatedTester> Devin Rousso: there is some but I don't know if they have full overlap
<AutomatedTester> ... I don't have much knowledge of manifest v3
<AutomatedTester> Luka: in chrome they have deprecated the interception API
<AutomatedTester> jgraham (IRC): the stream API gives us what we need and we can manipulate it
<AutomatedTester> luka: [mentions how this works with streams]
<AutomatedTester> jgraham (IRC): [describes how things work in CDP]
<AutomatedTester> ... the main question is Do we start with how CDP works or how sadym (IRC) suggested that it happens in a sandbox specific way
<AutomatedTester> ... and I have a concern about how we this might work as we haven't got the primatives to make it work
<AutomatedTester> ... and we want to make sure we support clients that already here
<AutomatedTester> ... and it would be good to make sure we check how other groups are using it and then see if we can support them
<sadym> q+
<BrandonWalderman> q+
<AutomatedTester> AutomatedTester_ (IRC): I would vote for sadym (IRC) 's idea of putting this more onto the server
<BrandonWalderman> q-
<AutomatedTester> ack next
<AutomatedTester> sadym (IRC): i want to point out that it adds complexity to the server
<AutomatedTester> ... so it's harder to implement and spec out
<jgraham> `getResponseBody` is used in Puppeteer/Playwright but `takeResponseBodyAsStream` is not.
<jgraham> q+
<BrandonWalderman> q+
<AutomatedTester> automatedtester: My preference would be to add more complexity to this working group so that people can use this feature in a meaningful way rather than simplifying things for the sake of the spec and putting a lot of pressure on client authors and users and then no one uses the feature
<AutomatedTester> ack next
<AutomatedTester> jgraham (IRC): I am not convinced for clients that injecting Js would be easier for them
<AutomatedTester> ... there are perf reasons for injecting it all
<AutomatedTester> ... so for the first iteration that we all people to replace the responses but not modify on the fly
<AutomatedTester> q?
<AutomatedTester> ack next
<jgraham> s/all/allow/
<AutomatedTester> Brandon Walderman: I do see the value of pushing things on to the server side for perf and ease of client creation
<AutomatedTester> ... but I am hearing that there is concern of doing it in a sandbox
<AutomatedTester> ... I wonder if there is a middle ground here where we can do some form of [explains data structures]
<AutomatedTester> ... that way we can block things easily
<AutomatedTester> ... and if you want more advanced cases then we can do more CDP
<jgraham> q+
<AutomatedTester> ... and we can add the sandbox solution in the future
<AutomatedTester> ack next
<sadym> q+
<AutomatedTester> jgraham (IRC): I think it will add more work
<sadym> q-
<sadym> +1 to what James said
<BrandonWalderman> +1
<AutomatedTester> ... we can add a more declaritive later
<AutomatedTester> ... and that way we don't need to have a more complex JS execution
<AutomatedTester> ... it would be good to get a set of data from cloud providers here
<AutomatedTester> ... but build the low level APi and we can build a higher level API later
<AutomatedTester> q?

I'm interested in having the this feature being spec'd. As Webdriver doesn't currently have an equivelant to Network.getResponseBody. Which is important to me because I would like to use the feature in Firefox but it looks like their CDP implementation is maintenance mode in favor Webdriver bidi related APIs. Until then only a fork of firefox shipped with Playwright can get response body. Which I'm uncomfortable depending on.

I've started a (thus far incomplete) draft of this feature at https://github.com/w3c/webdriver-bidi/compare/network_interception

The design at this point is as follows:

you register an interception using network.addIntercept. That takes a list of URL patterns the request can match (pattern language unspecified, but probably glob-like) and a list of phases at which you want to intercept (beforeRequestSent / auth / responseStarted).
Registering an interception returns an id. That can be passed to network.removeIntercept to disable the interception. This means that different parts of a client can independently register intercepts, and only remove the intercepts they registered (like preload scripts). The client is responsible for handling the case where there's more than one intercept registered for a specific request (see below).
There are no new events for intercepted requests, but the existing network lifecycle events get additional fields isBlocked which is always sent and set to true if the request is being intercepted, and intercepts which is a list of registered intercepts for the request, so the client can handle the case of multiple intercepts.
Because the same events are used, you currently need to subscribe to the relevant network lifecycle events in order for intercepts to work (if you aren't subscribed the intercept is skipped). This isn't perfect since it's not totally obvious that you need to both subscribe and also register an intercept, and because you can't currently subscribe to the events only in the case that the intercept is set, so you'd have to send and discard all non-intercept network events. There is potential for optimisation here.
The auth intercept phase is similar to responseStarted, but only when the status code and headers indicate that HTTP Auth credentials are set.
The events block the network request until the client has responded (just once, irrespective of the number of redirects). How to organise the client's response is the biggest open question. The requirement is that we can either modify the outgoing request, or set/modify the response before it's seen by the browser. I currently expect to support modifying the request headers and body, setting the full response headers / body, or changing the response headers (in the responseStarted phase), plus setting the auth credentials to use for future requests. Modifying the response body is difficult with this protocol design since it either requires many back/forth events to deliver chunks of the response to the bidi client before they get to the browser, or a mechanism for allowing the browser itself to run the intercept code (e.g. sending over a function that is called with chunks of the body). So I expect to defer that to future versions of the protocol. Ignoring that, there are two obvious possible designs for responding to the intercept:
- Multiple commands, one per action, like CDP. So to update the request we'd have network.continueRequest, but to set a response we'd have network.fulfillRequest (or similar names).
- One command like netwok.continueFetch. This would allow setting either request or response properties, but it would be an error to attempt to set request properties after the request was sent. The behaviour would be determined by which fields are present in the command payload. So to just modify the request you'd set a request field with relevant sub-fields indicating the updated properties, whereas if you wanted to modify or fulfill the response, you'd set the corresponding response properties. The main question here is whether it's confusing that the actual action depends on the details of the payload parameters rather than the actual body itself. In principle this is also more flexible because you can do multiple actions at once (e.g. both modify the request headers and set credentials if you expect HTTP Auth to be required). Maybe one could extend it to modifying both the request and parts of the response in the request phase (e.g. both set the request body and some response headers, whilst still going to the network for the actual response body), but that may be unnecessary.

I have double checked the current implementation in Puppeteer and here are some thoughts:

1) it would be great to allow intercepting requests for a context and related contexts (e.g., an entire tab): seems to be not in the current draft? 2) Puppeteer only supports "modifying the request headers and body, setting the full response headers / body" without "changing the response headers" or "modifying the response body" parts. CDP supports the latter two as well. 3) Having multiple commands (network.continueRequest, network.fulfillRequest and network.failRequest) I think is more clear but combining them into one command would also work. 4) WDYT about isPaused or isIntercepted instead of isBlocked for the indication of the interception? we could also rely on the presence/absence of intercepts, I assume?

The Browser Testing and Tools Working Group just discussed Network request interception.

The full IRC log of that discussion

<jgraham_> Topic: Network request interception
<jgraham_> github: https://github.com/w3c/webdriver-bidi/issues/66
<jgraham_> jgraham: (tries to summarise what's in the comment at https://github.com/w3c/webdriver-bidi/issues/66#issuecomment-1542080588)
<jgraham_> q?
<simons> q+
<jgraham_> ack simons
<orkon> q+
<jgraham_> simons: We will need to set the response body quite often, and we'll need to block outgoing requests. I see you say that changing the response might be difficult, but we need that option.
<jgraham_> simons: Notably the way people use this is to mock out backend infrastructure and provide canned responses without having to have the backend running.
<jgraham_> simons: The CDP model feels OK with continueRequest and fulfillRequest; it's not too hard to use.
<jgraham_> jgraham: Replacing the network response is supported, but what's not supported at the moment is editing a response from the network as it arrives
<jgraham_> simons: People do sometimes want to do that e.g. for record/replay
<jgraham_> jgraham: You could also have a way to just get the entire body at the end.
<jgraham_> simons: Yes. We could add support for modifying the response later
<jgraham_> ack orkon
<jgraham_> orkon: CDP allows fetching the response and providing a different response. I agree we could start without that and see how much interest there is at a later point.
<jgraham_> orkon: Are interceptions per context or global?
<jgraham_> jgraham: From an API point of view it seems to be easy to limit to a specific context
<jgraham_> orkon: What about service workers or iframes? In CDP it's difficult to configure interceptions for new targets. Would be good to specify it per tab.
<jgraham_> jgraham: We have a similar problem for events, we could reuse that mechanism
<jgraham_> orkon: It can be difficult with iframes. We need some flexibility
<jgraham_> RRSAgent: make minutes
<RRSAgent> I have made the request to generate https://www.w3.org/2023/05/10-webdriver-minutes.html jgraham_

Has there been a consideration of enhancing this feature to allow intercepting WebSocket messages?

Has there been a consideration of enhancing this feature to allow intercepting WebSocket messages?

Would you mind filing this as a new enhancement issue request?

This issue should basically be done, or is there something missing @jgraham?

Going to mark this issue as done based on the merge of #429 a while ago.

w3c / webdriver-bidi

Specify a network request interception feature #66