ulixee / hero

The web browser built for scraping

Add ability to set network responses #236

Closed by blakebyrnes 11 months ago

blakebyrnes commented 11 months ago

Per Discord user @Foonk, there's a desire to intercept network content bodies. The simplest approach is to let a user specify, during Hero configuration, a list of string | RegExp url patterns together with the response body/headers/status to return for each.

This would have some limitations, for instance when a user doesn't know upfront which urls to replace. However, it's a good basic solution, since intercepting each individual request/response would be too slow: every http request would first have to make a round trip to the client.

Implementation:

I would implement this by allowing an optional alternate format for blockedResourceUrls items in the Hero constructor, one that includes a body and optional headers.
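As a rough illustration of what such an alternate format could look like, here is a minimal sketch. Every name in it (InterceptedResource, BlockedResourceEntry, normalizeEntry, the statusCode/headers/body fields) is hypothetical and not taken from Hero's actual API. It only shows how plain patterns and richer entries could share one list.

```typescript
// Hypothetical types: illustrative only, not Hero's real option shape.
type UrlPattern = string | RegExp;

interface InterceptedResource {
  url: UrlPattern;
  statusCode?: number;
  headers?: Record<string, string>;
  body?: string;
}

// An entry is either a bare pattern (block, as today) or a pattern plus a canned response.
type BlockedResourceEntry = UrlPattern | InterceptedResource;

// Bring both formats to one internal shape so later code handles a single case.
function normalizeEntry(entry: BlockedResourceEntry): InterceptedResource {
  if (typeof entry === 'string' || entry instanceof RegExp) {
    return { url: entry }; // no response configured: plain block
  }
  // Assumed defaults: reply with 200 and no extra headers unless told otherwise.
  return { statusCode: 200, headers: {}, ...entry };
}

const entries: BlockedResourceEntry[] = [
  'google-analytics.com',
  /\.woff2$/,
  {
    url: 'https://example.org/api/config',
    headers: { 'Content-Type': 'application/json' },
    body: '{"flag":true}',
  },
];
const normalized = entries.map(normalizeEntry);
```

Keeping bare patterns valid means existing blockedResourceUrls configurations would not need to change under this assumed design.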

If mimicking blockedResourceUrls, you can follow its use in client/lib/Hero.ts -> core/lib/Tab.ts. It's set on the internal Session options, and when a Tab is created, it will install the blocked resource types and urls:

// Once the main frame is ready, install the session's blocked resource types and urls.
private async waitForReady(): Promise<void> {
  await this.mainFrameEnvironment.isReady;
  if (this.session.options?.blockedResourceTypes) {
    await this.setBlockedResourceTypes(this.session.options.blockedResourceTypes);
  }
  if (this.session.options?.blockedResourceUrls) {
    await this.setBlockedResourceUrls(this.session.options.blockedResourceUrls);
  }
}

You can see that blockedResourceUrls adds entries to mitmSession.interceptorHandlers. Each entry in this list has a url or type, plus an optional "handler" that replies to the request coming from the client. For this feature, I would expect a simple response handler that sends back the configured status code, headers and body.
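A simple response handler of the kind described could be sketched as follows. This is not the mitm code from the repository: InterceptorEntry, CannedResponse, ResponseWriter and replyIfConfigured are hypothetical names, the ResponseWriter interface is only modelled on the writeHead/end methods of Node's http.ServerResponse, and treating string patterns as substring matches is an assumption made to keep the example short.

```typescript
type UrlPattern = string | RegExp;

// Hypothetical canned reply configured by the user.
interface CannedResponse {
  statusCode: number;
  headers: Record<string, string>;
  body: string;
}

// Hypothetical interceptor entry: a url pattern plus an optional reply.
interface InterceptorEntry {
  url: UrlPattern;
  response?: CannedResponse;
}

// Minimal writer surface, modelled on Node's writeHead()/end().
interface ResponseWriter {
  writeHead(statusCode: number, headers: Record<string, string>): unknown;
  end(body?: string): unknown;
}

// Assumption: strings match as substrings, RegExps via test().
function matchesUrl(pattern: UrlPattern, url: string): boolean {
  if (typeof pattern === 'string') return url.indexOf(pattern) !== -1;
  pattern.lastIndex = 0; // keep /g and /y patterns from matching statefully
  return pattern.test(url);
}

// Replies with the first matching configured response.
// Returns false when nothing was sent, so the caller can fall back
// to its existing blocking or pass-through behaviour.
function replyIfConfigured(
  url: string,
  entries: InterceptorEntry[],
  res: ResponseWriter,
): boolean {
  for (const entry of entries) {
    if (!entry.response || !matchesUrl(entry.url, url)) continue;
    res.writeHead(entry.response.statusCode, entry.response.headers);
    res.end(entry.response.body);
    return true;
  }
  return false;
}
```

Because the reply is built entirely from configuration, nothing has to cross over to the client per request, which is the point of the approach.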

NOTE: the request/response can be HTTP/1 or HTTP/2, so you need to deal with header conversions.
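To make the header conversion concrete, here is a minimal sketch based on general HTTP/2 rules rather than on Hero's code: HTTP/2 field names are lowercase, the status travels in the :status pseudo-header, and connection-specific headers such as Connection or Transfer-Encoding must not be sent. The function names toHttp2Headers and toHttp1Headers are hypothetical, and values are simplified to single strings.

```typescript
// Headers that are connection-specific in HTTP/1 and must be dropped for HTTP/2.
const CONNECTION_SPECIFIC = [
  'connection',
  'keep-alive',
  'proxy-connection',
  'transfer-encoding',
  'upgrade',
];

// Convert user-configured headers plus a status code into HTTP/2 form.
function toHttp2Headers(
  statusCode: number,
  headers: Record<string, string>,
): Record<string, string> {
  const out: Record<string, string> = { ':status': String(statusCode) };
  for (const key of Object.keys(headers)) {
    const lower = key.toLowerCase();
    if (lower.charAt(0) === ':') continue; // ignore user-supplied pseudo-headers
    if (CONNECTION_SPECIFIC.indexOf(lower) !== -1) continue;
    out[lower] = headers[key];
  }
  return out;
}

// Convert the other way: HTTP/1 has a status line, so pseudo-headers are stripped.
function toHttp1Headers(headers: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const key of Object.keys(headers)) {
    if (key.charAt(0) === ':') continue;
    out[key] = headers[key];
  }
  return out;
}

const h2 = toHttp2Headers(200, {
  'Content-Type': 'text/html',
  Connection: 'keep-alive',
});
const h1 = toHttp1Headers(h2);
```

A real implementation would also need to handle multi-valued headers such as Set-Cookie, which this sketch leaves out.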

FoonkG commented 11 months ago

PR: #237