mswjs / interceptors

Low-level network interception library.
https://npm.im/@mswjs/interceptors
MIT License
565 stars 127 forks source link

Support "CONNECT" requests (in proxies) #481

Open jcarlsonautomatiq opened 1 year ago

jcarlsonautomatiq commented 1 year ago

MSW is a great library and I have used it successfully in other "got" calls but in one part of my unit testing I was actually checking that a proxy call would actually try and use the proxy. Unfortunately I think the way that the path and proxy information is generated results in an invalid URL which bails before you can get a matcher to handle it. The proxy host is correctly assigned but then the path(the actual url we call) is just directly concatenated as a path.


   // From getUrlByRequestOptions.ts:169
   =============
   const url = new URL(`${protocol}//${hostname}${path}`)
   url.username = credentials?.username || ''
   url.password = credentials?.password || ''

   // ERROR
   =============
          input: 'http://fake.proxy:443fake.ping:443',
          code: 'ERR_INVALID_URL',

            at new URL (node:internal/url:783:36)
            at getUrlByRequestOptions (/node_modules/.pnpm/@mswjs+interceptors@0.25.7/node_modules/@mswjs/interceptors/lib/node/chunk-YVNH3GJ5.js:640:15)
            at normalizeClientRequestArgs (/node_modules/.pnpm/@mswjs+interceptors@0.25.7/node_modules/@mswjs/interceptors/lib/node/chunk-YVNH3GJ5.js:766:11)

// From the DEBUG:
=============
3:51:32:307 [http normalizeClientRequestArgs] successfully cloned RequestOptions! {"rejectUnauthorized":false,"agent":{"_events":{},"_eventsCount":2,"defaultPort":443,"protocol":"https:","options":{"keepAlive":true,"keepAliveMsecs":1000,"maxSockets":256,"maxFreeSockets":256,"scheduling":"lifo","noDelay":true,"path":null},"requests":{},"sockets":{},"freeSockets":{},"keepAliveMsecs":1000,"keepAlive":true,"maxSockets":256,"maxFreeSockets":256,"scheduling":"lifo","maxTotalSockets":null,"totalSocketCount":0,"maxCachedSessions":100,"_sessionCache":{"map":{},"list":[]},"proxy":"http://proxyUsername:proxyPassword@fake.proxy:443/"},"setHost":true,"method":"GET","headers":{"user-agent":"got (https://github.com/sindresorhus/got)","accept-encoding":"br, gzip, deflate"}}
13:51:32:307 [http normalizeClientRequestArgs] derived request options: {"method":"GET","protocol":"https:","hostname":"fake.ping","host":"fake.ping","path":"/pinging","rejectUnauthorized":false,"agent":{"_events":{},"_eventsCount":2,"defaultPort":443,"protocol":"https:","options":{"keepAlive":true,"keepAliveMsecs":1000,"maxSockets":256,"maxFreeSockets":256,"scheduling":"lifo","noDelay":true,"path":null},"requests":{},"sockets":{},"freeSockets":{},"keepAliveMsecs":1000,"keepAlive":true,"maxSockets":256,"maxFreeSockets":256,"scheduling":"lifo","maxTotalSockets":null,"totalSocketCount":0,"maxCachedSessions":100,"_sessionCache":{"map":{},"list":[]},"proxy":"http://proxyUsername:proxyPassword@fake.proxy:443/"},"setHost":true,"headers":{"user-agent":"got (https://github.com/sindresorhus/got)","accept-encoding":"br, gzip, deflate"}}
13:51:32:307 [http normalizeClientRequestArgs] has no default agent, setting the default agent for "https:"
13:51:32:307 [http normalizeClientRequestArgs] successfully resolved url: https://fake.ping/pinging
13:51:32:307 [http normalizeClientRequestArgs] successfully resolved options: {"method":"GET","protocol":"https:","hostname":"fake.ping","host":"fake.ping","path":"/pinging","rejectUnauthorized":false,"agent":{"_events":{},"_eventsCount":2,"defaultPort":443,"protocol":"https:","options":{"keepAlive":true,"keepAliveMsecs":1000,"maxSockets":256,"maxFreeSockets":256,"scheduling":"lifo","noDelay":true,"path":null},"requests":{},"sockets":{},"freeSockets":{},"keepAliveMsecs":1000,"keepAlive":true,"maxSockets":256,"maxFreeSockets":256,"scheduling":"lifo","maxTotalSockets":null,"totalSocketCount":0,"maxCachedSessions":100,"_sessionCache":{"map":{},"list":[]},"proxy":"http://proxyUsername:proxyPassword@fake.proxy:443/"},"setHost":true,"headers":{"user-agent":"got (https://github.com/sindresorhus/got)","accept-encoding":"br, gzip, deflate"},"_defaultAgent":{"_events":{},"_eventsCount":2,"defaultPort":443,"protocol":"https:","options":{"keepAlive":true,"scheduling":"lifo","timeout":5000,"noDelay":true,"path":null},"requests":{},"sockets":{},"freeSockets":{},"keepAliveMsecs":1000,"keepAlive":true,"maxSockets":null,"maxFreeSockets":256,"scheduling":"lifo","maxTotalSockets":null,"totalSocketCount":0,"maxCachedSessions":100,"_sessionCache":{"map":{},"list":[]}}}
13:51:32:308 [http normalizeClientRequestArgs] successfully resolved callback: undefined
13:51:32:310 [http request] request call (protocol "http"): [{"method":"CONNECT","host":"fake.proxy","port":"443","path":"fake.ping:443","setHost":false,"headers":{"connection":"keep-alive","host":"fake.ping:443","proxy-authorization":"Basic cHJveHlVc2VybmFtZTpwcm94eVBhc3N3b3Jk"},"agent":false}]
13:51:32:310 [http normalizeClientRequestArgs] arguments [{"method":"CONNECT","host":"fake.proxy","port":"443","path":"fake.ping:443","setHost":false,"headers":{"connection":"keep-alive","host":"fake.ping:443","proxy-authorization":"Basic cHJveHlVc2VybmFtZTpwcm94eVBhc3N3b3Jk"},"agent":false}]
13:51:32:310 [http normalizeClientRequestArgs] using default protocol: http:
13:51:32:310 [http normalizeClientRequestArgs] first argument is RequestOptions: {"method":"CONNECT","host":"fake.proxy","port":"443","path":"fake.ping:443","setHost":false,"headers":{"connection":"keep-alive","host":"fake.ping:443","proxy-authorization":"Basic cHJveHlVc2VybmFtZTpwcm94eVBhc3N3b3Jk"},"agent":false}
13:51:32:310 [http normalizeClientRequestArgs] normalized request options: {"method":"CONNECT","host":"fake.proxy","port":"443","path":"fake.ping:443","setHost":false,"headers":{"connection":"keep-alive","host":"fake.ping:443","proxy-authorization":"Basic cHJveHlVc2VybmFtZTpwcm94eVBhc3N3b3Jk"},"agent":false,"protocol":"http:"}
13:51:32:310 [utils getUrlByRequestOptions] request options {"method":"CONNECT","host":"fake.proxy","port":"443","path":"fake.ping:443","setHost":false,"headers":{"connection":"keep-alive","host":"fake.ping:443","proxy-authorization":"Basic cHJveHlVc2VybmFtZTpwcm94eVBhc3N3b3Jk"},"agent":false,"protocol":"http:"}
13:51:32:310 [utils getUrlByRequestOptions] figuring out url from request options...
13:51:32:310 [utils getUrlByRequestOptions] protocol http:
13:51:32:310 [utils getUrlByRequestOptions] host fake.proxy
13:51:32:310 [utils getUrlByRequestOptions] port 443
13:51:32:310 [utils getUrlByRequestOptions] hostname fake.proxy:443
13:51:32:310 [utils getUrlByRequestOptions] path fake.ping:443
13:51:32:310 [utils getUrlByRequestOptions] credentials undefined
13:51:32:310 [utils getUrlByRequestOptions] auth string:

Code that pretty much reproduces it.

// Pseudo code reproduction
const { HttpsProxyAgent } = require('hpagent');
const host = 'fake.proxy';
const username = "proxyUsername";
const password = "proxyPassword";
const port = '443';

   let proxy =  HttpsProxyAgent({
      keepAlive: true,
      keepAliveMsecs: 1000,
      maxSockets: 256,
      maxFreeSockets: 256,
      scheduling: 'lifo',
      proxy: `http://`${username}:${password}@${host}:${port}`,
    });

  // Then used in a got call like this:
 export const PING_URL  = "https://fake.ping/pinging";
 got.get({url: PING_URL, agent: { https: proxy})
kettanaito commented 1 year ago

Hey, @jcarlsonautomatiq. Thanks for reporting this.

Not sure what's going off here. The path option in ClientRequest must always be a string starting with /. It appears that when using hpagent, it's not the case. This may as well be one of the many real-world use cases that are tricky and have to be handled in getUrlByRequestOptions.

We need to add a test for this.

kettanaito commented 1 year ago

@jcarlsonautomatiq can you please share how you intend to mock those requests?

kettanaito commented 1 year ago

Your original example also has a mistake: got cannot accept that proxy agent as the value of agents.https. Doing so produces both TypeScript and runtime errors.

TypeScript:

No overload matches this call.
  The last overload gave the following error.
    Property 'options' is missing in type 'HttpProxyAgent' but required in type 'Agent'.ts(2769)
https.d.ts(31, 9): 'options' is declared here.

Runtime:

RequestError: Protocol "https:" not supported. Expected "http:"
 ❯ Request._makeRequest ../node_modules/.pnpm/got@11.8.6/node_modules/got/dist/source/core/index.js:1191:19
 ❯ new ClientRequest ../node:_http_client:189:11

I can help with fixing this but I need an exact example that is complete and working (well, failing, in our case).

jcarlsonautomatiq commented 1 year ago

This code is currently being migrated from raw working but untested JS to Typescript which was one of the reasons I was actually adding the unit tests. I wonder if there is an update in got as well we are on "got-cjs": "^12.0.1" and it doesn't cause a protocol error. I will see about stripping this down to remove all the company stuff.

As for testing it I was hoping I would be able to intercept the proxy url and then look at the original request to determine what to return (or just grab a ?url out of the call). A little hacky but it would let me use the proxy calls and ensure I didn't break anything in the migration. I will hack up a proper working snippet this morning.

kettanaito commented 1 year ago

The issue with the path originates from here: https://github.com/delvedor/hpagent/blob/96f45f1d40bfbdfd0fcc84cdba056be6e0fb8f4c/index.js#L23

I'm not sure what's the intention here by providing the following request options to create a request:

[
  {
    method: 'CONNECT',
    host: 'fake.proxy',
    port: '443',
    path: 'fake.ping:80',
    setHost: false,
    headers: {
      connection: 'keep-alive',
      host: 'fake.ping:80',
      'proxy-authorization': 'Basic cHJveHlVc2VybmFtZTpwcm94eVBhc3N3b3Jk'
    },
    agent: false,
    timeout: 0
  }
]

What's the intended end request URL here, I wonder?

I can assume it's https://fake.proxy/fake.ping:80. Once I do that, we get the next error:

RequestError: 'CONNECT' HTTP method is unsupported.

And this one is far more interesting. See, the Fetch API specification forbids creating requests with CONNECT method:

new Request('/resource', { method: 'CONNECT' })

Security report

This fails standalone.

jcarlsonautomatiq commented 1 year ago

The code is using some long lived proxies from Nimble. I expect that might be some of the strange connection issue but that is definitely odd. And apologies I should have had the fully working example in place initially.

    import { HttpsProxyAgent } from 'hpagent';  // version: <edit>. Updated to 1.2.0 and it still will repro.
    import { got } from 'got-cjs';  // version: 12.5.4

    const host = 'fake.proxy';
    const username = "proxyUsername";
    const password = "proxyPassword";
    const port = '443';

    let proxyUrl = `http://${username}:${password}@${host}:${port}`;
    let proxy = new HttpsProxyAgent({
       keepAlive: true,
       keepAliveMsecs: 1000,
       maxSockets: 256,
       maxFreeSockets: 256,
       scheduling: 'lifo',
       proxy: proxyUrl,
     });

      // Then used in a got call like this:
     const PING_URL  = "https://fake.ping/pinging";
     await got.get({
        url: PING_URL,
        throwHttpErrors: false,   // No idea the code thought it should do this?
        agent: { https: proxy},
        https: { rejectUnauthorized: false },
      });
jcarlsonautomatiq commented 1 year ago

It seems like if the invalid URL error is fixed this code test will probably have to be moved to the browser version of the MSW lib and only tested e2e which should be fine. Plus eventually I will run into the same agent and type conflicts :)

I will look more at the hpagent code that it might need a note about not being used in Node directly. Or forcing a custom http agent when used with a keepAlive connection outside a browser. odd ad of Nodejs > 19 supposedly keepAlive works with native fetch but finding definitive info about CONNECT in the spec is surprisingly hard.

https://stackoverflow.com/questions/62500011/reuse-tcp-connection-with-node-fetch-in-node-js

kettanaito commented 1 year ago

Okay, so learned a lot from the Node.js docs, Got, and hpagent here. Sharing below.

The URL is invalid, or is it?

Turns out, when performing the CONNECT method in ClientRequest, the path option is expected to point to the target host:

const options = {
  port: 1337,
  host: '127.0.0.1',
  method: 'CONNECT',
  path: 'www.google.com:80',
}

const req = http.request(options)

Excerpt from the example in Node.js docs

Moreover, when the server receives that request, the req.url already equals to www.google.com:80. This is some Node.js server magic I presume, but Interceptors must do the same thing (req.method remains CONNECT).

Request flow during proxying

This is what happens to the request in a simplified proxy setup (following the same Node.js example):

  1. Request is constructed with CONNECT method and <target> path.
  2. Server receives the connect event where req.method is CONNECT and req.url is <target>.
  3. Server creates a new connection to req.url.
  4. Once the connection is established to the target host, server writes to the request the 200 Connection <target> HTTP message (direct request.write()).
  5. Server then writes the head from the connect event to the request also.
  6. Request then emits the connect event with three arguments:
    1. res, IncomingMessage. This is the `200 Connection `` response written by the server to the request (the header of the connection HTTP message).
    2. socket, net.Socket, the socket of the new connection to the <target>.
    3. head, Buffer, whatever head server has written in its connect event.

In the connect request callback, you can then issue new requests by writing directly onto the target socket and receiving the response from the target via socket.on('data').

I assume that last part is handled by hpagent here because it's rather low-level.

kettanaito commented 1 year ago

Basically, we don't support this message exchange right now. This is a new feature request and I will prioritize it accordingly. Will push the WIP pull request I have right now with the failing test and some findings. Thanks for making this apparent, @jcarlsonautomatiq!

I'm not sure as of now how to properly support this. Since the connect event on ClientRequest exposes an extraneous Socket, it means we have to support the socket-level messaging in Interceptors (see #399). This isn't that easy and I'd rather avoid it if we can.

jcarlsonautomatiq commented 1 year ago

Thanks for putting all the information in here it makes it a lot easier to think it through! If I might recommend a possible test framework solution I think just about everyone would accept it might go like this (I certainly would):

kettanaito commented 1 year ago

Checkpoint