grantila / fetch-h2

HTTP/1+2 Fetch API client for Node.js
MIT License
336 stars 16 forks source link

Concurrent HTTP/2 requests to different hosts sharing the same IP address return the same response for all requests #110

Closed stefan-guggisberg closed 4 years ago

stefan-guggisberg commented 4 years ago

Concurrent requests to subdomains resolving to the same IP address, e.g.

return the same response for all requests.

The problem only occurs when using HTTP/2.

To reproduce:

const crypto = require('crypto');
const { fetch } = require('fetch-h2');

async function request(url) {
  const res = await fetch(url);
  const data = await res.text();
  const md5 = crypto.createHash('md5').update(data).digest().toString('hex');
  console.log(`HTTP Version: ${res.httpVersion}, hash of ${url} response: ${md5}`);
}

async function run() {
  await Promise.all([
    request('https://already--grantila.hlx.page/README.html'),
    request('https://fetch-h2--grantila.hlx.page/README.html'),
  ]);
}

run().catch(console.error);

Running the above code will output identical md5 hash values for both responses.

Workarounds:

grantila commented 4 years ago

Thanks for a great bug report 👌

However, I don't think this is a bug in fetch-h2 but rather in the server. In HTTP/2 you're supposed to re-use one socket/session if possible, and that's controlled e.g. by ssl certs' subjectAltName field. If I curl, I see:

*  subjectAltName: host "already--grantila.hlx.page" matched cert's "*.hlx.page"

so every concurrent call to *.hlx.page should re-use the same socket.

However, that's not the problem here, the problem is that the server looks at the Host header to know what to do, but that's not really how to do it in HTTP/2, then the :authority pseudo-header should be looked at. From the HTTP/2 spec under 8.1.2.3 Request Pseudo-Header Fields:

The :authority pseudo-header field includes the authority portion of the target URI ([RFC3986], Section 3.2). The authority MUST NOT include the deprecated userinfo subcomponent for http or https schemed URIs.

To ensure that the HTTP/1.1 request line can be reproduced accurately, this pseudo-header field MUST be omitted when translating from an HTTP/1.1 request that has a request target in origin or asterisk form (see [RFC7230], Section 5.3). Clients that generate HTTP/2 requests directly SHOULD use the :authority pseudo-header field instead of the Host header field. An intermediary that converts an HTTP/2 request to HTTP/1.1 MUST create a Host header field if one is not present in a request by copying the value of the :authority pseudo-header field.

This part; Clients that generate HTTP/2 requests directly SHOULD use the :authority pseudo-header field instead of the Host header field.

This means that a server which understands HTTP/2, and which uses the hostname to figure out what to do, must look at the :authority if it's HTTP/2, and Host if it's HTTP/1.1. Since your example works with the Host set, I assume the server doesn't look at :authority.

hlx.page doesn't allow fetch calls from the browser, complains on CORS. It would be interesting to see if that exact code you gave, worked in the browser (and in which browsers).

I'm closing this because I'm relatively sure fetch-h2 is doing the right thing here (since #9), but if this appears as a problem for many servers, perhaps also sending a Host field is an option, although I'd like to know how other clients handle this first.

stefan-guggisberg commented 4 years ago

I am pretty sure it's not a server issue. The server is Fastly (CDN).

Since your example works with the Host set, I assume the server doesn't look at :authority.

No, the :authority is set (via the Host header, see #9).

I believe the problem is that fetch-h2 does not set the :authority pseudo header when using the HTTP/2 protocol.

There might also be a problem with subdomains sharing the same IP address. It's common that different subdomains share the same IP address (e.g. server1.acme.com and server2.acme.com). They are still separate hosts.

Maybe sessions are internally identified by IP address & port instead of :authority?

Please consider reopening this issue.

grantila commented 4 years ago

I just wrote a test against github.io (github pages) and you seem to be right. That's certainly not good!

grantila commented 4 years ago

I think I've fixed this now, but testing against hlx.page is really slow (takes 3-4 seconds to read the text), and concurrent requests stall. I'm not sure why this happens, it works well for concurrent {lang}.wikipedia.org requests (which also uses a wildcard SAN of *.wikipedia.org), so session/socket re-use and :authority header should now be set correctly.

github-actions[bot] commented 4 years ago

:tada: This issue has been resolved in version 2.5.0 :tada:

The release is available on:

Your semantic-release bot :package::rocket: