whatwg / fetch

Fetch Standard
https://fetch.spec.whatwg.org/
Other
2.11k stars 328 forks source link

Use case for Headers getAll #973

Closed xtuc closed 1 year ago

xtuc commented 4 years ago

Since https://github.com/whatwg/fetch/commit/42464c8c3d2fd3437a19fc6afd2438a0fd42dde8 Headers.prototype.getAll has been deprecated/removed from the spec and implementations.

I understand that in browsers (including service workers) filtered responses don't include headers that could benefit from the getAll method. However, some JavaScript environment doesn't need to filter response/request headers, like serverless plateforms (for instance Cloudflare Workers) or maybe Nodejs at some point.

For example editing Cookie headers:

const h = new Headers;
h.append("Set-Cookie", "a=1; Expires=Wed, 21 Oct 2015 07:28:00 GMT");
h.append("Set-Cookie", "b=1; Expires=Wed, 21 Oct 2015 07:28:00 GMT");

h.get("Set-Cookie")
// a=1; Expires=Wed, 21 Oct 2015 07:28:00 GMT, b=1; Expires=Wed, 21 Oct 2015 07:28:00 GMT

Spliting the combined header value by , will give an invalid result. Instead getAll could be used to retrieve the individual headers.

annevk commented 4 years ago

Set-Cookie is the only such header, see also HTTP and #506.

It might make sense to expose it properly for code that wants to use this class generically, but I think we ought to allow for implementations of this class to eagerly combine headers generally and store Set-Cookie separately as a special case. That would suggest something like h.getSetCookie() to me.

cc @youennf @ddragana @yutakahirano

annevk commented 4 years ago

It's probably also worth investigating if https://www.npmjs.com/package/set-cookie-parser#splitcookiestringcombinedsetcookieheader is a reasonable alternative to expose. Though if that actually works HTTP would not have to consider cookies a special case...

xtuc commented 4 years ago

Maybe to clarify the environment a bit, Cloudflare Workers boils down to a proxy written in JavaScript and Node.js a HTTP client.

Set-Cookie is the only such header

For such use-cases, it make sense to not filter headers in the response and let the user define its own headers. Any custom headers may contain ,.

annevk commented 4 years ago

I disagree, HTTP doesn't guarantee intermediaries won't combine such headers so we shouldn't either. I'd want HTTP to call out more than Set-Cookie as requiring a special data model before going there.

domenic commented 4 years ago

I'm a bit confused why we'd revisit this. Non-browser environments can consider adding nonstandard extensions for whatever purpose they wish, but in browser environments there's no need for this functionality, and our reasoning from the previous discussions seems to hold. (And the spec is written for browser environments.)

annevk commented 4 years ago

We don't forbid appending Set-Cookie unconditionally. When it's appended, there's no way to get it back out per HTTP semantics. That mismatch irks me a bit.

xtuc commented 4 years ago

We are considering adding the getAll method for Cloudflare Workers, I don't know Node.js' plans.

With a basic filtered response you can access almost all the headers. I think this behavior can still happen in a browser.

annevk commented 4 years ago

To be clear, getAll() with the semantics I think you are suggesting would be incompatible with what we'd add here, if anything, as HTTP doesn't require preserving the distinction between

h.append(name, value)
h.append(name, value2)

and

h.append(name, `${value}, ${value2}`)

for any name other than Set-Cookie and this API shouldn't either. See also the discussion in #189.

kentonv commented 4 years ago

Non-browser environments can consider adding nonstandard extensions for whatever purpose they wish, but in browser environments there's no need for this functionality, and our reasoning from the previous discussions seems to hold. (And the spec is written for browser environments.)

In Cloudflare Workers, we'd really like to stick to standard APIs, to promote code portability across a wide range of JavaScript environments. We think it would be a great thing for the JavaScript ecosystem as a whole if common functionality like making HTTP requests actually worked the same everywhere, so it would be nice to see the fetch() standard as applying to more than just browsers.

We have an immediate need to support manipulation of Set-Cookie in Cloudflare Workers. Perhaps ironically, this is actually driven by a request from @rowan-m on the Chrome team related to the default SameSite=Lax change. A lot of people are going to need to work around this soon and doing it in CF Workers will likely be easier for many people than updating their backend code.

We are proceeding with implementing getAll() because although it is deprecated, the past semantics cover our needs, and we're more comfortable implementing something that has been specified in the past than something that has never been specified. The thing we really want to avoid is specifying a new API that turns out incompatible with implementations from other vendors, fragmenting the ecosystem. So while getSetCookie() might be better, getAll() has the advantage that we're pretty confident no one is going to implement it with different semantics. (Yes, I do understand that various platforms may make different decisions about when to automatically combine headers other than Set-Cookie, but I'm comfortable with that.)

domenic commented 4 years ago

I think that's a really unfortunate decision. I think you'd be better served by clearly signposting nonstandard APIs (e.g. using a Cloudflare global), instead of assuming that things which existed briefly in the spec but were never implemented anywhere have some sort of semi-standard semantics.

kentonv commented 4 years ago

I think you'd be better served by clearly signposting nonstandard APIs (e.g. using a Cloudflare global)

I thought vendor prefixes were no longer in favor?

existed briefly in the spec but were never implemented anywhere

Eh? Seems like it was implemented by almost every major browser at one point?

annevk commented 4 years ago

getAll() is not deprecated, it's removed, because its design is broken as explained here and in linked issues.

kentonv commented 4 years ago

@annevk Yes, I understand the issue -- for headers other than Set-Cookie, the header values may be comma-concatenated anyway by any intermediate proxy, so getAll() can't keep its promise.

Do you have a recommendation for how we should proceed, in order to both solve the immediate problem and stay standards-compatible long-term?

annevk commented 4 years ago

Again, not just by any intermediate proxy, by an implementation of the Headers class as well, as at least one implementer strongly argued for.

getSetCookie() still seems like a reasonable solution. @mnot thoughts?

xtuc commented 4 years ago

getSetCookie() sounds good to me now. Could we consider adding it to the browser fetch spec?

kentonv commented 4 years ago

What about getAll(), but it is defined to throw an exception if passed any other header name than "Set-Cookie"?

This has the advantage that if ever there comes along another such header (probably, a non-standard header used by some badly-designed HTTP API that one of our customers really really needs to support), we can easily accommodate...

annevk commented 4 years ago

@youennf @yutakahirano @ddragana @whatwg/http thoughts?

ddragana commented 4 years ago

I looks like we cannot read Set-Cookie if we once have set it, therefore I am for adding some function for this. getSetCookie sound good. I do not like getAll, because it sound like you can use it for every header field, but you cannot. If we want to make it generic getUnmergableHeader or something, but that is only for one header and it is unnecessary, so getSetCookie its fine.

tonyjmnz commented 3 years ago

cross-posting a suggestion from @nfriedly on #506 - use https://www.npmjs.com/package/set-cookie-parser#usage-in-react-native to parse the mangled cookie header

The fact that this inconsistency is still around is mind boggling to me

lucacasonato commented 3 years ago

For future reference, this is how we solve this in Deno's fetch implementation:

  1. Setting Set-Cookie headers does not concatenate headers, it will keep them seperate (both in the internal data structure, and on the wire)
  2. h.get("Set-Cookie") returns a concatentated list of Set-Cookie headers.
  3. Iterating over h (or using h.entries, h.values, or h.keys) will result concatentated headers, as specified except for set-cookie headers. These are returned as is, without concatenation. That means it is possible that h.entries returns multiple items where the header name is set-cookie.

This was the solution of us trying to come up with a solution for multiple years. We deem this breaks the least existing code, while also being relatively easy to understand and use.

kentonv commented 3 years ago

For the record, Cloudflare Workers went ahead with what I suggested in my last comment -- getAll(), but it is only allowed for the specific header name Set-Cookie.

annevk commented 2 years ago

I think what we should define here is a combination of https://github.com/whatwg/fetch/issues/973#issuecomment-561052450 and https://github.com/whatwg/fetch/issues/973#issuecomment-902578584. I.e., getSetCookie() which returns a sequence and the iterator which is special cased for Set-Cookie headers. The constructor already works. The internal data structure also already works.

Firefox is still supportive of adding this. Primarily to ensure that this data structure is actually suitable for all possible use cases.

@youennf @domenic thoughts?

domenic commented 2 years ago

I don't have any real thoughts here. The special iterator behavior seems a bit weird but I'm happy to believe https://github.com/whatwg/fetch/issues/973#issuecomment-902578584 when they say that they tried many things and found that to be the best.

I suspect @jasnell would also be interested in this issue.

jasnell commented 2 years ago

Thanks @domenic ... definitely interested. I'll need to review a bit of the history before I can weigh in tho. I appreciate the heads up.

youennf commented 2 years ago

I am not against a dedicated getter within the Headers class. That said, I'd like to mention the potential complexity for implementors: Cache API as well as exposing a request to service worker fetch event handler might need specific handling to preserve Set-Cookie headers, depending on the actual underlying implementation. It is unclear to me how valuable an addition this is to the web platform (are we talking about setting Set-Cookie to request headers?).

annevk commented 2 years ago

That's a very good point. One potential solution might be to add Set-Cookie to the forbidden header names, but this would have to be tested to see if it's web compatible. I agree that it's not desirable to have Request or Response instances that can contain Set-Cookie headers in their Header objects.

lucacasonato commented 2 years ago

I agree that it's not desirable to have Request or Response instances that can contain Set-Cookie headers in their Header objects.

But only on the web! In server side environments like Deno and Cloudflare Workers, Response objects are used to represent outgoing HTTP responses from the server to the client. These must be able to contain set-cookie headers. On the web Response objects containing a set-cookie headers can already not be created, because Response's headers use a "response" header guard, which forbids "set-cookie" headers. Both Deno and CFW work around this by not considering set-cookie a forbidden response-header name.

For Request objects I have no interest or opinion on set-cookie headers being allowed. They are basically useless, and I don't think anyone would be hurt if the header were to be blocked for requests.

I would be strictly opposed to any changes to the header representation in the spec (even just in Response objects) that would that would cause set-cookie headers to be unsupportable by otherwise conformant implementations.

It is unclear to me how valuable an addition this is to the web platform.

There is no inherent value to the web browsers specifically: set-cookie headers on responses are never exposed in the web platform, so users can not manually use these. The value lies in broader ecosystem compatibility and uniformity. Web browsers are just one of the implementors of the fetch spec: other JS runtimes use this API for HTTP fetches too (e.g. Deno, Cloudflare Workers, Node.js when using node-fetch). If the API that all of these runtimes use is standardised and uniform, it becomes much easier to write code that is portable across runtimes. Reading and setting set-cookie headers is a critical feature for server side HTTP implementations.

annevk commented 2 years ago

@lucacasonato sure, different flavor Request/Response objects could have a different Headers guard. My comment was strictly about what is currently defined in Fetch.

domenic commented 2 years ago

If there's no value in this for the web, I suggest we not merge it into the spec or implement it in browsers.

lucacasonato commented 2 years ago

@domenic I did not phrase that very elegantly. What I meant was that there is no value for the web if this change is looked at in total isolation, pretending that the Headers class is implemented and used only in the browser (ignoring the existence of server side JS runtimes that have the Headers class). This is obviously not the case though. Headers are used in browsers, Deno, Cloudflare Workers, Node.js and various other server side runtimes.

It is correct that the primary users of this specific feature are going to be users of server side JS runtimes. Developers developing exclusively for the web are likely not going to use this. It is critically important however that the API behaves the same in server side runtimes, and browsers. Otherwise packages written for both will need to branch behaviour on the runtime, which is not great.

[...new Headers([["set-cookie": "foo"], ["set-cookie": "bar"]])] should return the same result across browsers and server side runtimes. Not doing so is confusing, and will fragment the ecosystem.

That is the real value for browsers: keeping the API consistent between runtimes, and to avoid user confusion.

domenic commented 2 years ago

Thanks for the explanation. However, I still think then that the feature does not pull its weight.

kentonv commented 2 years ago

@domenic What is your opinion on how server-side implementations should engage with web standards? Should we assume that whatwg is flatly uninterested in any feature that doesn't benefit browsers? I guess we should form our own working group that specifies a diff against the web standards to be applied on the server side?

lucacasonato commented 2 years ago

@domenic Again, I agree with you if you were to look at the usage of Headers in browsers in isolation. Consider this scenario in a "service worker @ edge" environment like Deno Deploy or Cloudflare Workers, that allow you to rewrite requests/responses in a server environment before they get forwarded onto the client. This example adds a Content-Security-Policy header to all proxied requests:

addEventListener("fetch", (e) => {
  e.respondWith(handler(e.req));
});

async function handler(req) {
  const resp = await fetch(req); // forward the original request to the origin server
  const headers = new Headers(resp.headers); // clone the headers object to strip off the `immutable` header guard
  headers.set("Content-Security-Policy", "default-src 'none'; style-src 'self'; script-src 'self'");
  return new Response(resp.body, { status: resp.status, headers });
}

On the surface it looks like this should just work for all cases. It is only 4 lines after all. The problem is that it doesn't work. If the origin server responds with some set-cookie headers (which is pretty likely, as cookies are widely used), then the header cloning will mangle these set-cookie headers by concatenating them. Instead of responding to the client with multiple set-cookie headers, the client will now receive a single malformed cookie header.

With the current API as specified, there is no way to solve this. You can not just headers.get("set-cookie").split(", "), because , are valid characters in set-cookie headers.

As you can see, with this proposed change of altering the iterator, this code now works as expected. It solves a use-case which was previously not possible to solve with just features from the standard. Because of this we already have two incompatible extensions from Deno and Cloudflare that allow users to do this anyway. This is not sustainable - we need WHATWG to engage with other implementers that are not browser vendors - otherwise the ecosystem will diverge. This is not something that is in anyone's best interest IMO.

domenic commented 2 years ago

The WHATWG is working on standards for web browsers. To the extent other software wants to be compatible with web browsers, it's great to have their engagement. But if something does not sufficiently benefit web browser users then working on it elsewhere makes sense to me. It's totally up to them whether they want to be 100% compatible with a spec designed for web browsers, or add their own modifications and APIs. E.g. from what I understand most non-browser fetch implementations already ignore the CORS parts of the fetch spec and API. And they have non-WHATWG APIs for things like TCP/UDP. I don't know if those non-WHATWG APIs are standardized anywhere but I agree doing so would have sustainability benefits for server-side ecosystems avoiding divergence, if you are up for the extra investment.

Note that other browser engineers assessment of whether a proposal sufficiently benefits web browser users may be different than mine for any given proposal.

jasnell commented 2 years ago

A key challenge is that end users don't really care whether the code is running in the browser, on the edge, or in a server, they would like the APIs to be consistent and for us not to have different APIs for the same things in different environments. So, while I definitely understand the position and WHATWG's mission, where there is known existing overlap in functionality/API between these different environments, it would be great to avoid diverging APIs. If necessary to make progress, we can get Node.js, Deno, Workers, and other backend/edge environments together to collaborate on these things but it would also be great to recognize that "the web" is not limited to just web browsers.

annevk commented 2 years ago

Yeah, I think this is worth doing, especially for iteration as it seems rather bad if that behaves differently depending on where you run the code. I also saw an argument by @domenic that Headers is a generic data structure and therefore should have [Exposed=*] which I think also argues for handling Set-Cookie correctly.

Having said that, we need a better solution when Headers's guard is "request" as requiring a more complicated data structure in that case does not carry its weight I think, especially given that Set-Cookie is a response header. The simplest solution would be making Set-Cookie a forbidden header name, but that probably requires experimenting whether that breaks any unusual code out there.

youennf commented 2 years ago

that probably requires experimenting whether that breaks any unusual code out there.

It would make sense to do that experiment before proceeding with fetch spec changes.

Or getSetCookie API could be defined for non-browser-env only (say partial interface with [Exposed=non-browser-env]) in an extension spec, until we know it is safe to make Set-Cookie a forbidden header name on web environments. https://github.com/whatwg/fetch/pull/1346 could be split in two: infra part remaining in fetch spec, API going in an extension spec.

lucacasonato commented 2 years ago

What is the conclusion here? I'd be happy to put in the work to get this landed, but I am not sure exactly what to do. My current understanding:

That suggests that the next step is to measure usage of set-cookie headers in Headers objects with a "request" header guard. I expect this would mean adding a use counter in one of Chromium or Gecko. I have no idea how to do this: I tried doing it for Gecko just now, but I can not figure it out. I will also try for Chromium now. I'd appreciate any help on this front.

lucacasonato commented 2 years ago

I have figured out how to add a UseCounter to Chromium for this. I'll be submitting a patch for it soon.

lucacasonato commented 2 years ago

Added a UseCounter called "FetchSetCookieInRequestGuardedHeaders" to Chromium in https://chromium-review.googlesource.com/c/chromium/src/+/3423793. Data should be available here somewhat soon: https://chromestatus.com/metrics/feature/popularity#FetchSetCookieInRequestGuardedHeaders

annevk commented 2 years ago

Thanks @lucacasonato! The Cache API is a concern, but also transferring a request Headers object to a service worker. I think your use counter addresses both of these.

(https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/collection/use-counters.html has documentation for Firefox by the way. #dom:mozilla.org on Matrix should be able to help if you have any trouble.)

lucacasonato commented 2 years ago

Use counter results from Chromium are in: https://chromestatus.com/metrics/feature/timeline/popularity/4152

The % of pages that set a "set-cookie" header on an outbound fetch request hovers around 0.0003%. The data indicates that two popularish sites that set a "set-cookie" header on request headers. These domains are:

For both of these sites the "set-cookie" header is set on an outbound API request that uses fetch. On the osgohome.com site, it is a syntactically invalid set-cookie header.

The proposed changes in #1346 would make "set-cookie" a forbidden header name, thus causing the cookie to be silently ignored (no explicit error is raised).

I have tested both sites using a puppeteer script using request interception that removes all outbound "set-cookie" headers (essentially what #1346 would do). Both sites continue to work just fine, and the API endpoints continue to return 200 status codes, identical data, and identical response headers.

As such I think we can safely make this change.

adrianhopebailie commented 2 years ago

Am I correct that step one of what @lucacasonato proposed in https://github.com/whatwg/fetch/issues/973#issuecomment-1025227167 has now been completed and the impact of special handling set-cookie is now low enough for this to be considered?

Was the conclusion to adopt an implementation like Deno's or to add a getSetCookie() method to the Headers object for this special case?

annevk commented 2 years ago

The latter, see #1346. It's currently stuck on getting implementer interest from two browsers.

FadyMak commented 1 year ago

FWIW, using getAll would maintain consistency with other Web APIs such as URLSearchParams.getAll(). It's also what is being used by Cloudflare Workers as @kentonv mentioned above.

I think getSetCookie is just fine - this comment is more to call out API consistency.

andreubotella commented 1 year ago

FWIW, using getAll would maintain consistency with other Web APIs such as URLSearchParams.getAll(). It's also what is being used by Cloudflare Workers as @kentonv mentioned above.

I think getSetCookie is just fine - this comment is more to call out API consistency.

getAll was removed from the specification because Webkit objected to it, since it would not be feasible for them to implement. As I see it, the only options for bringing getAll back without them objecting would be:

In my opinion, both of these options are bad, and getSetCookie is much better than either.

kentonv commented 1 year ago

FWIW Cloudflare Workers' implementation of getAll() throws an exception if the header name is anything other than Set-Cookie.

However, the ship has clearly sailed here, getSetCookie is the standard that has been written and agreed upon.

med-ab commented 3 months ago

what about X-Robots-Tag for example where rules specified without a user agent are valid for all

console.log(
  new Headers([
    ['x-robots-tag', 'noindex'],
    ['x-robots-tag', 'unavailable_after: 25 Jun 2099 15:00:00'],
    ['x-robots-tag', 'somebot: nofollow'],
    ['x-robots-tag', 'otherbot: noindex, nofollow'],
    ['x-robots-tag', 'noarchive'],
  ]).get('x-robots-tag')
  ==
  'noindex, unavailable_after: 25 Jun 2099 15:00:00, somebot: nofollow, otherbot: noindex, nofollow, noarchive'
)

now is the noarchive rule for all or just for otherbot ?!

annevk commented 3 months ago

If you cannot split on comma, then X-Robots-Tag violates HTTP.

med-ab commented 3 months ago

If you cannot split on comma, then X-Robots-Tag violates HTTP.

where in the HTTP spec does it say that ?