webhintio / hint

đź’ˇ A hinting engine for the web
https://webhint.io/
Apache License 2.0
3.63k stars 680 forks source link

Look into how duplicate headers should be handled #750

Closed alrra closed 4 years ago

alrra commented 6 years ago

Some sites send the same headers multiple times, e.g.:

curl -sSIL  https://github.com/
HTTP/1.1 200 OK
Date: Wed, 10 Jan 2018 18:19:31 GMT
Content-Type: text/html; charset=utf-8
Server: GitHub.com
Status: 200 OK
Cache-Control: no-cache
=========> Vary: X-PJAX
X-UA-Compatible: IE=Edge,chrome=1
=========> Set-Cookie: logged_in=no; domain=.github.com; path=/; expires=Sun, 10 Jan 2038 18:19:31 -0000; secure; HttpOnly
=========> Set-Cookie: _gh_sess=eyJzZXNzaW9uX2lkIjoiN2JiNGQ1ZTg3ODViZmE2Mjc2Y2JkMTRhYWUzYTRhYTMiLCJsYXN0X3JlYWRfZnJvbV9yZXBsaWNhcyI6MTUxNTYwODM3MTI4NCwiX2NzcmZfdG9rZW4iOiJKK0FFeFlkUWxaeUJBUkx2bTRVd2d1TkhPcXlUV3MxMzcvM2JaQU5aQ3lnPSJ9--2ca53ddae657465a55abdc3d2b3a3ce9d4994328; path=/; secure; HttpOnly
X-Request-Id: 6d57ba4949da3509b7345e9a9e628e71
X-Runtime: 0.043930
Expect-CT: max-age=2592000, report-uri="https://api.github.com/_private/browser/errors"
Content-Security-Policy: default-src 'none'; base-uri 'self'; block-all-mixed-content; child-src render.githubusercontent.com; connect-src 'self' uploads.github.com status.github.com collector.githubapp.com api.github.com www.google-analytics.com github-cloud.s3.amazonaws.com github-production-repository-file-5c1aeb.s3.amazonaws.com github-production-upload-manifest-file-7fdce7.s3.amazonaws.com github-production-user-asset-6210df.s3.amazonaws.com wss://live.github.com; font-src assets-cdn.github.com; form-action 'self' github.com gist.github.com; frame-ancestors 'none'; img-src 'self' data: assets-cdn.github.com identicons.github.com collector.githubapp.com github-cloud.s3.amazonaws.com *.githubusercontent.com; media-src 'none'; script-src assets-cdn.github.com; style-src 'unsafe-inline' assets-cdn.github.com
Strict-Transport-Security: max-age=31536000; includeSubdomains; preload
X-Content-Type-Options: nosniff
X-Frame-Options: deny
X-XSS-Protection: 1; mode=block
X-Runtime-rack: 0.051689
=========> Vary: Accept-Encoding
X-GitHub-Request-Id: DA7F:E1CC:D2587E:134030F:5A565933
qzhou1607-zz commented 6 years ago

According to https://stackoverflow.com/a/8079893, it is okay for vary, but is not allowed when set-cookie have the same cookie names. What I think that needs to happen:

Does this plan sound good? @alrra

alrra commented 6 years ago

Quickly scanning https://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-21#section-3.2:

A server MUST wait until the entire header section is received before interpreting a request message, since later header fields might include conditionals, authentication credentials, or deliberately misleading duplicate header fields that would impact request processing.

Multiple header fields with the same field name MUST NOT be sent in a message unless the entire field value for that header field is defined as a comma-separated list [i.e., #(values)]. Multiple header fields with the same field name can be combined into one "field-name: field-value" pair, without changing the semantics of the message, by appending each subsequent field value to the combined field value in order, separated by a comma. The order in which header fields with the same field name are received is therefore significant to the interpretation of the combined field value; a proxy MUST NOT change the order of these field values when forwarding a message.

Note: The "Set-Cookie" header field as implemented in practice can occur multiple times, but does not use the list syntax, and thus cannot be combined into a single line ([RFC6265]). (See Appendix A.2.3 of [Kri2001] for details.) Also note that the Set-Cookie2 header field specified in [RFC2965] does not share this problem.


The compression rule needs to be updated to take into consideration when there are multiple vary headers.

@qzhou1607 I think that should be handled at the connector level which should combine the ones that can be combined into a single one.

What do browser do? If I remember correctly Chrome combines them, but Firefox doesn't (or at least it doesn't do it in the dev tools)?

molant commented 6 years ago

We should also check what happens with the request package. IIRC it was using the latest instance of the header?

luksurious commented 6 years ago

Just to add: When scanning our page we got a lot of errors for incorrect cache headers, because our nginx setup sends two Cache-Control headers with different data (and I couldn't find a way to have it behave differently). However, this appears to be perfectly valid, which sonarwhal/webhint doesn't recognize correctly.

curl -sSIL https://secure.kyto.com/styles/main.css?1532364670                                                                                                  11:16:47
HTTP/1.1 200 OK
Date: Thu, 26 Jul 2018 09:16:51 GMT
Content-Type: text/css
Content-Length: 271092
Last-Modified: Mon, 23 Jul 2018 16:53:51 GMT
Connection: keep-alive
ETag: "5b56081f-422f4"
Expires: Sat, 25 Aug 2018 09:16:51 GMT
> Cache-Control: max-age=2592000
Pragma: public
> Cache-Control: public
Accept-Ranges: bytes

Sonarwhal result:

Static resources should have a long cache value (31536000) and use the immutable directive: max-age=2592000, public

https://secure.kyto.com … tyles/main.css?1531251747:1:20095
alrra commented 6 years ago

@luksurious Thanks for the example!

chrisgraham commented 4 years ago

I'm getting errors like...

hint #1: 'x-content-type-options' header value should be 'nosniff', not 'nosniff nosniff'.

and on another URL...

hint #2: 'x-content-type-options' header value should be 'nosniff', not 'nosniff, nosniff'.

When the same (and correct) x-content-type-options is sent twice. It's really weird that sometimes it is merging them with a space and sometimes also with a comma.

Anyway, it shouldn't be merging them - there should probably be a separate check for repeated headers.

EDIT: Actually merging is standard for HTTP, if it does it correctly. Comma should be used not space, and there should be no duplication.

molant commented 4 years ago

@chrisgraham do you have a URL were you are getting the values separated by a space so we can test this? If you don't want to share it publicly you can DM it via Twitter to webhintio

Thanks!

sarvaje commented 4 years ago

It looks like puppeteer join the duplicated headers in the same property separated with a new line:

for example: for github.com

Vary: X-PJAX
Vary: Accept-Encoding
{
    vary: "X-PJAX\nAccept-Encoding"
}
molant commented 4 years ago

How do we process that on our side? I believe we generate an object for the headers.


From: Jesus David GarcĂ­a Gomez notifications@github.com Sent: Friday, December 6, 2019 14:31 To: webhintio/hint Cc: AntĂłn Molleda; Comment Subject: Re: [webhintio/hint] Look into how duplicate headers should be handled (#750)

It looks like puppeteer join the duplicated headers in the same property separated with a new line:

for example: for github.com

Vary: X-PJAX Vary: Accept-Encoding

{ vary: "X-PJAX\nAccept-Encoding" }

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/webhintio/hint/issues/750?email_source=notifications&email_token=AAEUDARFGPVQIYAEI55ICUTQXLHFTA5CNFSM4ELG7LWKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEGFRHTA#issuecomment-562762700, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AAEUDARYOYTEJU2VEGUQ7VDQXLHFTANCNFSM4ELG7LWA.

chrisgraham commented 4 years ago

My test site was https://compo.sr/

Ok, let me clarify this issue again, as I didn't report it in the best way, and now we know more...

1) Headers are not being combined by webhint according to proper HTTP rules.

They should be comma-separated, and de-duplicated. Apparently currently they are being delimited by line breaks in some situations.

2) Duplicated headers are an issue, but a completely separate one.

Let me also explain why I have duplicated headers. It's because PHP has no way of knowing what headers Apache is writing. So if I implement a header for security in PHP, and also implement it in .htaccess on Apache, it's unfortunately going to come out twice. Why would I do it in both? Well, if I need it for non-PHP files I cannot do it in PHP. However, it's most important for PHP and I can't rely on .htaccess being correctly set or working on the sites of end-users of our CMS.

molant commented 4 years ago
  1. Headers are not being combined by webhint according to proper HTTP rules.

The problem is that that's how puppeteer returns the headers. The response.headers() in the following screenshot is a call to its API and you can see the value in the watch section:

image

Using the jsdom connector I get just nosniff instead of nosniff, nosniff.

image

So it looks like neither do it right.

I think this should be fixed upstream (at least puppeteer, we are using a custom requester for jsdom) so I've created https://github.com/puppeteer/puppeteer/issues/5244. We'll see what the teams has to say 🙂

molant commented 4 years ago

@chrisgraham looks like the \n issue comes from chromium and how the values are serialized in CDP so not very likely to be changed. I've created #3440 to fix it here. Feel free to take a look!