internetstandards / Internet.nl

Internet standards compliance test suite
https://internet.nl

Double content-type in response seems to affect security.txt outcome #1409

Open stitch opened 1 month ago

stitch commented 1 month ago

Security.txt seems to be configured on this domain: https://networking4all.com/.well-known/security.txt https://www.networking4all.com/.well-known/security.txt

The internet.nl test says: "Your web server does not offer a security.txt file in the right location, it could not be retrieved, or its content is not syntactically valid." https://internet.nl/site/networking4all.com/2771318/#control-panel-31

The issue seems to be that the host's response contains the "Content-Type" header twice. The code seems to expect it once and might not be able to handle a header that is set twice. Or the charset parameter is not handled:

< Content-Type: text/plain
< Content-Length: 447
< Last-Modified: Thu, 23 Nov 2023 14:27:04 GMT
< Connection: keep-alive
< ETag: "655f6138-1bf"
< Content-Type: text/plain; charset=utf-8

bwbroersma commented 1 month ago

The tech details are more specific:

Error: Media type in Content-Type header must be 'text/plain'.

Just tested: a single Content-Type header with either text/plain or text/plain; charset=utf-8 is accepted. Content-Type should of course be set only once, but normally only the first header should be considered, I think.

_I wish I would easily be able to replicate it with my nginx magic, but setting a double Content-Type header is not even possible with njs, since only the last one set is used._

bwbroersma commented 1 month ago

Double checked, and double Content-Type seems not to be allowed, so I think the current behavior, which indicates something is wrong with the Content-Type in the tech details, is good enough?

Of course the code could check and warn/inform about invalid headers in general, but that probably is a pretty deep rabbit hole to go into.


A double Content-Type seems invalid, since RFC 7231 § 3.1.1.5 defines:

Content-Type = media-type

and RFC 7230 § 3.2.2 (page 24) states:

A sender MUST NOT generate multiple header fields with the same field name in a message unless either the entire field value for that header field is defined as a comma-separated list [i.e., #(values)] or the header field is a well-known exception (as noted below).

The only well-known exception is Set-Cookie.
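As a rough illustration of what a general "duplicate header" warning could look like, here is a minimal sketch. It assumes the raw headers are available as a list of (name, value) pairs, and the `SINGLETON_HEADERS` set and the `duplicated_singletons` helper are hypothetical names, not part of the internet.nl code. The set is illustrative and not exhaustive.

```python
from collections import Counter

# Hypothetical, non-exhaustive set of fields whose value is not defined as a
# comma-separated list, so they may appear at most once in a message.
SINGLETON_HEADERS = frozenset(
    {"content-type", "content-length", "last-modified", "etag"}
)


def duplicated_singletons(header_pairs):
    """Return the sorted, lower-cased names of singleton headers sent more than once."""
    # Header field names are case-insensitive, so normalise before counting.
    counts = Counter(name.lower() for name, _ in header_pairs)
    return sorted(
        name
        for name, count in counts.items()
        if count > 1 and name in SINGLETON_HEADERS
    )


# The response headers quoted in the first comment of this issue.
RESPONSE_HEADERS = [
    ("Content-Type", "text/plain"),
    ("Content-Length", "447"),
    ("Last-Modified", "Thu, 23 Nov 2023 14:27:04 GMT"),
    ("Connection", "keep-alive"),
    ("ETag", '"655f6138-1bf"'),
    ("Content-Type", "text/plain; charset=utf-8"),
]

print(duplicated_singletons(RESPONSE_HEADERS))
```

For the headers reported above this flags only content-type. Set-Cookie and list-valued headers are deliberately left out of the set, since repeating those is allowed.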


How the internet.nl code works: the media type is checked here: https://github.com/internetstandards/Internet.nl/blob/fca192a153dfd6697fa2966cee11fdc5baf029d3/checks/tasks/securitytxt.py#L104-L105 where media_type is the result of parse_header from cgi: https://github.com/internetstandards/Internet.nl/blob/fca192a153dfd6697fa2966cee11fdc5baf029d3/checks/tasks/securitytxt.py#L80-L81 which is given the content-type header from requests: https://github.com/internetstandards/Internet.nl/blob/fca192a153dfd6697fa2966cee11fdc5baf029d3/checks/tasks/securitytxt.py#L60


To check the response of Requests:

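from cgi import parse_header  # note: the cgi module was removed in Python 3.13
import requests

r = requests.get('https://networking4all.com/.well-known/security.txt')
# requests exposes the two Content-Type headers as one comma-separated value
print(parse_header(r.headers['content-type']))
# Output: ('text/plain, text/plain', {'charset': 'utf-8'})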
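The result above can be reproduced without a network call. The following is a minimal sketch, assuming that repeated header values are joined with ", " (as the output above suggests) and using a simplified stand-in for the media type parsing. Both `combine_duplicate_headers` and `media_type` are hypothetical helpers, not the internet.nl or cgi implementation.

```python
def combine_duplicate_headers(values):
    """Join repeated header values with ', ' into one string (assumed behaviour)."""
    return ", ".join(values)


def media_type(content_type):
    """Return the lower-cased media type: everything before the first ';'."""
    return content_type.split(";", 1)[0].strip().lower()


# The two Content-Type values sent by the server in this issue.
combined = combine_duplicate_headers(["text/plain", "text/plain; charset=utf-8"])

print(combined)
print(media_type(combined))
# The comparison fails because the media type part now contains both values.
print(media_type(combined) == "text/plain")
```

With a single header, either `text/plain` or `text/plain; charset=utf-8`, the same comparison succeeds, which matches the test results reported earlier in this thread.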