Closed by bwbroersma 8 months ago
I agree. Relevant reading, perhaps, here: https://en.wikipedia.org/wiki/User-Agent_header#User_agent_spoofing This may mean that less-popular browsers are not sent complex content (even though they might be able to deal with it correctly) or, in extreme cases, refused all content.
If ignoring this:
It's just these two locations:
- https://github.com/internetstandards/Internet.nl/blob/742676088ac86a4c6017491831ac14e981b26de5/checks/http_client.py#L62
- https://github.com/internetstandards/Internet.nl/blob/742676088ac86a4c6017491831ac14e981b26de5/checks/tasks/tls_connection.py#L663

Remaining questions are:
- `VERSION` is in `x.y.z` format for releases, but PRs and test versions are different: https://github.com/internetstandards/Internet.nl/blob/742676088ac86a4c6017491831ac14e981b26de5/internetnl/settings.py#L627
- `internetnl` altogether?

Latest RFC on User-Agent header: https://www.rfc-editor.org/rfc/rfc9110.html#name-user-agent
Question: What User-Agent header are other test tools using?
Tool | User-Agent |
---|---|
W3C Markup Validation Service | `W3C_Validator/1.3 http://validator.w3.org/services` (IPv6) and `Validator.nu/LV http://validator.w3.org/services` (IPv4) |
W3C CSS Validation Service | `Jigsaw/2.3.0 W3C_CSS_Validator_JFouffa/2.0 (See <http://validator.w3.org/services>)` |
SSL Labs - Test SSL | `SSL Labs (https://www.ssllabs.com/about/assessment.html)`, plus query parameter `?SSL_Labs_Renegotiation_Test=User_Agent_May_Not_Show` |
Security Headers | `Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36 SecurityHeaders`, plus `Referer: https://securityheaders.com/` |
Hardenize | `Hardenize (https://www.hardenize.com)` and `Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36 Hardenize` |
Thanks! See also: https://udger.com/resources/ua-list/crawlers
Oh, cool, we're on that list: https://udger.com/resources/ua-list/bot-detail?bot=internetnl#id131933
A governmental agency has asked us to prioritize this issue. Currently the IPv4/IPv6 comparison fails because the User-Agent `internetnl/1.0` results in a `401`, which counts as a failure because of https://github.com/internetstandards/Internet.nl/issues/1226.
Funny thing is, I always use `Mozilla/5.0` when sending requests without a User-Agent is blocked, and this magic `Mozilla/5.0` also works on this 'hardened' system.
For the record: I'm proposing to put internetnl and the version string in the comment field only.
Decided with @baknu:
- `settings.py` `v1.8.1.dev26-g7426760` (`g` points to the git hash, https://github.com/internetstandards/Internet.nl/commit/7426760; `dev` is the number of commits ahead; why in this case it's v1.8.1 and not v1.8.3 I don't know)
- `+https://internet.nl/about/`

Note again, internet.nl does not always send a User-Agent, which is a separate bug:
Currently `internetnl/1.0` is used. This is not ideal: it's not a common format, and since Docker others can easily spin up their own instance, so the UA should at least reflect the correct link to contact the server or person crawling.

As mentioned before in https://github.com/internetstandards/Internet.nl/issues/363#issuecomment-1860475407 and https://github.com/internetstandards/Internet.nl/issues/1042#issuecomment-1687697840, I would prefer to change this to a common bot user-agent, like those listed on MDN.
The more standardized and accepted User-Agent is
`Mozilla/5.0 (compatible; SoftwareName/0.1.2; +https://internet.nl/)`
where the last `+` part could be the deployed instance (for a protected batch server another public page could be used, plus maybe a `#user-id-token`; I've seen monitoring systems that do this). The `+` part should be configurable, but could default to the current instance domain variable already used.

So I suggest for us:
`Mozilla/5.0 (compatible; internetnl/1.8.3; +https://internet.nl/about/)`
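
A minimal sketch of how such a string could be assembled with a configurable contact URL. The helper name `build_user_agent` and its parameters are hypothetical, not taken from the Internet.nl code base; the real settings names may differ.

```python
def build_user_agent(version: str, contact_url: str, product: str = "internetnl") -> str:
    """Build a bot-style User-Agent with product, version and contact URL in the comment.

    Hypothetical helper for illustration; names are assumptions, not Internet.nl code.
    """
    fields = (("product", product), ("version", version), ("contact_url", contact_url))
    for name, value in fields:
        # Parentheses delimit the comment and whitespace separates elements,
        # so reject them inside a field to keep the header well formed.
        if not value or any(ch in value for ch in "() \t\r\n"):
            raise ValueError(
                f"{name} must be non-empty, without parentheses or whitespace: {value!r}"
            )
    return f"Mozilla/5.0 (compatible; {product}/{version}; +{contact_url})"


if __name__ == "__main__":
    # A self-hosted instance would pass its own public contact page here.
    print(build_user_agent("1.8.3", "https://internet.nl/about/"))
```

Passing the contact URL in as a parameter keeps the default deployment and self-hosted Docker instances on the same code path, with only the configured value differing.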
Ideally we would even set up a 'bot' page like http://www.google.com/bot.html.

The RFC 1945 - 10.15 User-Agent is not strict:
3.7 Product Tokens defines:
2.2 Basic Rules defines the comment as:
qdtext = <any CHAR except <"> and CTLs, but including LWS>
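
To make the grammar concrete, here is a rough sketch of a checker for the `1*( product | comment )` form. The rules in the comments are my reading of RFC 1945 (sections 10.15, 3.7 and 2.2) and should be checked against the RFC text. The function names are hypothetical, and header line folding is deliberately ignored.

```python
# Grammar as I read it in RFC 1945 (an assumption, verify against the RFC):
#   User-Agent = "User-Agent" ":" 1*( product | comment )
#   product    = token ["/" product-version]
#   comment    = "(" *( ctext | comment ) ")"
TSPECIALS = set('()<>@,;:\\"/[]?={} \t')


def _skip_token(value: str, i: int) -> int:
    """Return the index just past the token starting at i (i itself if none)."""
    j = i
    while j < len(value) and 32 < ord(value[j]) < 127 and value[j] not in TSPECIALS:
        j += 1
    return j


def _skip_comment(value: str, i: int) -> int:
    """Return the index just past the (possibly nested) comment at i, or -1."""
    depth = 0
    while i < len(value):
        ch = value[i]
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i + 1
        elif (ord(ch) < 32 and ch != "\t") or ord(ch) == 127:
            return -1  # control characters are not allowed in comment text
        i += 1
    return -1  # unbalanced parentheses


def is_valid_user_agent(value: str) -> bool:
    """Check a User-Agent field value against the product/comment grammar."""
    i, elements, n = 0, 0, len(value)
    while i < n:
        if value[i] in " \t":  # whitespace between elements
            i += 1
            continue
        if value[i] == "(":
            i = _skip_comment(value, i)
            if i < 0:
                return False
        else:
            j = _skip_token(value, i)
            if j == i:
                return False  # character not allowed at the start of a product
            i = j
            if i < n and value[i] == "/":
                j = _skip_token(value, i + 1)
                if j == i + 1:
                    return False  # "/" must be followed by a product-version
                i = j
        elements += 1
    return elements >= 1
```

Under this reading both `internetnl/1.0` and `Mozilla/5.0 (compatible; internetnl/1.8.3; +https://internet.nl/about/)` are accepted, since the comment may contain almost any text apart from unbalanced parentheses.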