web-platform-tests / wpt

Consider removing url test containing utf16 surrogates #46941

Open Wuelle opened 4 days ago

Wuelle commented 4 days ago

urltestdata.json contains the following test:

{
    "input": "http://example.com/\uD800\uD801\uDFFE\uDFFF\uFDD0\uFDCF\uFDEF\uFDF0\uFFFE\uFFFF?\uD800\uD801\uDFFE\uDFFF\uFDD0\uFDCF\uFDEF\uFDF0\uFFFE\uFFFF",
    "base": null,
    "href": "http://example.com/%EF%BF%BD%F0%90%9F%BE%EF%BF%BD%EF%B7%90%EF%B7%8F%EF%B7%AF%EF%B7%B0%EF%BF%BE%EF%BF%BF?%EF%BF%BD%F0%90%9F%BE%EF%BF%BD%EF%B7%90%EF%B7%8F%EF%B7%AF%EF%B7%B0%EF%BF%BE%EF%BF%BF",
    "origin": "http://example.com",
    "protocol": "http:",
    "username": "",
    "password": "",
    "host": "example.com",
    "hostname": "example.com",
    "port": "",
    "pathname": "/%EF%BF%BD%F0%90%9F%BE%EF%BF%BD%EF%B7%90%EF%B7%8F%EF%B7%AF%EF%B7%B0%EF%BF%BE%EF%BF%BF",
    "search": "?%EF%BF%BD%F0%90%9F%BE%EF%BF%BD%EF%B7%90%EF%B7%8F%EF%B7%AF%EF%B7%B0%EF%BF%BE%EF%BF%BF",
    "hash": ""
}

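The interesting part here is the \uD800 and the \uDFFF - those are unpaired UTF-16 surrogates. (The \uD801\uDFFE between them is a valid surrogate pair that encodes U+107FE.)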
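To make it concrete which code units in the `\uD800\uD801\uDFFE\uDFFF` run are unpaired, here is a minimal Python sketch. The helper name `classify_code_units` is made up for illustration and is not part of wpt or any URL library; it only applies the standard UTF-16 pairing rule to a list of code units.

```python
def classify_code_units(units):
    """Group UTF-16 code units into surrogate pairs, lone surrogates, and BMP scalars.

    Hypothetical helper for illustration only.
    """
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # Lead surrogate: valid only if immediately followed by a trail surrogate.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
                out.append(("pair", cp))
                i += 2
                continue
            out.append(("lone lead", u))
        elif 0xDC00 <= u <= 0xDFFF:
            # Trail surrogate that was not consumed by a preceding lead surrogate.
            out.append(("lone trail", u))
        else:
            out.append(("scalar", u))
        i += 1
    return out


# The first four code units of the path in the test input.
units = [0xD800, 0xD801, 0xDFFE, 0xDFFF]
result = classify_code_units(units)
for kind, value in result:
    print(f"{kind}: U+{value:04X}")
```

Under this rule, \uD800 and \uDFFF come out as lone surrogates, while \uD801\uDFFE decodes to the single code point U+107FE, which lines up with the `%F0%90%9F%BE` in the expected href.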

The behaviour in this case is undefined as per the URL specification^5, where the input to the URL parsing algorithm is a scalar value string^2 (meaning a string that contains no surrogate code points at all, neither leading nor trailing ones).
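The expected values in the test look consistent with each lone surrogate being replaced by U+FFFD before parsing, which is how the Web IDL `USVString` conversion behaves in browsers. The sketch below checks that reading for the path segment only. It assumes Python's `json` module (which combines valid `\u` surrogate pairs and leaves lone ones in place) and uses `urllib.parse.quote` as a stand-in for the URL percent-encode step; `to_scalar_value_string` is a hypothetical helper name, not an API from the URL specification.

```python
import json
from urllib.parse import quote


def is_surrogate(ch):
    """True if the single-character string is a surrogate code point."""
    return 0xD800 <= ord(ch) <= 0xDFFF


def to_scalar_value_string(s):
    """Replace every remaining (lone) surrogate with U+FFFD.

    Hypothetical helper: this is an assumption about what happens
    before the URL parser runs, not something the parser itself defines.
    """
    return "".join("\ufffd" if is_surrogate(ch) else ch for ch in s)


# Path segment of the test input, decoded the way a JSON loader would see it.
raw = json.loads(r'"\uD800\uD801\uDFFE\uDFFF\uFDD0\uFDCF\uFDEF\uFDF0\uFFFE\uFFFF"')

has_lone_surrogates = any(is_surrogate(ch) for ch in raw)
scalar = to_scalar_value_string(raw)

# quote() UTF-8 encodes and percent-encodes; every character here is non-ASCII,
# so the result is comparable with the expected pathname (minus the leading "/").
encoded = quote(scalar, safe="")
print(has_lone_surrogates)
print(encoded)
```

With the replacement step in place, `encoded` matches the expected `pathname` after its leading slash. Without it, the input is not a scalar value string, so a parser that takes the specification's input type literally has nothing to go on.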

url/README.md states:

resources/urltestdata.json contains URL parsing tests suitable for any URL parser implementation.

Therefore, the suite should only test behaviour defined in the URL specification^4.

I would like to hear the thoughts of more qualified people on this before I make a PR for it (: