whatwg / url

URL Standard
https://url.spec.whatwg.org/

An opaque-host parser and percent encoding #806

Closed hayatoito closed 9 months ago

hayatoito commented 9 months ago

The WPT URL tests appear to include the following test data:

  {
    "input": "sc://%/",
    "base": null,
    "href": "sc://%/",
    "protocol": "sc:",
    "username": "",
    "password": "",
    "host": "%",
    "hostname": "%",
    "port": "",
    "pathname": "/",
    "search": "",
    "hash": ""
  },

  {
    "input": "foo://!\"$%&'()*+,-.;=_`{}~/",
    "base": null,
    "hash": "",
    "host": "!\"$%&'()*+,-.;=_`{}~",
    "hostname": "!\"$%&'()*+,-.;=_`{}~",
    "href":"foo://!\"$%&'()*+,-.;=_`{}~/",
    "origin": "null",
    "password": "",
    "pathname": "/",
    "port":"",
    "protocol": "foo:",
    "search": "",
    "username": ""
  },

It appears the WPT URL tests assume that both "foo://!"$%&'()*+,-.;=_`{}~/" and "sc://%/" are valid URLs.

However, step 3 of the opaque-host parser says:

If input contains a U+0025 (%) and the two code points following it are not ASCII hex digits, invalid-URL-unit validation error.

According to this definition, "foo://!"$%&'()*+,-.;=_`{}~/" seems invalid because the two code points following U+0025 (%) are "&'", which are not ASCII hex digits. "sc://%/" is probably invalid too, though I'm unsure.
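
To make the step 3 condition concrete, here is a minimal sketch of the check in Python. The function name `has_invalid_percent` is hypothetical and does not come from the spec; it tests only the quoted condition and does not model the rest of the opaque-host parser.

```python
import string

HEX_DIGITS = set(string.hexdigits)  # ASCII hex digits: 0-9, a-f, A-F


def has_invalid_percent(host: str) -> bool:
    """Return True if any '%' in host is not followed by two ASCII hex digits.

    Hypothetical helper for illustration only; not a spec-defined function.
    """
    for i, ch in enumerate(host):
        if ch != "%":
            continue
        following = host[i + 1:i + 3]  # up to two code points after the '%'
        if len(following) < 2 or any(c not in HEX_DIGITS for c in following):
            return True
    return False


print(has_invalid_percent("%"))                      # True: nothing follows the '%'
print(has_invalid_percent("!\"$%&'()*+,-.;=_`{}~"))  # True: '%' is followed by "&'"
print(has_invalid_percent("a%20b"))                  # False: "%20" is well formed
```

Both hosts from the test data trip the condition, which is what prompted the question.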

Is the behavior in step 3 intended?

Context: I found this while adding support for non-special URLs in Chromium (crbug.com/1416006).

annevk commented 9 months ago

Note that the opaque-host parser doesn't return on that line. It just signifies there's a validation error. Not all parsers report validation errors and it would be non-conforming to halt parsing there (unless I suppose the parser was specifically configured to halt on validation errors, but that's not a web-exposed entry point).
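
As an illustration of the distinction between reporting a validation error and returning failure, here is a rough Python sketch. It is not the spec algorithm: `ParseResult` and `parse_opaque_host_sketch` are hypothetical names, only the percent-sign check is modeled, and the other parser steps (such as the forbidden host code point check and percent-encoding of the output) are omitted.

```python
import re
from dataclasses import dataclass, field

# A '%' that is not followed by two ASCII hex digits.
BAD_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


@dataclass
class ParseResult:
    host: str                                        # the host the parser produced
    validation_errors: list = field(default_factory=list)


def parse_opaque_host_sketch(host_input: str) -> ParseResult:
    """Hypothetical, simplified model: only the '%' check is implemented."""
    result = ParseResult(host=host_input)
    if BAD_PERCENT.search(host_input):
        # Record the validation error and keep going; there is no early return.
        result.validation_errors.append("invalid-URL-unit")
    return result


result = parse_opaque_host_sketch("%")
print(result.host)               # the host is still produced
print(result.validation_errors)  # the error is reported alongside it
```

In this model a caller that ignores `validation_errors` sees a successful parse, which matches the expectations in the WPT test data quoted above.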

hayatoito commented 9 months ago

Thanks for letting me know!

I understand now that invalid-URL-unit is a validation error, not a failure. I overlooked that. My bad.

Let me close this issue.