python / cpython

The Python programming language
https://www.python.org

urlsplit manufactures hostnames because it strips off tabs before validating them #122761

Open gh-andre opened 1 month ago

gh-andre commented 1 month ago

Bug report

Bug description:

import urllib.parse

# prints "abcxyz.test"
print(urllib.parse.urlsplit("http://abc\txyz.test/").netloc)

The current urlsplit is implemented according to this spec:

https://url.spec.whatwg.org/#concept-basic-url-parser

The spec does say in item 3 to strip tabs, but I believe there's a bug in the specification (perhaps they meant leading/trailing whitespace), because item 7 in host parsing says

If asciiDomain contains a forbidden domain code point, domain-invalid-code-point validation error, return failure.

and tab is listed as a "forbidden domain code point". If tabs are stripped from the entire input before any other work is done, checking for tabs in host names wouldn't make much sense.
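To illustrate why the two rules seem to conflict: if the tab were not stripped up front, a literal reading of the host-parsing step would reject the example URL outright. A rough sketch, not CPython or spec code; the set of code points below is only a partial transcription from the spec, and host_has_forbidden_code_point is a hypothetical helper:

import urllib.parse

# Partial set of forbidden domain code points from the WHATWG URL spec
# (the full list also includes other C0 controls and DEL).
FORBIDDEN_DOMAIN_CODE_POINTS = set("\x00\t\n\r #%/:<>?@[\\]^|")

def host_has_forbidden_code_point(url):
    # Look at the raw authority component, without the tab stripping
    # that urlsplit performs.
    host = url.split("//", 1)[-1].split("/", 1)[0]
    return any(ch in FORBIDDEN_DOMAIN_CODE_POINTS for ch in host)

print(host_has_forbidden_code_point("http://abc\txyz.test/"))  # True
print(urllib.parse.urlsplit("http://abc\txyz.test/").netloc)   # "abcxyz.test"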

I created a bug in the specification project, so maybe they will provide some guidance later on.

https://github.com/whatwg/url/issues/829

CPython versions tested on:

3.10

Operating systems tested on:

Linux, Windows

gh-andre commented 1 month ago

It was pointed out to me in the referenced issue that the spec already has a provision for tracking tabs and newlines as non-terminating validation errors. Here are the related items 2 and 3 from the spec:

https://url.spec.whatwg.org/#concept-basic-url-parser

If input contains any ASCII tab or newline, invalid-URL-unit validation error.

Remove all ASCII tab or newline from input.
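Taken literally, those two steps amount to something like this (a rough Python sketch of the spec's preprocessing, not CPython's actual code; report_validation_error is a hypothetical hook):

def preprocess(url, report_validation_error=print):
    # Step 2: flag tabs/newlines as a non-terminating validation error.
    if any(ch in "\t\n\r" for ch in url):
        report_validation_error("invalid-URL-unit")
    # Step 3: strip them and keep parsing anyway.
    return url.translate(str.maketrans("", "", "\t\n\r"))

# Reports invalid-URL-unit, then prints "http://abcxyz.test/"
print(preprocess("http://abc\txyz.test/"))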

Validation errors are indeed not supposed to terminate parsing, but given that in many contexts silently stripping tabs and newlines manufactures modified domains, my suggestion would be to provide an optional parameter in urlsplit which would fail early if tabs, newlines, or non-URL code points are encountered in the URL.

In other words, allow callers to request urlsplit to fail on validation errors in this step.

This keeps the existing behavior exactly as-is, but allows callers operating in contexts where URLs are expected to be syntactically valid to fail early, rather than introduce modified domains and other URL components with tabs and newlines stripped, or spend CPU cycles fully parsing malformed URLs.
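Until (and unless) such a parameter exists, callers can approximate the behavior with a small wrapper; strict_urlsplit below is a hypothetical helper for illustration, not a proposed API:

import urllib.parse

def strict_urlsplit(url):
    # Hypothetical wrapper: refuse URLs that would trigger the spec's
    # invalid-URL-unit validation error instead of silently stripping.
    if any(ch in "\t\n\r" for ch in url):
        raise ValueError("URL contains ASCII tab or newline")
    return urllib.parse.urlsplit(url)

print(strict_urlsplit("http://abcxyz.test/").netloc)  # "abcxyz.test"
strict_urlsplit("http://abc\txyz.test/")              # raises ValueError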