Open Xerkus opened 2 months ago
@laminas/technical-steering-committee this one is tricky. Needs feedback.
Interop integration tests do not establish how any of this should be treated. Other implementations allow any crap to be set as host and as a result produce junk url.
php > var_export(parse_url('http://(:1'));
array (
'scheme' => 'http',
'host' => '(',
'port' => 1,
)
php > var_export(parse_url('http://,:1'));
array (
'scheme' => 'http',
'host' => ',',
'port' => 1,
)
php > var_export(parse_url('http://[:1'));
array (
'scheme' => 'http',
'host' => '[',
'port' => 1,
)
php > var_export(parse_url('http://]:1'));
array (
'scheme' => 'http',
'host' => ']',
'port' => 1,
)
php > var_export(parse_url('http://][:1'));
array (
'scheme' => 'http',
'host' => '][',
'port' => 1,
)
php > var_export(parse_url('http://;:1'));
array (
'scheme' => 'http',
'host' => ';',
'port' => 1,
)
php > var_export(parse_url('http://::1'));
array (
'scheme' => 'http',
'host' => ':',
'port' => 1,
)
php > var_export(parse_url('http:// :1'));
array (
'scheme' => 'http',
'host' => ' ',
'port' => 1,
)
php > var_export(parse_url('http:// '));
array (
'scheme' => 'http',
'host' => ' ',
)
url_parse handles malformed IPv6 that is not enclosed in brackets but the last segment is treated as a port.
Malformed: https://user:pass@fe80::200:5aee:feaa:20a2:3001/foo?bar=baz#quz
Valid: https://user:pass@[fe80::200:5aee:feaa:20a2]:3001/foo?bar=baz#quz
Since this malformed format is accepted I feel we should allow to set IPv6 address as a host using withHost()
but enclose it in brackets ourselves.
Alternative would be to reject IPv6 address that is not enclosed with brackets in all cases. It will be safer and frankly the proper approach but it also can be considered a breaking change.
For the remaining filtering introduced in this PR it only explicitly forbids general delimeters but allows anything else to run rampant.
I believe validation needs to be clamped down to only allow IPv6, IPv4 and valid for DNS hostnames. Probably pass it through validate domain filter. Any other URI variants are not suitable for http message context. Something along those lines:
filter_var(idn_to_ascii($host), FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME)
Description
This change introduces validation for the uri host as
parse_url()
does not validate host subcomponent permitting outright forbidden characters to be used.This PR introduces basic host subcomponent validation that would reject forbidden characters, validate IPv6 address and wrap it the square brackets. This validation does not ensure hostname is actually valid. For example, URI allows various sub-delimeters and percent encoding which would be invalid in the context of the http urls. This validation does not prevent them and it does not validate percent encoding is actually valid.
See RFC3986 3.2.2
Port validation was added to the Uri constructor as
parse_url()
would allow port 0 which is a wildcard port not valid for the Uri.