Closed Sebbo94BY closed 1 year ago
After thinking about this a bit further, since the context of Helper/Uri
is just generic URI parsing, I think both the checkPort()
and checkHost()
logic should only be checking for valid segments as per RFC 3986 - 3.2.2 & 3.2.3 and not necissarily host resolution to a valid IP (which might be a better fit in the Adapter or Transport classes)
Note the following from the Host
section:
This specification does not mandate a particular registered name lookup technology and therefore does not restrict the syntax of reg- name beyond what is necessary for interoperability. Instead, it delegates the issue of registered name syntax conformance to the operating system of each application performing URI resolution, and that operating system decides what it will allow for the purpose of host identification.
So let's remove the gethostbynamel()
call and instead check if the HOST segment is one of:
filter_var()
with FILTER_VALIDATE_URL
knowing that it has shortcomings (e.g. doesn't seem to support IDN/unicode either...) Any suggestions/alternative solutions welcome!Edit: And, the checks for checkPort()
seem reasonable, so let's keep those for the time being unless we find a case that is less specific.
New changes look fantastic! Really appreciate the extra effort on this one; sorry to be a pain and thank you for your patience/work in these updates!
Ok, I've asked Google, the PHP manual a bit and tested a few things. The working result is now available here.
Since we are checking here $this->host
, this will never have any URI schema. So the FILTER_VALIDATE_URL
will always fail as this requires the URI schema.
I've seen, that there is also a FILTER_VALIDATE_DOMAIN
, which supports the FILTER_FLAG_HOSTNAME
. Would be the perfect solution for our use-case, but... This filter also reports any IP address as "ok" as long as it does not exceed the domains maximum string length - even when you test with invalid IP addresses.
Below a few examples.
Valid:
php > var_dump( filter_var("localhost", FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME) );
php shell code:1:
string(9) "localhost"
Valid:
php > var_dump( filter_var("127.0.0.1", FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME) );
php shell code:1:
string(9) "127.0.0.1"
Invalid:
php > var_dump( filter_var("127.0.0.300", FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME) );
php shell code:1:
string(11) "127.0.0.300"
So I've decided to use a regex check instead. The actual regex check unfortunately also matches IPv4 addresses, so we need to exclude them with a different regex. 😅
sorry to be a pain and thank you for your patience/work in these updates!
You're welcome!
Ah, no problem. I have such discussions more or less every day at work. Such discussions help in getting the best implementation / result. Nobody knows everything or the best implementation - there is always a better solution. :)
So I've decided to use a regex check instead. The actual regex check unfortunately also matches IPv4 addresses, so we need to exclude them with a different regex.
This is the unforunate thing with trying to validate host
segment under RFC 3986: while 127.0.0.300
is not a valid IPv4, it is indeed a valid hostname (and possibly domain name, but obviously non-ICANN allowed with .300
not being a valid gTLD)
Validation always seems to be a delicate line between what can be allowed and what should be allowed depending on app, context, etc. In any case, I don't want to waste more time on this and we can always revise at a later time if we have reports of issues which I would not expect.