jeremykendall / php-domain-parser

Public Suffix List based domain parsing implemented in PHP
MIT License

Issue when parsing host #329

Closed: mzurawek closed this issue 2 years ago.

mzurawek commented 2 years ago

Example code:

```php
$url = "https://23+year-old+Business+Broker+Tommy+from+La+Prairie,+loves+to+spend+some+time+beach+tanning,+and+rc+model+aircrafts.+Gets+a+lot+of+inspiration+from+life+by+touring+places+such+as+Heritage+of+Mercury.+Almadén+and+Idrija+https/";
$host = parse_url($url, PHP_URL_HOST);
// $host is now 23+year-old+Business+Broker+Tommy+from+La+Prairie,+loves+to+spend+some+time+beach+tanning,+and+rc+model+aircrafts.+Gets+a+lot+of+inspiration+from+life+by+touring+places+such+as+Heritage+of+Mercury.+Almadén+and+Idrija+https
```

Calling `Rules::fromString($listContent)->getICANNDomain($host)` throws:

```
[TypeError]
WARNING   [php]  Warning: Undefined array key "result" ["exception" => ErrorException { …}]
WARNING   [php] Warning: Undefined array key "isTransitionalDifferent" ["exception" => ErrorException { …}]
WARNING   [php] Warning: Undefined array key "errors" ["exception" => ErrorException { …}]
  Pdp\IdnaInfo::__construct(): Argument #1 ($result) must be of type string, null given, called in /project/vendor/jeremykendall/php-domain-parser/src/IdnaInfo.php on line 62
```

Since PHP 8.1.7, `parse_url` returns this as a valid host, so I would expect php-domain-parser to signal in some way that it cannot process this host. Or am I doing something wrong?

nyamsprod commented 2 years ago

@mzurawek thanks for using the library.

The host you are trying to verify is not a valid DNS host name: one of its labels is too long (DNS limits each label to 63 characters), and that is before even mentioning the UTF-8 characters it contains.
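To make the length rule concrete, here is a minimal sketch of the check, assuming the host is already in its ASCII form. The helper name `hasValidDnsLengths` is hypothetical and is not part of php-domain-parser. It counts bytes with `strlen()`, so a UTF-8 label such as `Almadén` would first have to be converted to its Punycode A-label (for example with `idn_to_ascii()` from the intl extension) before the 63-octet limit is meaningful.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of php-domain-parser): checks the DNS
 * length limits of a host name given in ASCII (A-label) form.
 * strlen() counts bytes, so multibyte UTF-8 characters count as several.
 */
function hasValidDnsLengths(string $host): bool
{
    // An absolute name may end with a single root dot; drop it before counting.
    if (str_ends_with($host, '.')) {
        $host = substr($host, 0, -1);
    }

    // 253 characters in dotted form corresponds to the 255-octet limit on the wire.
    if ($host === '' || strlen($host) > 253) {
        return false;
    }

    foreach (explode('.', $host) as $label) {
        // Each label must be between 1 and 63 octets long.
        $length = strlen($label);
        if ($length === 0 || $length > 63) {
            return false;
        }
    }

    return true;
}

// The host returned by parse_url() in the report: its first label
// (everything before "aircrafts.") is already longer than 63 bytes.
$reported = '23+year-old+Business+Broker+Tommy+from+La+Prairie,+loves+to+spend+some+time+beach+tanning,+and+rc+model+aircrafts.+Gets+a+lot+of+inspiration+from+life+by+touring+places+such+as+Heritage+of+Mercury.+Almadén+and+Idrija+https';

var_dump(hasValidDnsLengths('www.example.com')); // bool(true)
var_dump(hasValidDnsLengths($reported));         // bool(false)
```

This only covers lengths; it says nothing about which characters are allowed in a label, so it is a pre-check rather than full validation.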

Remember that `parse_url` is only a parser: it does not validate what it returns, which means that, as in your case, it can sometimes return a host that is not actually valid.

If the host information is important to your system, I strongly suggest validating the full URI with a proper URI validation tool first, and only then passing the host to this package.
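As a rough illustration of "validate first, then use the package", the sketch below guards the host before it reaches `Rules::getICANNDomain()`. It swaps in PHP's built-in `filter_var()` with `FILTER_VALIDATE_DOMAIN` and `FILTER_FLAG_HOSTNAME` as the validator; this is an assumption for the example and is not the dedicated URI tooling recommended above. The function name `extractValidatedHost` is hypothetical. Because the hostname flag accepts only ASCII letters, digits and inner hyphens, an internationalized host would have to be converted to its A-label form (e.g. with `idn_to_ascii()`) before this check.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical guard (not part of php-domain-parser or league/uri):
 * extracts the host with parse_url() and returns it only if PHP's
 * built-in domain filter accepts it as an ASCII host name.
 */
function extractValidatedHost(string $url): ?string
{
    // parse_url() only splits the URL: it returns false for seriously
    // malformed URLs and null when there is no host component.
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }

    // FILTER_VALIDATE_DOMAIN enforces the 253/63 length limits;
    // FILTER_FLAG_HOSTNAME also restricts labels to letters, digits and
    // inner hyphens, so "+", "," and non-ASCII characters are rejected.
    $validated = filter_var($host, FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME);

    return is_string($validated) ? $validated : null;
}

// Only a non-null result would then be handed to Rules::getICANNDomain().
var_dump(extractValidatedHost('https://www.example.com/path')); // string(15) "www.example.com"
var_dump(extractValidatedHost('https://bad+host,name/'));       // NULL
```

Rejecting early keeps unvalidated input away from the domain parser; the parser itself is still needed afterwards to resolve the public suffix and registrable domain, which the filter does not know about.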

As far as I can see and reproduce the issue, this is not a bug in the package but rather in `parse_url`. Maybe you should report it on the PHP bug tracker 🤔

mzurawek commented 2 years ago

Thank you for the suggestion. I've switched from `parse_url` to the league/uri-parser package, and it catches that error.