jeremykendall / php-domain-parser

Public Suffix List based domain parsing implemented in PHP
MIT License
1.16k stars 128 forks source link

Rules::resolve does not throw on invalid domain #240

Closed ishankaru closed 5 years ago

ishankaru commented 5 years ago

Issue summary

Doing a rules resolve on domain 'fbsbx.com' returns empty data. $domain = $rules->resolve('fbsbx.com');

Looking into the issue there seems to be a fatal error dumping $e from the rules resolve function. $xdebug_message: Pdp\Exception\CouldNotResolvePublicSuffix: The public suffix fbsbx.com can not be equal to the domain name fbsbx.com in J:\psl\vendor\jeremykendall\php-domain-parser\src\Domain.php on line 133

Exact code tested below:

<?php
require 'vendor/autoload.php';

use Pdp\Cache;
use Pdp\CurlHttpClient;
use Pdp\Manager;
use Pdp\Rules;

$manager = new Manager(new Cache(), new CurlHttpClient());
$rules = $manager->getRules(); //$rules is a Pdp\Rules object

$domain = $rules->resolve('fbsbx.com'); 
echo "Content: {$domain->getContent()}\n";                      // expected return 'fbsbx.com'
echo "PublicSuffix: {$domain->getPublicSuffix()}\n";            // expected return 'com'
echo "RegistrableDomain: {$domain->getRegistrableDomain()}\n";  // expected return 'fbsbx.com'
echo "SubDomain: {$domain->getSubDomain()}\n";                  // expected return ''
echo "Resolvable: {$domain->isResolvable()}\n";                 // expected return true
echo "Known: {$domain->isKnown()}\n";                           // expected return true
echo "ICANN: {$domain->isICANN()}\n";                           // expected return true
echo "Private: {$domain->isPrivate()}\n";                       // expected return false

Appears to work if used as, but otherwise seems like a bug: $rules->resolve('fbsbx.com', Rules::ICANN_DOMAINS )

System informations

Information Description
Pdp version 5.4.0
PHP version PHP 7.2.11 (cli)
OS Platform Windows 10

Standalone code, or other way to reproduce the problem

(Please complete the text below to help us fix the issue)

Expected result

j:\psl>php pslcheck.php Content: fbsbx.com PublicSuffix: com RegistrableDomain: fbsbx.com SubDomain: Resolvable: true Known: true ICANN: true Private: false expected

Actual result

j:\psl>php pslcheck.php Content: PublicSuffix: RegistrableDomain: SubDomain: Resolvable: Known: ICANN: Private: actual

nyamsprod commented 5 years ago

@ishankaru this you made sure that the intl extension exists and is accessible in your PHP cli version 🤔

ishankaru commented 5 years ago

Yes the intl extension is available, PDP is working good, I have tested many domains including unicode etc, but this domain name seems to have an issue.

image

php -i from CLI: image

There is an entry apps.fbsbx.com in https://publicsuffix.org/list/public_suffix_list.dat which I feel is causing this bug in PDP. image

nyamsprod commented 5 years ago

@ishankaru OK I've tested and I understand why. If you change your code to

use Pdp\Cache;
use Pdp\CurlHttpClient;
use Pdp\Manager;

$manager = new Manager(new Cache(), new CurlHttpClient());
$rules = $manager->getRules();
$rules->getPublicSuffix('fbsbx.com');

You'll get a Exception which is expected :tada:

Now for the explication

Rules::resolve unfortunately does not throw but returns a null Domain on error while Rules::getPublicSuffix does throw.

The behavior is expected if we are to follow how other Public Suffix Resolver works in other languages :cry: . Refer to the following test suite https://github.com/jeremykendall/php-domain-parser/blob/develop/tests/RulesTest.php#L515

So in your example you should do the following instead:

use Pdp\Cache;
use Pdp\CurlHttpClient;
use Pdp\Manager;

$manager = new Manager(new Cache(), new CurlHttpClient());
$rules = $manager->getRules();
$domain = $rules->resolve('fbsbx.com');
if (null === $domain->getContent()) {
    // something wrong/fishy just happened
}

I will update the documentation and pinned you issue so that other may be aware of this. Also it now reminds me why I did introduce the Rules::getPublicSuffix method so that it could throw in such situation :+1: .

ishankaru commented 5 years ago

@nyamsprod thank you for looking at this so quickly and providing feedback, I will add something to handle this in my code as you explained.

Shardj commented 5 years ago

@nyamsprod I'm still looking for an explanation as to why the urls I provided in the linked ticket aren't considered valid domains by this library when they're perfectly valid domains in the real world. I understand you closed the issue because you were frustrated by my questions but it really wasn't helpful. The algorithm provided on the PSL website resolves them without issue if I follow it through, and online url parsers have no issue (https://www.freeformatter.com/url-parser-query-string-splitter.html, https://www.beautifyconverter.com/url-parser.php).

nyamsprod commented 5 years ago

@Shardj I've responded to you in full in the issue you've reported and I tried to be as open and complete as I could. Hijacking someone else resolved issue is not cool. Please refrain for doing so. If you find my answers not compelling you are still free to make a PR or fork the library to make the code more inline with your business expectation.

Shardj commented 5 years ago

@nyamsprod true, it isn't cool and I don't like doing it. But isn't cool to lock my issue either so I wasn't sure where else to write. I left it be for a while as we already have a custom solution in place but we're back at trying to replace it so I thought I'd check back. I'd have no problem with creating a PR or a fork to meet my own expectations but that isn't the problem. I simply want to make sure my understanding is correct and from repeated re-reading of the locked issue neither myself or other members of my team who looked at the thread understand several things you've said. It seemingly doesn't follow the functionality described on the PSL website for checking the suffix is valid. We can't find any evidence that "A public suffix can not be resolved as explain on the public suffix website." as the PSL site algorithm does resolve this for the three example urls I provided. Since you've taken issue with me responding here though I'll leave it be and push forward with what I already have.