jeremykendall / php-domain-parser

Public Suffix List based domain parsing implemented in PHP
MIT License

Some questions, confusions #283

Closed: umairkhan-dev closed this 4 years ago

umairkhan-dev commented 4 years ago

First of all, please pardon my ignorance. I wanted to post this as a comment in "Rewrite the documentation" #275 but decided to open a new issue instead.

Some suggestions.

1. Kindly do briefly explain in documentation

2. If I look up a domain whose TLD does not exist in the PSL, the parser still parses the domain, and even the publicSuffix index / key is populated. Is there a way to tell the parser to throw an exception or return a null Pdp\Domain if the TLD of the given domain is not part of the PSL (neither the ICANN nor the PRIVATE section)?

3. Please provide a single, concise, full example covering the major parts of this parser. For instance, you insist on setting up some mechanism to update the PSL (or RZD) cache. Here is an example of my test usage.

require_once(__DIR__ . DIRECTORY_SEPARATOR . 'vendor/autoload.php');
try {
    // The Manager creates, refreshes and updates the cache; DateInterval('P1D') sets the cache lifetime to one day.
    $manager = new \Pdp\Manager(new \Pdp\Cache(), new \Pdp\CurlHttpClient(), new DateInterval('P1D'));
    // Rules are fetched from the cache; the cache files are updated if needed, per the DateInterval given to the Manager.
    $rules = $manager->getRules();
    // Resolve the given domain; returns a Pdp\Domain object.
    $domain = $rules->resolve('www.abc.enfield.sch.uk');
    print_r($domain);
    var_dump($domain);
    var_dump($domain->getPublicSuffix()); // Returns the public suffix part of the given domain as a string.
    // $ps = \Pdp\PublicSuffix::createFromDomain($domain); // Alternative: build a Pdp\PublicSuffix object from the Pdp\Domain object.
    $ps = $rules->getPublicSuffix($domain); // Build and return a Pdp\PublicSuffix object for the Pdp\Domain object.
    print_r($ps);
    var_dump($ps);
} catch (Exception $e) {
    var_dump($e);
}

Environment.

  1. Ubuntu 18.04.
  2. LAMP with PHP 7.4
  3. jeremykendall/php-domain-parser 5.7.0 through Composer

I hope I have explained my points properly.

I also have a question that is not related to this parser, but which you may be able to answer since you know more about the PSL and TLDs. Is there a way to check which TLDs are available for public registration and which are not? E.g. an ordinary person can register a .com domain but not a .google one, because .google does not allow public registration. Is there some kind of list, like the PSL, that defines which TLDs are available for public registration?

nyamsprod commented 4 years ago

@umairkhan-dev thanks for the input. I would strongly advise you to put this information under #275, because the rewrite indeed aims at resolving these confusions, and in the next major release I'm trying to resolve them at the API level. Those concepts will then be easier to grasp and understand.

Currently, resolve will never fail because it follows the PSL recommendation, but if you want an exception to be thrown I would suggest using the get(Private|ICANN|Cookie)Domain methods, which throw on error.

umairkhan-dev commented 4 years ago

Thank you for your quick response.

> Currently, resolve will never fail because it follows the PSL recommendation, but if you want an exception to be thrown I would suggest using the get(Private|ICANN|Cookie)Domain methods, which throw on error.

Currently the getPrivateDomain and getICANNDomain methods each search only their own section of the PSL. May I suggest a method like getPSLDomain that strictly searches the PSL, in both the ICANN and Private sections, and returns a Pdp\Domain on success or throws an exception on error?
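The behaviour suggested here can be approximated in user-land code with the status method the library already exposes. The following is only a minimal sketch under stated assumptions: `requireKnownSuffix` is a hypothetical helper name, not a php-domain-parser method, and `StubDomain` is a hypothetical stand-in for a resolved v5 `Pdp\Domain` so that the snippet runs without the library installed.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a resolved Pdp\Domain (v5); it mimics only isKnown().
final class StubDomain
{
    private bool $known;

    public function __construct(bool $known)
    {
        $this->known = $known;
    }

    public function isKnown(): bool
    {
        return $this->known;
    }
}

/**
 * Hypothetical helper (not a library method): returns the domain unchanged when
 * its public suffix was found in the PSL (either section), and throws otherwise.
 */
function requireKnownSuffix(object $domain, string $host): object
{
    if (!$domain->isKnown()) {
        throw new UnexpectedValueException("The public suffix of '$host' is not in the PSL.");
    }

    return $domain;
}

// Demonstration with the stub: an unknown suffix triggers the exception.
try {
    requireKnownSuffix(new StubDomain(false), 'paki.appri');
} catch (UnexpectedValueException $e) {
    echo $e->getMessage(), PHP_EOL;
}
```

With the library itself, the first argument would presumably be the object returned by `$rules->resolve($host)`, assuming v5, where `isKnown()` lives on the domain object.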

Also, you have not said anything about my last question (unrelated to the parser) about checking public registration availability. Kindly respond to this as well.

nyamsprod commented 4 years ago

> Currently the getPrivateDomain and getICANNDomain methods each search only their own section of the PSL. May I suggest a method like getPSLDomain that strictly searches the PSL, in both the ICANN and Private sections, and returns a Pdp\Domain on success or throws an exception on error?

getCookieDomain is what you mean by getPSLDomain I think.

nyamsprod commented 4 years ago

> 3. Please provide a single, concise, full example covering the major parts of this parser. For instance, you insist on setting up some mechanism to update the PSL (or RZD) cache. Here is an example of my test usage.

There's no example because the manager is optional; you could achieve this using multiple strategies. Currently, in v6, the manager has been moved and its functionality is in a separate namespace for that reason.

umairkhan-dev commented 4 years ago

Let's say we have a domain paki.appri, where appri is not in the PSL.

Code:

$str = 'paki.appri';
$manager = new \Pdp\Manager(new \Pdp\Cache, new \Pdp\CurlHttpClient, new DateInterval('P1D'));
$rules = $manager->getRules();
echo "getCookieDomain\n\n";
$domain = $rules->getCookieDomain($str);
print_r($domain);
var_dump($domain);
echo "\n\n\ngetICANNDomain\n\n";
$domain = $rules->getICANNDomain($str);
print_r($domain);
var_dump($domain);
echo "\n\n\ngetPrivateDomain\n\n";
$domain = $rules->getPrivateDomain($str);
print_r($domain);
var_dump($domain);

Output:

getCookieDomain

Pdp\Domain Object
(
    [domain] => paki.appri
    [registrableDomain] => paki.appri
    [subDomain] => 
    [publicSuffix] => appri
    [isKnown] => 
    [isICANN] => 
    [isPrivate] => 
)
/var/www/what.domains/public_html/tld-admin/tests.php:38:
object(Pdp\Domain)[104]
  private 'domain' => string 'paki.appri' (length=10)
  private 'labels' => 
    array (size=2)
      0 => string 'appri' (length=5)
      1 => string 'paki' (length=4)
  private 'publicSuffix' => 
    object(Pdp\PublicSuffix)[109]
      private 'publicSuffix' => string 'appri' (length=5)
      private 'section' => string '' (length=0)
      private 'labels' => 
        array (size=1)
          0 => string 'appri' (length=5)
      private 'asciiIDNAOption' => int 0
      private 'unicodeIDNAOption' => int 0
      private 'isTransitionalDifferent' => boolean false
  private 'registrableDomain' => string 'paki.appri' (length=10)
  private 'subDomain' => null
  private 'asciiIDNAOption' => int 0
  private 'unicodeIDNAOption' => int 0
  private 'isTransitionalDifferent' => boolean false

getICANNDomain

Pdp\Domain Object
(
    [domain] => paki.appri
    [registrableDomain] => paki.appri
    [subDomain] => 
    [publicSuffix] => appri
    [isKnown] => 
    [isICANN] => 
    [isPrivate] => 
)
/var/www/what.domains/public_html/tld-admin/tests.php:42:
object(Pdp\Domain)[106]
  private 'domain' => string 'paki.appri' (length=10)
  private 'labels' => 
    array (size=2)
      0 => string 'appri' (length=5)
      1 => string 'paki' (length=4)
  private 'publicSuffix' => 
    object(Pdp\PublicSuffix)[107]
      private 'publicSuffix' => string 'appri' (length=5)
      private 'section' => string '' (length=0)
      private 'labels' => 
        array (size=1)
          0 => string 'appri' (length=5)
      private 'asciiIDNAOption' => int 0
      private 'unicodeIDNAOption' => int 0
      private 'isTransitionalDifferent' => boolean false
  private 'registrableDomain' => string 'paki.appri' (length=10)
  private 'subDomain' => null
  private 'asciiIDNAOption' => int 0
  private 'unicodeIDNAOption' => int 0
  private 'isTransitionalDifferent' => boolean false

getPrivateDomain

Pdp\Domain Object
(
    [domain] => paki.appri
    [registrableDomain] => paki.appri
    [subDomain] => 
    [publicSuffix] => appri
    [isKnown] => 
    [isICANN] => 
    [isPrivate] => 
)
/var/www/what.domains/public_html/tld-admin/tests.php:46:
object(Pdp\Domain)[102]
  private 'domain' => string 'paki.appri' (length=10)
  private 'labels' => 
    array (size=2)
      0 => string 'appri' (length=5)
      1 => string 'paki' (length=4)
  private 'publicSuffix' => 
    object(Pdp\PublicSuffix)[108]
      private 'publicSuffix' => string 'appri' (length=5)
      private 'section' => string '' (length=0)
      private 'labels' => 
        array (size=1)
          0 => string 'appri' (length=5)
      private 'asciiIDNAOption' => int 0
      private 'unicodeIDNAOption' => int 0
      private 'isTransitionalDifferent' => boolean false
  private 'registrableDomain' => string 'paki.appri' (length=10)
  private 'subDomain' => null
  private 'asciiIDNAOption' => int 0
  private 'unicodeIDNAOption' => int 0
  private 'isTransitionalDifferent' => boolean false

All three get(Private|ICANN|Cookie)Domain functions parse the domain, and no error was reported / thrown.

When writing this code I was hoping that at least get(Private|ICANN)Domain would report an error, as appri is not part of the PSL. I think this should be considered a bug: with getICANNDomain I asked the parser to parse the domain paki.appri according to the ICANN section of the PSL, i.e. on the assumption that appri exists in that section. The same goes for getPrivateDomain.

As for

> getCookieDomain is what you mean by getPSLDomain I think.

I think I did not make my point clear. Let me try again with the same example, paki.appri. From what I understand, get(Private|ICANN|Cookie)Domain should behave like this:

  1. With getCookieDomain: parses the domain as per the PSL rules, regardless of whether appri itself exists in the PSL, and returns a \Pdp\Domain object or throws an error / exception.
  2. With getICANNDomain: parses the domain as per the PSL rules and checks that appri itself exists in the ICANN section of the PSL, then returns a \Pdp\Domain object or throws an error / exception.
  3. With getPrivateDomain: parses the domain as per the PSL rules and checks that appri itself exists in the Private section of the PSL, then returns a \Pdp\Domain object or throws an error / exception.

If I take into account

> getCookieDomain is what you mean by getPSLDomain I think.

then point number 1 may become

  1. With getCookieDomain: parses the domain as per the PSL rules and checks that appri itself exists in the ICANN section or the Private section of the PSL, then returns a \Pdp\Domain object or throws an error / exception.

However, in the above scenario, none of the get(Private|ICANN|Cookie)Domain functions threw an error.

nyamsprod commented 4 years ago

This is an easy one if you read the documentation:

Domain::isKnown(); // tells whether the domain's public suffix is a known PS
Domain::isICANN(); // tells whether the domain's public suffix is an ICANN PS
Domain::isPrivate(); // tells whether the domain's public suffix is a Private PS

Of course, the information depends on the Public Suffix List used.

So even though it always returns something because of the PSL algorithm, the status methods still give you the correct answer.

The get(Private|ICANN|Cookie)Domain methods only ensure that you do not get a null domain object.
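As an illustration of reading those status methods, here is a small sketch that maps them to a single label. It assumes the v5 layout, where the methods live on the domain object. `suffixStatus` is a hypothetical helper and `FakeResolvedDomain` a hypothetical stand-in for a resolved `Pdp\Domain`, used only so the snippet runs without the library.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a resolved Pdp\Domain (v5), mimicking its three status methods.
final class FakeResolvedDomain
{
    /** 'icann', 'private', or '' when the suffix was not found in the list. */
    private string $section;

    public function __construct(string $section)
    {
        $this->section = $section;
    }

    public function isKnown(): bool
    {
        return $this->section !== '';
    }

    public function isICANN(): bool
    {
        return $this->section === 'icann';
    }

    public function isPrivate(): bool
    {
        return $this->section === 'private';
    }
}

/** Hypothetical helper (not a library method): maps the status flags to a label. */
function suffixStatus(object $domain): string
{
    if ($domain->isICANN()) {
        return 'icann';
    }
    if ($domain->isPrivate()) {
        return 'private';
    }

    return 'unknown';
}

// Prints "icann", "private" and "unknown", one per line.
foreach (['icann', 'private', ''] as $section) {
    echo suffixStatus(new FakeResolvedDomain($section)), PHP_EOL;
}
```

For paki.appri, the dump earlier in the thread shows all three flags empty, so a helper like this would report "unknown" rather than throw; whether to turn that into an exception is then the caller's decision.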

Again this is being revisited in v6 for, I hope, better clarity.

//v5

$rules->getCookieDomain($domain)->isKnown();

//v6

$rules->getCookieDomain($domain)->getPublicSuffix()->isKnown();

In v6 the public suffix status is ... on the public suffix object and no longer on the domain object. The fact that a public suffix could not be determined does not mean that the domain is not valid. These are two different statuses; hence, throwing should IMHO only occur if you know the domain to be bogus, not because the domain or a PSL entry does not exist yet 😉

Last but not least:

umairkhan-dev commented 4 years ago

Sorry for the late reply, @nyamsprod. Yes, I have read the documentation, and I know about isKnown, isICANN and isPrivate, both as methods and as properties of the Pdp\Domain object. However, the explanation of get(Private|ICANN|Cookie)Domain in the documentation, especially

> WARNING: If the Domain can not be resolved an exception is thrown.

confused me about how they actually work, i.e. as mentioned above:

> From what I understand, get(Private|ICANN|Cookie)Domain should behave like this:
>
>   1. With getCookieDomain: parses the domain as per the PSL rules, regardless of whether appri itself exists in the PSL, and returns a \Pdp\Domain object or throws an error / exception.
>   2. With getICANNDomain: parses the domain as per the PSL rules and checks that appri itself exists in the ICANN section of the PSL, then returns a \Pdp\Domain object or throws an error / exception.
>   3. With getPrivateDomain: parses the domain as per the PSL rules and checks that appri itself exists in the Private section of the PSL, then returns a \Pdp\Domain object or throws an error / exception.
>
> If I take into account
>
> > getCookieDomain is what you mean by getPSLDomain I think.
>
> then point number 1 may become
>
>   1. With getCookieDomain: parses the domain as per the PSL rules and checks that appri itself exists in the ICANN section or the Private section of the PSL, then returns a \Pdp\Domain object or throws an error / exception.

I am very much obliged and thankful that you explained it to me and considered some changes to the documentation.

I also request that you not close this issue until the documentation has been rewritten.

nyamsprod commented 4 years ago

@umairkhan-dev I can pin the issue so it can easily be seen, but if the explanations given are enough I'll close it after editing my last comment for the typo 😉 Also keep in mind that since I've started working on v6, I may rewrite the documentation for both v5 and v6, which should improve the whole thing. But I still need to figure out some things before that, as it may take much more time.

umairkhan-dev commented 4 years ago

@nyamsprod. Yes, the explanation was more than enough. Thank you very much.

As for closing this issue and the timing of the documentation rewrite, it's up to you. Please do as you see fit. 😄

umairkhan-dev commented 4 years ago

I had previously pointed out that there are typos in your comment.

Domain::isKnown(); // tells the domain's public suffix is a known PS
Domain::isCANN(); // tells the domain's public suffix is a ICANN PS
Domain::isCANN(); // tells the domain's public suffix is a Prviate PS

nyamsprod commented 4 years ago

@umairkhan-dev should be fixed ... thanks for the reminder 😉