jeremykendall / php-domain-parser

Public Suffix List based domain parsing implemented in PHP
MIT License
1.16k stars, 128 forks

Blank results when resolving name that exists in public suffix list #260

Closed McAnix closed 4 years ago

McAnix commented 4 years ago

Issue summary

Resolving a domain such as 'co.za' or 'linode.com' returns a blank domain object. Both names appear, in one form or another, on the Public Suffix List, yet they are perfectly valid domain names with websites attached. Setting the rules' "section" parameter to ICANN_DOMAINS or PRIVATE_DOMAINS works, but only for 'linode.com' and 'co.za' respectively.

System information

Information | Description
Pdp version | 5.6.0
PHP version | PHP 7.2.18-1+ubuntu16.04.1+deb.sury.org+1 (cli) (built: May 3 2019 09:23:41) ( NTS )
OS Platform | Ubuntu 16.04.1 LTS

Standalone code, or other way to reproduce the problem

<?php

use Pdp\Cache;
use Pdp\CurlHttpClient;
use Pdp\Manager;
use Pdp\Rules;

require __DIR__ . '/vendor/autoload.php'; // assumes a Composer install

$manager = new Manager(new Cache(), new CurlHttpClient());
$rules = $manager->getRules()
    ->withAsciiIDNAOption(IDNA_NONTRANSITIONAL_TO_ASCII)
    ->withUnicodeIDNAOption(IDNA_NONTRANSITIONAL_TO_UNICODE);
$domain = $rules->resolve('linode.com');
echo 'Domain: \'' . $domain->getContent() . '\'';

Expected result

Domain: 'linode.com'

Actual result

Domain: ''

nyamsprod commented 4 years ago

The following snippet should clear up the misunderstanding of how the library works:

<?php

use Pdp\Cache;
use Pdp\CurlHttpClient;
use Pdp\Manager;
use Pdp\Rules;

require __DIR__ . '/vendor/autoload.php'; // assumes a Composer install

$manager = new Manager(new Cache(), new CurlHttpClient());
$rules = $manager->getRules();
echo $rules->resolve('toto.linode.com', Rules::PRIVATE_DOMAINS)->getPublicSuffix(), PHP_EOL;
echo $rules->resolve('toto.linode.com', Rules::ICANN_DOMAINS)->getPublicSuffix(), PHP_EOL;
echo $rules->resolve('toto.linode.com')->getPublicSuffix(), PHP_EOL;

The result is:

linode.com
com
linode.com

The explanation is simple:

linode.com is registered as a private domain in the PSL. By default, the library uses the longest public suffix it can find for a given domain name, whichever section that suffix comes from. In your example, the longest public suffix is the domain itself, hence it returns the empty string, as stated in the documentation. Depending on your business logic, you should state explicitly which section your domain name should be resolved against. Both results are correct; you just need to decide which one fits your use case.
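The longest-match behaviour described above can be sketched without the library. This is a minimal illustration, not Pdp's implementation: the rule set `$miniList`, the constants `ICANN` and `PRIVATE_SECTION`, and the function `publicSuffix()` are hypothetical. The list holds only the four plain rules relevant to this thread (no wildcard or exception rules), and the section assigned to each rule is an assumption based on the results shown above.

```php
<?php
// Hypothetical miniature rule set standing in for the real Public Suffix List.
// Only plain rules are modelled: no wildcard ("*.") or exception ("!") rules.
const ICANN = 'ICANN';
const PRIVATE_SECTION = 'PRIVATE';

$miniList = [
    'com'        => ICANN,
    'za'         => ICANN,
    'co.za'      => ICANN,
    'linode.com' => PRIVATE_SECTION,
];

/**
 * Returns the longest suffix of $host that appears in $list, optionally
 * restricted to one section. If nothing matches, the last label is used,
 * mirroring the PSL's implicit "*" rule.
 */
function publicSuffix(string $host, array $list, ?string $section = null): string
{
    $labels = explode('.', strtolower($host));
    $count = count($labels);
    // Candidates go from longest ("toto.linode.com") to shortest ("com"),
    // so the first hit is the longest matching rule.
    for ($i = 0; $i < $count; $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        $ruleSection = $list[$candidate] ?? null;
        if ($ruleSection !== null && ($section === null || $ruleSection === $section)) {
            return $candidate;
        }
    }
    return $labels[$count - 1];
}

echo publicSuffix('toto.linode.com', $miniList, PRIVATE_SECTION), PHP_EOL; // linode.com
echo publicSuffix('toto.linode.com', $miniList, ICANN), PHP_EOL;           // com
echo publicSuffix('toto.linode.com', $miniList), PHP_EOL;                  // linode.com
// The bare host is itself the longest matching suffix, so no label is left
// in front of it to form a registrable domain.
echo publicSuffix('linode.com', $miniList), PHP_EOL;                       // linode.com
```

The first three lines reproduce the three results listed in this comment. Because the search runs from the longest candidate down, a private rule such as `linode.com` wins over the shorter ICANN rule `com` unless the section is restricted.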

This question is also already answered in the pinned issue #240.

McAnix commented 4 years ago

I understand the intricacies of private vs ICANN TLDs. However, this doesn't solve the fundamental user problem.

The user is asking the domain parser for the domain parts. The user is not aware of which company recently added its domain to an unofficial TLD suffix list. The user just wants to run a WHOIS lookup on the registrable domain, or see which registry (TLD) the domain is registered under.

Why is it now the developer's problem to write a wrapper class to parse out "failures" so that the user gets what they expect?

nyamsprod commented 4 years ago

That's because, from your business point of view, the default behaviour, which is in line with how the PSL test suite works, is not what you want:

$domain = $rules->resolve('linode.com');

The PSL was first created to resolve cookie issues, and in this regard the default behaviour is better suited for that.

$domain = $rules->resolve('linode.com', Rules::ICANN_DOMAINS);

is what developers should use most of the time when resolving domains against TLD suffixes. The documentation should probably make this more explicit.
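To see how the section choice affects the registrable domain (the public suffix plus one more label), here is a self-contained sketch covering both hosts from the original report. The rule set `$miniRules` and the functions `suffixOf()` and `registrable()` are hypothetical and only model plain rules; they are not part of Pdp, and the section assigned to each rule is an assumption.

```php
<?php
// Hypothetical miniature rule set (not the real PSL); plain rules only.
$miniRules = [
    'com'        => 'ICANN',
    'za'         => 'ICANN',
    'co.za'      => 'ICANN',
    'linode.com' => 'PRIVATE',
];

// Longest matching suffix, optionally limited to one section ('ICANN' or
// 'PRIVATE'); falls back to the last label like the PSL's implicit "*" rule.
function suffixOf(string $host, array $rules, ?string $section): string
{
    $labels = explode('.', $host);
    $n = count($labels);
    for ($i = 0; $i < $n; $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        $ruleSection = $rules[$candidate] ?? null;
        if ($ruleSection !== null && ($section === null || $ruleSection === $section)) {
            return $candidate;
        }
    }
    return $labels[$n - 1];
}

// Registrable domain = public suffix plus one more label, or '' when the
// host has no label in front of its public suffix.
function registrable(string $host, array $rules, ?string $section = null): string
{
    $host = strtolower($host);
    $suffix = suffixOf($host, $rules, $section);
    if ($host === $suffix) {
        return '';
    }
    // Strip ".<suffix>" from the end, then keep the label right before it.
    $prefixLabels = explode('.', substr($host, 0, -(strlen($suffix) + 1)));
    return end($prefixLabels) . '.' . $suffix;
}

foreach ([null, 'ICANN', 'PRIVATE'] as $section) {
    printf(
        "%-8s linode.com => '%s', co.za => '%s'\n",
        $section ?? 'default',
        registrable('linode.com', $miniRules, $section),
        registrable('co.za', $miniRules, $section)
    );
}
```

With all sections, both hosts come back empty; with ICANN only, `linode.com` resolves; with PRIVATE only, `co.za` resolves, which mirrors the behaviour reported at the top of the issue. Note that under ICANN `co.za` stays empty because it is itself an ICANN suffix in this mini list, and the PRIVATE-only result comes from the fallback rule rather than from `co.za` being registrable.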

McAnix commented 4 years ago

The PSL was first created to resolve cookie issues, and in this regard the default behaviour is better suited for that.

Aha! Thank you for that explanation. We use it predominantly to determine the registrable domain, so I have adjusted all lookups to use ICANN_DOMAINS.