jeremykendall / php-domain-parser

Public Suffix List based domain parsing implemented in PHP
MIT License

Recommended list: IANA or PSL? #353

WilliamDEdwards closed this issue 1 year ago

WilliamDEdwards commented 1 year ago

Should the IANA or the PSL list be used in production?

The PSL maintainer recommends using the IANA list (see below), but that one does not contain TLDs such as .co.uk...

Using the Public Suffix List to determine what is a valid domain name and what isn't is dangerous, particularly in these days when new gTLDs are arriving at a rapid pace. If you are looking to know the validity of a Top Level Domain, the IANA Top Level Domain List is the proper source for this information or alternatively consider using directly the DNS.

(https://github.com/jeremykendall/php-domain-parser)

PSL maintainer here: please don’t use the PSL! [...] if you are going to use a list, using the IANA list, updated daily, is much better than the PSL.

(https://news.ycombinator.com/item?id=24437644)

nyamsprod commented 1 year ago

And they are correct: the PSL and the IANA list are complementary. The former is about domain suffixes, while the latter is about registered TLDs. Let's take a quick example.

If someone registers the TLD foobar with IANA on Monday, the IANA list, and subsequently the DNS, will know about it by Tuesday at the latest, but the PSL will not.
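To make the IANA side of this concrete: the IANA file is a flat list of delegated TLDs, so checking a TLD boils down to a set lookup on the right-most label. The following is a minimal, library-free sketch, not php-domain-parser's API; `parseIanaTldList()` and `hasIanaTld()` are hypothetical helpers, and the file layout (a `#` version comment followed by one upper-case TLD per line) is an assumption based on the published tlds-alpha-by-domain.txt.

```php
<?php
declare(strict_types=1);

/**
 * Parse the contents of IANA's tlds-alpha-by-domain.txt (assumed layout:
 * a "# Version ..." comment line, then one upper-case TLD per line, with
 * IDN TLDs in their "XN--" form).
 *
 * @return array<string, true> lower-cased TLDs, used as a lookup set
 */
function parseIanaTldList(string $contents): array
{
    $tlds = [];
    foreach (preg_split('/\R/', $contents) as $line) {
        $line = trim($line);
        if ($line === '' || str_starts_with($line, '#')) {
            continue; // skip blank lines and the version comment
        }
        $tlds[strtolower($line)] = true;
    }
    return $tlds;
}

/** True when the right-most label of $host appears in the IANA list. */
function hasIanaTld(string $host, array $tlds): bool
{
    $labels = explode('.', strtolower(rtrim($host, '.')));
    return isset($tlds[end($labels)]);
}

// Illustrative file contents only, not real IANA data.
$monday  = "# Version (illustrative)\nCOM\nUK\n";
$tuesday = $monday . "FOOBAR\n";

var_dump(hasIanaTld('www.example.foobar', parseIanaTldList($monday)));  // bool(false)
var_dump(hasIanaTld('www.example.foobar', parseIanaTldList($tuesday))); // bool(true)
```

Because the IANA file only lists single labels, a suffix such as co.uk can never appear in it, which is why the PSL is still needed for suffix questions.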

If your package resources are up to date, then starting Tuesday the package will behave as shown below:

use Pdp\Domain;
use Pdp\Rules;
use Pdp\TopLevelDomains;

$topLevelDomains = TopLevelDomains::fromPath('/path/to/cache/tlds-alpha-by-domain.txt');
$publicSuffixList = Rules::fromPath('/path/to/cache/public-suffix-list.dat');

$domain = Domain::fromIDNA2008('www.example.foobar');

$result = $topLevelDomains->getIANADomain($domain); // will work and return a result
$publicSuffixList->getCookieDomain($domain);        // will throw

But if co.foobar later becomes available for registration through a registrar, it is likely (though not guaranteed) that it will have been added to the PSL.

use Pdp\Domain;
use Pdp\Rules;
use Pdp\TopLevelDomains;

$topLevelDomains = TopLevelDomains::fromPath('/path/to/cache/tlds-alpha-by-domain.txt');
$publicSuffixList = Rules::fromPath('/path/to/cache/public-suffix-list.dat');

$domain = Domain::fromIDNA2008('www.example.foobar');

$result = $topLevelDomains->getIANADomain($domain); // will work and return a result
$publicSuffixList->getCookieDomain($domain);        // may work or still throw
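The co.foobar scenario is where the PSL matters: a public suffix is found by taking the longest PSL rule that matches the right-most labels of the host, falling back to the implicit `*` rule (the TLD alone). The sketch below uses a hypothetical, simplified `resolveSuffix()` helper that handles plain rules only; it deliberately ignores the PSL's wildcard (`*.`) and exception (`!`) rules and its ICANN/PRIVATE sections, all of which a real parser has to handle.

```php
<?php
declare(strict_types=1);

/**
 * Simplified PSL lookup for illustration: plain rules only
 * (e.g. "uk", "co.uk"); wildcard and exception rules are ignored.
 *
 * @param list<string> $rules public suffix rules
 * @return array{suffix: string, registrable: ?string}
 */
function resolveSuffix(string $host, array $rules): array
{
    $labels = explode('.', strtolower(rtrim($host, '.')));
    $ruleSet = array_flip($rules);

    // Implicit default rule "*": the right-most label is a public suffix.
    $suffixLength = 1;
    // Try the longest candidate first, so the first hit is the longest match.
    for ($n = count($labels); $n >= 1; $n--) {
        $candidate = implode('.', array_slice($labels, -$n));
        if (isset($ruleSet[$candidate])) {
            $suffixLength = $n;
            break;
        }
    }

    $suffix = implode('.', array_slice($labels, -$suffixLength));
    // Registrable domain = public suffix plus one more label, if there is one.
    $registrable = count($labels) > $suffixLength
        ? implode('.', array_slice($labels, -($suffixLength + 1)))
        : null;

    return ['suffix' => $suffix, 'registrable' => $registrable];
}

$host = 'www.example.co.foobar';
$before = resolveSuffix($host, ['foobar']);              // co.foobar not yet listed
$after  = resolveSuffix($host, ['foobar', 'co.foobar']); // co.foobar added to the PSL

echo $before['registrable'], PHP_EOL; // co.foobar
echo $after['registrable'], PHP_EOL;  // example.co.foobar
```

Until co.foobar is listed, the sketch reports co.foobar itself as the registrable domain, so anything keyed on the registrable domain (cookie scoping, for instance) would lump every co.foobar customer together. Closing that gap depends on the PSL being updated, not on the IANA list.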

TL;DR: the up-to-date list for TLDs is the IANA one.

The PSL is the maintained list of public suffixes, but there's no guarantee that it is either accurate or up to date, as updates are not made on a regular schedule and are "voluntary".

WilliamDEdwards commented 1 year ago

To correct myself: .co.uk is not a TLD. .uk is, and .co.uk is a ccSLD.

To your knowledge, is there a properly maintained suffix list? I understand from your message that the IANA list contains only TLDs, and the PSL contains suffixes. However, the PSL apparently should not be used...

nyamsprod commented 1 year ago

The PSL SHOULD BE USED, but not for validating that a TLD is indeed valid.
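Using the .co.uk example from this thread, the division of labour might look like the sketch below: the IANA list answers "is this TLD delegated?" and the PSL answers "where does the public suffix end?". `describeHost()` is a hypothetical helper, not part of php-domain-parser; both lists are passed in as small hard-coded arrays for illustration, and PSL wildcard and exception rules are ignored.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: TLD validity from the IANA list, public suffix
 * from the PSL (plain rules only, no wildcard/exception handling).
 *
 * @param list<string> $ianaTlds lower-case TLDs, e.g. ['uk', 'com']
 * @param list<string> $pslRules lower-case rules, e.g. ['uk', 'co.uk']
 * @return array{tld: string, tldIsDelegated: bool, publicSuffix: string}
 */
function describeHost(string $host, array $ianaTlds, array $pslRules): array
{
    $labels = explode('.', strtolower(rtrim($host, '.')));
    $tld = end($labels);

    // Longest matching PSL rule; fall back to the implicit "*" rule (the TLD).
    $publicSuffix = $tld;
    for ($n = count($labels); $n >= 1; $n--) {
        $candidate = implode('.', array_slice($labels, -$n));
        if (in_array($candidate, $pslRules, true)) {
            $publicSuffix = $candidate;
            break;
        }
    }

    return [
        'tld' => $tld,
        'tldIsDelegated' => in_array($tld, $ianaTlds, true), // IANA: is the TLD real?
        'publicSuffix' => $publicSuffix,                     // PSL: where the suffix ends
    ];
}

$info = describeHost('www.example.co.uk', ['uk', 'com'], ['uk', 'co.uk', 'com']);
echo $info['tld'], ' ', var_export($info['tldIsDelegated'], true), ' ', $info['publicSuffix'], PHP_EOL;
// uk true co.uk
```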

WilliamDEdwards commented 1 year ago

So using the PSL is not 'dangerous', but it could miss TLDs because it is not updated regularly. Perhaps that should be clarified in the README.

nyamsprod commented 1 year ago

Well, that is clearly stated on the PSL website, so I'm not sure why it should be repeated here 🤔 (see https://publicsuffix.org/learn/).