jeremykendall / php-domain-parser

Public Suffix List based domain parsing implemented in PHP
MIT License

cannot get info for sa.gov.au #313

Closed gsouf closed 3 years ago

gsouf commented 3 years ago

Issue summary

We use pdp to determine whether a given URL uses a valid domain name.

As per issue #251, the library cannot determine whether sa.gov.au is a valid hostname, even though it actually is one. You can visit it at https://sa.gov.au/

Standalone code, or other way to reproduce the problem

$manager = new Manager(new Cache(), new CurlHttpClient());
var_dump($manager->getRules()->resolve('sa.gov.au'));

Expected result

Anything that helps tell that sa.gov.au is a valid hostname.

Actual result

object(Pdp\Domain)#66 (7) {
  ["domain"]=>
  string(9) "sa.gov.au"
  ["registrableDomain"]=>
  NULL
  ["subDomain"]=>
  NULL
  ["publicSuffix"]=>
  NULL
  ["isKnown"]=>
  bool(false)
  ["isICANN"]=>
  bool(false)
  ["isPrivate"]=>
  bool(false)
}

Happy to contribute if required.

Thanks

nyamsprod commented 3 years ago

@gsouf it seems that you are using v5, which is EOL. As far as I am concerned, you should use v6, which would give you a clearer understanding of the expected behaviour.

In v6 the following code throws an exception:

<?php

use Pdp\Rules;

$rules = Rules::fromPath('path-to-your-local-copy-of-public-suffix-list.txt');
$rules->getICANNDomain('sa.gov.au');
// throws Pdp\UnableToResolveDomain: The public suffix and the domain name are identical `sa.gov.au`.

As explained in #251, sa.gov.au is a public suffix in its own right (it is explicitly listed in the PSL), so you can't resolve it, just as you can't resolve ac.be or co.uk. To be resolved, a domain needs to have at least one subdomain label attached to its public suffix.
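
To make the point above concrete, here is a minimal, self-contained sketch of suffix matching. It is not the library's implementation: `findPublicSuffix`, `findRegistrableDomain` and the tiny `$rules` array are hypothetical names and data invented for illustration, and the sketch only handles exact rules (no wildcard rules, no exception rules, no default `*` rule).

```php
<?php

declare(strict_types=1);

/**
 * Returns the longest rule matching the end of $host, or null if none matches.
 * $rules is a hand-written subset of the PSL (an assumption for this sketch).
 */
function findPublicSuffix(string $host, array $rules): ?string
{
    $labels = explode('.', strtolower($host));

    // Try the longest candidate first, so "sa.gov.au" wins over "gov.au" and "au".
    for ($i = 0, $n = count($labels); $i < $n; $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        if (in_array($candidate, $rules, true)) {
            return $candidate;
        }
    }

    return null;
}

/**
 * Returns the public suffix plus one extra label, or null when the host
 * has no known suffix or is itself a public suffix.
 */
function findRegistrableDomain(string $host, array $rules): ?string
{
    $host = strtolower($host);
    $suffix = findPublicSuffix($host, $rules);

    if ($suffix === null || $suffix === $host) {
        return null;
    }

    $suffixLength = count(explode('.', $suffix));

    // Keep the suffix labels plus the one label directly in front of them.
    return implode('.', array_slice(explode('.', $host), -($suffixLength + 1)));
}

// Hypothetical, heavily reduced rule set.
$rules = ['au', 'gov.au', 'sa.gov.au', 'uk', 'co.uk', 'com'];

var_dump(findRegistrableDomain('sa.gov.au', $rules));         // NULL
var_dump(findRegistrableDomain('www.sa.gov.au', $rules));     // "www.sa.gov.au"
var_dump(findRegistrableDomain('a.b.example.co.uk', $rules)); // "example.co.uk"
```

Because sa.gov.au is itself a rule, nothing is left over to act as the registrable part, which is the situation the exception in v6 reports.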

Hope this clarifies your issue.

PS: as things stand, a way to resolve this would be either to have sa.gov.au removed from the list or maybe to add a resolveSuffix method to the Rules class 🤔.

gsouf commented 3 years ago

@nyamsprod Thanks for the clear explanations and for the version notice. For the moment we bypass pdp for sa.gov.au to get things working.
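
The kind of bypass mentioned above could look roughly like the sketch below. It is an assumption about the shape of such a workaround, not code from this thread: `isAcceptedHost`, `$browsableSuffixes` and the `$pdpCanResolve` callback are hypothetical names, and the callback is stubbed here so the example runs on its own. In a real application the callback would wrap the pdp lookup and return false when the library refuses to resolve the host.

```php
<?php

declare(strict_types=1);

/**
 * Accepts a host if it is on a hand-maintained allowlist of public suffixes
 * known to host real sites, otherwise defers to the injected resolver check.
 *
 * @param callable(string): bool $canResolve        hypothetical wrapper around the pdp lookup
 * @param string[]               $browsableSuffixes hypothetical allowlist
 */
function isAcceptedHost(string $host, callable $canResolve, array $browsableSuffixes): bool
{
    // Normalise case and drop a trailing root dot before comparing.
    $host = strtolower(rtrim($host, '.'));

    if (in_array($host, $browsableSuffixes, true)) {
        return true;
    }

    return $canResolve($host);
}

// Stub standing in for pdp: pretends these three hosts are bare public suffixes.
$pdpCanResolve = static fn (string $host): bool
    => !in_array($host, ['com', 'co.uk', 'sa.gov.au'], true);

$browsableSuffixes = ['sa.gov.au'];

var_dump(isAcceptedHost('sa.gov.au', $pdpCanResolve, $browsableSuffixes));   // bool(true)
var_dump(isAcceptedHost('co.uk', $pdpCanResolve, $browsableSuffixes));       // bool(false)
var_dump(isAcceptedHost('example.com', $pdpCanResolve, $browsableSuffixes)); // bool(true)
```

The obvious drawback is that the allowlist has to be maintained by hand, since the PSL itself carries no flag saying which suffixes also serve content.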

I understand that sa.gov.au is in the PSL. That probably makes sense because it is a registrable domain (https://www.domainname.gov.au/apply-new-sagovau-domain-name).

However, although my knowledge of this topic is certainly limited, there seems to be an inconsistency in this list: it mixes suffixes that you cannot browse and that exist purely for registering an FQDN, such as a TLD like com or a generic second-level domain like co.uk (i.e. the URLs https://com and https://co.uk or the email address someone@com do not work), with specialized domains that do resolve to something, like gov.au, sa.gov.au or even ngrok.io ❓❓

Do you know if there is a way to distinguish between them somehow?

What exactly do you have in mind for the resolveSuffix method?

nyamsprod commented 3 years ago

Do you know if there is a way to distinguish between them somehow?

Again, if you upgrade to v6 you will hopefully get your answer, as it exposes stricter methods.

Check https://github.com/jeremykendall/php-domain-parser#resolving-domains for more information.

The resolveSuffix method was just a thought, but I don't think it makes sense to implement it. The PSL is not just a suffix list; it really is a collection of rules that can validate or invalidate suffixes. Hence the name of the class, Rules.

SaschaMai commented 3 years ago

@gsouf Are you able to download www.sa.gov.au instead of sa.gov.au?

gsouf commented 3 years ago

@SaschaMai that would definitely work, but it's not the desired behavior, because the issue is not limited to this domain: it applies to any domain in the PSL. If I added a www in front of each domain, then things like www.com would be validated too. The tricky part is that, technically, we wouldn't want things like co.uk to be validated either. But it seems there is no easy way to achieve this.

I'll first upgrade to v6, probably next week, and leave feedback here on how I solved the problem as soon as I have it working.

nyamsprod commented 3 years ago

@gsouf what is the problem with www.com being a valid domain? AFAIK, when registering a domain you are in fact registering a second-level domain.

In other words, you can never register sa.gov.au, but you must register www.sa.gov.au.

That is why I said that the issue, if there is one, must be taken to the PSL repo and not to the current package 😉

gsouf commented 3 years ago

@nyamsprod because the proposition was to add "www" in front of the string to validate.

That means that if I'm trying to validate the string "com", I'd actually validate "www.com", which is valid, but that does not make "com" alone valid.

As for sa.gov.au, it is an actual website and people have email addresses under sa.gov.au (someone@sa.gov.au), so even if it's not registrable, it's still in use and it is a valid domain name.

I'll try to open a ticket in the PSL repo, but I'm not sure it will go anywhere.

nyamsprod commented 3 years ago

@gsouf it seems there is a relevant, yet complicated, issue already open on the PSL repo; see https://github.com/publicsuffix/list/issues/788.

TL;DR: definitely an issue on the upstream public suffix list and not one this package can fix/resolve.