thom4parisot / tld.js

JavaScript API to work easily with complex domain names, subdomains and well-known TLDs.
https://npmjs.com/tldjs
MIT License
460 stars 55 forks source link

Is tldExists broken or am I going crazy? #106

Closed pushkin- closed 5 years ago

pushkin- commented 7 years ago

Think I'm about to go crazy.

Why does tldExists("sh/a.aspx") return true? tldExists(".aspx") correctly returns false.
Also, tldExists("s/a.aspx") return false.

thom4parisot commented 7 years ago

Hi @pushkin-, you're not crazy at all, don't worry.

It is rather a bug originating in how tldjs uses the Node URL parser.

107 is a quickfix to solve your specific case but #101 lists even more use cases like yours.

thom4parisot commented 7 years ago

@pushkin- well, I add a look into it and it happens than sh is actually a well-known TLD.

Maybe the problem lies in the fact is — in this case — the sh value is a hostname only. And thus we should not try to answer if the TLD exists or not.

What is your thinking about it?

pushkin- commented 7 years ago

Before I get to your question, I'm also having a problem with tldExists("shell/a.aspx"). Would the parser pull out the sh in this case?

Is your question still relevant to my problem?

remusao commented 7 years ago

@pushkin- It seems that shell is also a valid public suffix.

@oncletom Should we return tldExists === false and publicSuffix === null for hostnames having only one label? Should domain === shell and domain === sh in the previous examples?

Not sure what the right behavior is here.

remusao commented 6 years ago

@pushkin- After some thought if a hostname is composed only from an existing TLD, I think the current behavior is correct. Since a TLD (Top-level domain) is a valid domain.

What might be surprising is that sh and shell are valid TLDs, do you think the discussion in this other issue is relevant for your? #111

thom4parisot commented 6 years ago

Hey @pushkin-, the lib has been improved since you opened the issue. Here is the output of tldjs.parse("sh/a.aspx"):

domain: null
hostname: "sh"
isValid: true
publicSuffix: "sh"
subdomain: null
tldExists: true

In your case, it happens this is both a valid hostname, as well as a valid TLD. Although there is no TLD in this string.

Let us know if you think of something more sensible to do on our side of things.

pushkin- commented 6 years ago

I thought top-level domains referred to the bit after the ., in which case, why would tldExists("sh/a.aspx") return true? I would expect tldExists("a/a.sh") to be true, but not the former.

thom4parisot commented 6 years ago

@pushkin- they are the bit after the .yes. But also before the first slash (after the protocol).

In this case, sh is considered as being the hostname and /a.aspx being the path.

Try out this in your browser console:

new URL('http://sh/a.aspx')

returns

URL { href: "http://sh/a.aspx", origin: "http://sh", protocol: "http:", username: "", password: "", host: "sh", hostname: "sh", port: "", pathname: "/a.aspx", search: "" } 
pushkin- commented 6 years ago

@oncletom Regarding:

they are the bit after the .yes. But also before the first slash (after the protocol).

Could you provide a source? From Wikipedia:

For all domains in lower levels, it is the last part of the domain name, that is, the last label of a fully qualified domain name.

Also, I don't understand your example's relevance to my problem. It's fine if sh is considered a hostname, but why must it also be considered a tld (your answer to my first comment will also answer this)?

If what you said is accurate, then I guess tldExists is not what I want to use. The problem I am solving is the following: Given a URL, if it is relative, resolve it appropriately. If it is absolute, return it. If it is some external page (abc.com), just return it.

shell/abc.com should resolve to shell/abc.com (because we see the top-level domain com)

shell/abc.aspx should be resolved to some absolute URL

pushkin- commented 5 years ago

This issue is no longer relevant to me, so feel free to close this, especially if you're not planning on fixing it.