Closed pushkin- closed 5 years ago
Hi @pushkin-, you're not crazy at all, don't worry.
It is rather a bug originating in how tldjs
uses the Node URL parser.
@pushkin- well, I add a look into it and it happens than sh
is actually a well-known TLD.
Maybe the problem lies in the fact is — in this case — the sh
value is a hostname only. And thus we should not try to answer if the TLD exists or not.
What is your thinking about it?
Before I get to your question, I'm also having a problem with tldExists("shell/a.aspx")
. Would the parser pull out the sh
in this case?
Is your question still relevant to my problem?
@pushkin- It seems that shell
is also a valid public suffix.
@oncletom Should we return tldExists === false
and publicSuffix === null
for hostnames having only one label? Should domain === shell
and domain === sh
in the previous examples?
Not sure what the right behavior is here.
@pushkin- After some thought if a hostname is composed only from an existing TLD, I think the current behavior is correct. Since a TLD (Top-level domain) is a valid domain.
What might be surprising is that sh
and shell
are valid TLDs, do you think the discussion in this other issue is relevant for your? #111
Hey @pushkin-, the lib has been improved since you opened the issue.
Here is the output of tldjs.parse("sh/a.aspx")
:
domain: null
hostname: "sh"
isValid: true
publicSuffix: "sh"
subdomain: null
tldExists: true
In your case, it happens this is both a valid hostname, as well as a valid TLD. Although there is no TLD in this string.
Let us know if you think of something more sensible to do on our side of things.
I thought top-level domains referred to the bit after the .
, in which case, why would tldExists("sh/a.aspx")
return true? I would expect tldExists("a/a.sh")
to be true, but not the former.
@pushkin- they are the bit after the .
yes. But also before the first slash (after the protocol).
In this case, sh
is considered as being the hostname and /a.aspx
being the path.
Try out this in your browser console:
new URL('http://sh/a.aspx')
returns
URL { href: "http://sh/a.aspx", origin: "http://sh", protocol: "http:", username: "", password: "", host: "sh", hostname: "sh", port: "", pathname: "/a.aspx", search: "" }
@oncletom Regarding:
they are the bit after the
.
yes. But also before the first slash (after the protocol).
Could you provide a source? From Wikipedia:
For all domains in lower levels, it is the last part of the domain name, that is, the last label of a fully qualified domain name.
Also, I don't understand your example's relevance to my problem. It's fine if sh
is considered a hostname, but why must it also be considered a tld (your answer to my first comment will also answer this)?
If what you said is accurate, then I guess tldExists
is not what I want to use. The problem I am solving is the following: Given a URL, if it is relative, resolve it appropriately. If it is absolute, return it. If it is some external page (abc.com), just return it.
shell/abc.com
should resolve to shell/abc.com
(because we see the top-level domain com
)
shell/abc.aspx
should be resolved to some absolute URL
This issue is no longer relevant to me, so feel free to close this, especially if you're not planning on fixing it.
Think I'm about to go crazy.
Why does
tldExists("sh/a.aspx")
return true?tldExists(".aspx")
correctly returns false.Also,
tldExists("s/a.aspx")
return false.