ealashwali closed this issue 5 years ago
Those are what the PSL calls "private" domains. See this FAQ entry.
Thanks for the clarification. One last question, please: are the private domains only included in tldextract's list if I add this line:
extract = tldextract.TLDExtract(include_psl_private_domains=True)
and otherwise they are not?
Correct
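To illustrate the distinction discussed above, here is a minimal sketch that splits public suffix list text into its ICANN and private sections using only the standard library. It assumes the `// ===BEGIN ... DOMAINS===` and `// ===END ... DOMAINS===` marker comments that the list uses to delimit sections. The function name `split_psl_sections` and the sample text are illustrative and not part of tldextract.

```python
def split_psl_sections(psl_text):
    """Return (icann_rules, private_rules) parsed from raw public suffix list text."""
    icann, private = set(), set()
    current = None  # the section set currently being filled, if any
    for raw in psl_text.splitlines():
        line = raw.strip()
        if line.startswith("// ===BEGIN ICANN DOMAINS==="):
            current = icann
        elif line.startswith("// ===BEGIN PRIVATE DOMAINS==="):
            current = private
        elif line.startswith("// ===END"):
            current = None
        elif line and not line.startswith("//") and current is not None:
            # A rule is the first whitespace-delimited token on the line.
            current.add(line.split()[0])
    return icann, private


# Tiny hand-written sample, not the real list.
SAMPLE = """\
// ===BEGIN ICANN DOMAINS===
com
co.uk
// ===END ICANN DOMAINS===
// ===BEGIN PRIVATE DOMAINS===
// a comment line inside the private section
blogspot.com
github.io
// ===END PRIVATE DOMAINS===
"""

icann_rules, private_rules = split_psl_sections(SAMPLE)
print(sorted(icann_rules))
print(sorted(private_rules))
```

Entries in the second set are the kind that are only treated as suffixes when `include_psl_private_domains=True` is passed, as confirmed above.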
As I explained in previous issues, for some reason I cannot run the
tldextract --update
command. Therefore, as described on the home page, I use the alternative method: I delete the cache before running the program that calls tldextract, so that I get an updated list. However, when I compared the TLDs in the tldextract cache with the domains in the public-suffix list, they did not match. The steps were as follows:
1) Copy/paste the public-suffix list into a file and remove empty lines and comments using this command:
grep -v -E '^($|//)'
2) Read the tldextract TLDs from the
.tld_set
file using Python, then write them to a new file with one TLD per line.
3) Using a regex, clean both files so that they only contain TLDs made of English letters:
[A-Za-z_0-9.-]+
4) Compare the two files. The public-suffix list contains 1404 more TLDs which are:
The tldextract cache, on the other hand, contains only one TLD that does not exist in the public suffix list, which is:
mobily
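The comparison in steps 1 to 4 can be sketched in Python as follows. This is an illustration under stated assumptions, not the exact procedure used above: it assumes the step 3 pattern must match a whole line, and the helper names `clean_rules` and `compare` as well as the sample entries are hypothetical. Note that wildcard (`*.`) and exception (`!`) rules, like non-ASCII entries, are dropped by that pattern.

```python
import re

# The character class from step 3, applied to the whole line.
TOKEN = re.compile(r"[A-Za-z_0-9.-]+")


def clean_rules(lines):
    """Drop blank and comment lines (step 1), keep entries matching the step 3 pattern."""
    rules = set()
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("//"):
            continue
        if TOKEN.fullmatch(line):
            rules.add(line)
    return rules


def compare(psl_lines, cache_lines):
    """Return (only_in_psl, only_in_cache) as sorted lists (step 4)."""
    psl = clean_rules(psl_lines)
    cached = clean_rules(cache_lines)
    return sorted(psl - cached), sorted(cached - psl)


# Small made-up inputs standing in for the two files.
psl_sample = ["// comment", "", "com", "co.uk", "github.io", "*.ck", "!www.ck"]
cache_sample = ["com", "co.uk", "mobily"]

only_in_psl, only_in_cache = compare(psl_sample, cache_sample)
print(only_in_psl)
print(only_in_cache)
```

Using sets makes the two differences explicit in both directions, which mirrors the two results reported above: entries missing from the cache, and entries present only in the cache.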
Can you please explain this to me? Is anything wrong? How can I ensure that my tldextract program has the latest list from the public suffix list?
Note: I cannot use
tldextract --update
because of a system error whose source I could not identify. I have to rely on deleting the cache file before each run.
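For reference, deleting the cache file before each run can be sketched like this. The helper `remove_cache_file` is hypothetical, and the demonstration uses a throwaway file named `.tld_set` in a temporary directory. The real cache location depends on the installed tldextract version and its configuration, so the path would need to be adjusted.

```python
import os
import tempfile


def remove_cache_file(path):
    """Delete the cache file if it exists; return True when a file was removed."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False


# Demonstration on a placeholder file standing in for the real cache.
demo_dir = tempfile.mkdtemp()
demo_cache = os.path.join(demo_dir, ".tld_set")
with open(demo_cache, "w") as fh:
    fh.write("placeholder")

first = remove_cache_file(demo_cache)   # file exists, so it is removed
second = remove_cache_file(demo_cache)  # already gone, nothing to do
print(first, second)
```

Catching `FileNotFoundError` keeps the call safe to repeat, so it can run unconditionally at program start.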