publicsuffix / list

The Public Suffix List
https://publicsuffix.org/
Mozilla Public License 2.0

PSL Private Section Domains WHOIS Checker #2014

Open groundcat opened 2 weeks ago

groundcat commented 2 weeks ago

This PR is related to #1996. It introduces the tools/private_domains_checker, a Python script created to fetch data from the PSL and check the domain status and expiry dates of the private section domains. It performs WHOIS checks on these domains and saves the results into CSV files for manual review.
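For illustration only, here is a minimal sketch of the first step described above: pulling the private-section rules out of the PSL text. It is not the code from tools/private_domains_checker. The function name `private_section_rules` and the `SAMPLE` text are hypothetical; the section markers are the ones used in the published list at https://publicsuffix.org/list/public_suffix_list.dat. Fetching the file and reducing each rule to a registrable domain for WHOIS are left out.

```python
PRIVATE_BEGIN = "// ===BEGIN PRIVATE DOMAINS==="
PRIVATE_END = "// ===END PRIVATE DOMAINS==="


def private_section_rules(psl_text):
    """Return the rules listed between the PRIVATE DOMAINS markers.

    Hypothetical helper: wildcard ("*.") and exception ("!") prefixes are
    stripped, comments and blank lines are skipped.
    """
    rules = []
    in_private = False
    for raw_line in psl_text.splitlines():
        line = raw_line.strip()
        if line == PRIVATE_BEGIN:
            in_private = True
            continue
        if line == PRIVATE_END:
            break
        if not in_private or not line or line.startswith("//"):
            continue
        # A rule ends at the first whitespace on the line.
        rule = line.split()[0]
        if rule.startswith("!"):
            rule = rule[1:]
        if rule.startswith("*."):
            rule = rule[2:]
        rules.append(rule)
    return rules


# Made-up sample in the same layout as the real list.
SAMPLE = """\
// ===BEGIN ICANN DOMAINS===
com
// ===END ICANN DOMAINS===
// ===BEGIN PRIVATE DOMAINS===
// Example Org : https://example.org
example.org
*.platform.example
// ===END PRIVATE DOMAINS===
"""

print(private_section_rules(SAMPLE))
```

The real script would still have to map each rule to the domain that is actually registered before querying WHOIS; that step depends on the ICANN section and is not shown here.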

Please feel free to make any edits!

The README file has been updated to reflect the usage instructions.

Example CSV outputs from real PSL data:

simon-friedberger commented 2 weeks ago

I know this would be a big ask, but would you be interested in trying to integrate this with the Go validator in https://github.com/publicsuffix/list/tree/master/tools/internal/parser? The idea is that this could eventually be used to automatically add DNS and WHOIS information to PRs with a GitHub Action. However, that requires a little more effort from the parser to determine which sections have changed, so that only problems in the relevant section are displayed. Integration would probably just mean providing a function that takes a URL and returns a summary of the status, like "expires in >2y/has expired/...". This is not super important because the two things do something fairly different, but it might still be nice if we could share code here.
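As a rough sketch of the kind of summary function suggested here (in Python rather than Go, and not taken from either tool), the status string could be derived from an expiry date as follows. The name `summarize_expiry`, the extra "expiry unknown" and "expires in <=2y" labels, and the 730-day approximation of two years are assumptions; the WHOIS lookup that would produce the expiry date is left out.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "2 years" is approximated as 730 days.
TWO_YEARS = timedelta(days=730)


def summarize_expiry(expiry, now=None):
    """Map an expiry datetime (or None) to a short status summary."""
    now = now or datetime.now(timezone.utc)
    if expiry is None:
        return "expiry unknown"
    if expiry.tzinfo is None:
        # Assumption: naive timestamps from WHOIS are treated as UTC.
        expiry = expiry.replace(tzinfo=timezone.utc)
    if expiry <= now:
        return "has expired"
    if expiry - now > TWO_YEARS:
        return "expires in >2y"
    return "expires in <=2y"


reference = datetime(2024, 7, 1, tzinfo=timezone.utc)
print(summarize_expiry(datetime(2027, 1, 1, tzinfo=timezone.utc), reference))
print(summarize_expiry(datetime(2024, 1, 1), reference))
```

Passing `now` explicitly keeps the function deterministic, which would matter if the summary were posted by an automated check.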

dnsguru commented 2 weeks ago

Separate from the dialog here, I noted in an issue in @groundcat 's repo that identifying the [client|server]Hold status domain names into a separate file would be beneficial. Domains with either of those statuses almost always get that status for a reason that means the domain should not be on the PSL. In addition, the domain would resolve as NXDOMAIN, because those statuses cause the domain name not to be listed with NS delegation in its TLD zone file.
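A minimal sketch of how such a hold check might look, assuming the WHOIS result exposes the domain status as a list of strings whose first token is the EPP status code (for example "serverHold https://icann.org/epp#serverHold"). The function name `hold_statuses` is hypothetical and this is not the filter from the PR.

```python
# EPP status codes that take a domain out of the TLD zone.
HOLD_STATUSES = {"clienthold", "serverhold"}


def hold_statuses(status_lines):
    """Return the distinct hold status codes found in WHOIS status lines."""
    found = []
    for line in status_lines:
        parts = line.split()
        if not parts:
            continue
        # The code is the first token; a reference URL may follow it.
        code = parts[0]
        if code.lower() in HOLD_STATUSES and code not in found:
            found.append(code)
    return found


example = [
    "clientTransferProhibited https://icann.org/epp#clientTransferProhibited",
    "serverHold https://icann.org/epp#serverHold",
]
print(hold_statuses(example))
```

A domain for which this returns a non-empty list would go into the separate file for manual review.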

groundcat commented 2 weeks ago

The author of the whois package has declared it unsupported. Is there an alternative?

Thank you for spotting this issue. I have replaced it with the whoisdomain package, which is recommended by the author of the retired whois package.

Update: I just realized that I was initially using the python-whois package, not the deprecated whois package, even though both were developed by the same author. The python-whois package appears to be under active development. The main difference is that python-whois maintains its own list of WHOIS servers, while the whoisdomain package relies on the Linux whois utility: it takes the WHOIS output from the OS and parses it. I might use both packages and make one a fallback when the other returns null results.
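The fallback idea could be sketched roughly as below. To avoid guessing at either library's return types, the sketch takes plain callables; in the script these would be thin wrappers around python-whois and whoisdomain, which are not shown. The names `lookup_with_fallback`, `empty_lookup`, and `fake_lookup` are hypothetical.

```python
def lookup_with_fallback(domain, lookups):
    """Try each (name, callable) pair in order.

    Returns (name, result) for the first lookup that yields a non-empty
    result, or (None, None) if every lookup fails or returns nothing.
    """
    for name, lookup in lookups:
        try:
            result = lookup(domain)
        except Exception:
            # WHOIS clients fail in library-specific ways; move to the next one.
            continue
        if result:
            return name, result
    return None, None


def empty_lookup(domain):
    # Stand-in for a client that returns a null result.
    return None


def fake_lookup(domain):
    # Stand-in for a client that succeeds; the field names are made up.
    return {"domain": domain, "expiry": "2030-01-01"}


print(lookup_with_fallback("example.com", [("primary", empty_lookup), ("secondary", fake_lookup)]))
```

Returning the name of the client that answered makes it possible to record in the CSV which source each row came from.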

groundcat commented 2 weeks ago

I noted in an issue in @groundcat 's repo that identifying the [client|server]Hold status domain names into a separate file would be beneficial

Thanks for the input. I added a new filter that produces a CSV list of domains with any form of hold status, and another that produces a CSV list of domains expiring within 2 years. I guess the latter might not be very useful at the moment, since a handful of domains expire in less than 2 years and were probably submitted before the requirement was established, so their owners might not be aware of it, similar to the requirement to keep the _psl TXT records in place at all times.
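A filter of this kind might be sketched as follows, using only the standard csv module. The column names (`domain`, `status`, `expiry`), the function name `write_filtered_csv`, and the sample rows are assumptions for illustration, not the layout the script actually writes.

```python
import csv
import io

# Assumed column layout; the real CSV files may differ.
FIELDNAMES = ["domain", "status", "expiry"]


def write_filtered_csv(rows, keep, out):
    """Write the rows for which keep(row) is true to out; return the count."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    count = 0
    for row in rows:
        if keep(row):
            writer.writerow(row)
            count += 1
    return count


# Made-up rows standing in for WHOIS results.
ROWS = [
    {"domain": "held.example", "status": "serverHold", "expiry": "2030-01-01"},
    {"domain": "fine.example", "status": "ok", "expiry": "2031-06-30"},
]

buffer = io.StringIO()
written = write_filtered_csv(ROWS, lambda row: "hold" in row["status"].lower(), buffer)
print(written)
print(buffer.getvalue())
```

The same function could produce the expiry list by passing a different `keep` predicate, so each filter stays a one-line condition.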

simon-friedberger commented 3 days ago

My stance is that we want to be strict on the _psl DNS entry but lax on expiration times, because expiry is often impossible for us to check and it is often impossible for the requester to get >2y.