publicsuffix / list

The Public Suffix List
https://publicsuffix.org/
Mozilla Public License 2.0

Recommended Deletions of Expired Domains #1996

Closed groundcat closed 3 weeks ago

groundcat commented 1 month ago

Expired Domains

Below is a list of all private section domains that have expired and are either in the pending delete state, where the original owner cannot renew them, or have already been deleted by the registry and show available/free status. In either case, they will not serve the purpose for which they were originally requested to be added to the PSL and probably should be removed from the PSL:

WHOIS Records

Below is the WHOIS record as of today, Friday, June 14, 2024, around 03:00 AM EDT, which indicates the status of these domains. Please advise on the best way to help remove these domains from the list. Since the original registrants won't be able to add the _psl TXT record to them, maybe I should create a pull request to remove these domains? Thanks.

WHOIS records for the domain ro.im:

Domain Name: ro.im
The domain ro.im was not found.

WHOIS records for the domain blogspot.mr:

Domain Name: blogspot.mr
Domain Status: No Object Found
>>> Last update of WHOIS database: 2024-06-14T06:58:43.545Z <<<

WHOIS records for the domain cn.vu:

No Data Found
URL of the ICANN Whois Inaccuracy Complaint Form: https://www.icann.org/wicf/
>>> Last update of WHOIS database: 2024-06-14T06:59:24Z <<<

WHOIS records for the domain lab.ms:

Domain Name: lab.ms
Registry Domain ID: 604616-CoCCA
Updated Date: 2024-05-10T03:02:06Z
Creation Date: 2022-03-10T17:36:19Z
Registry Expiry Date: 2024-05-11T18:25:14Z
First Registration: 1998-09-28T00:00:00Z
Domain Status: active
Domain Status: pending delete
Domain Status: server hold

WHOIS records for the domain instantcloud.cn:

Domain Name: instantcloud.cn
ROID: 20230613s10001s52894822-cn
Domain Status: pendingDelete
Domain Status: clientHold
Domain Status: inactive
Registrant: ******
Registrant Contact Email: ******
Sponsoring Registrar: ******
Registration Time: 2023-06-13 04:03:39
Expiration Time: 2024-06-13 04:03:39
DNSSEC: unsigned

simon-friedberger commented 1 month ago

I am in favor of removing these. @dnsguru, what's your opinion?

@groundcat I've been contemplating how we might detect domain expiration. If you would like to provide your script as a starting point I would appreciate a PR! Maybe we can just run this on a regular cadence and manually process the results.

groundcat commented 1 month ago

If you would like to provide your script as a starting point I would appreciate a PR! Maybe we can just run this on a regular cadence and manually process the results.

@simon-friedberger Sure! I created a simple Python script that goes through all private section domains, retrieves their top-level domains, and identifies those that consistently return NXDOMAIN errors across multiple public DNS services. I considered using batch WHOIS queries to get their expiry dates, but that would probably violate the terms of service of some registries/WHOIS providers. Once I finish cleaning up the code, I will create a pull request. NXDOMAIN errors can occur for various reasons (such as reserved domain space, temporary changes, or internal-only domains) other than domain expiration, so some manual checking against WHOIS records will still be necessary.
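
A minimal sketch of the scan described above, not the actual script. It assumes the PSL file delimits its private section with the `// ===BEGIN PRIVATE DOMAINS===` and `// ===END PRIVATE DOMAINS===` marker comments, strips wildcard (`*.`) and exception (`!`) prefixes rather than computing registrable domains, and takes the DNS query as an injected `lookup` callable. The function names and the `"NXDOMAIN"` return convention are hypothetical.

```python
from typing import Callable, Iterable, List

BEGIN_PRIVATE = "// ===BEGIN PRIVATE DOMAINS==="
END_PRIVATE = "// ===END PRIVATE DOMAINS==="


def private_section_names(psl_text: str) -> List[str]:
    """Return the rule names listed in the PRIVATE section of a PSL file."""
    names = []
    in_private = False
    for raw in psl_text.splitlines():
        line = raw.strip()
        if line == BEGIN_PRIVATE:
            in_private = True
            continue
        if line == END_PRIVATE:
            break
        if not in_private or not line or line.startswith("//"):
            continue
        # A rule ends at the first whitespace; drop exception/wildcard prefixes.
        rule = line.split()[0]
        if rule.startswith("!"):
            rule = rule[1:]
        if rule.startswith("*."):
            rule = rule[2:]
        names.append(rule.lower())
    return names


def nxdomain_everywhere(
    name: str,
    resolvers: Iterable[str],
    lookup: Callable[[str, str], str],
) -> bool:
    """True only if every resolver reports NXDOMAIN for the name.

    `lookup(name, resolver)` is a hypothetical callable returning a
    response-code string such as "NXDOMAIN" or "NOERROR".
    """
    resolver_list = list(resolvers)
    if not resolver_list:
        return False  # no evidence at all is not consensus
    return all(lookup(name, r) == "NXDOMAIN" for r in resolver_list)
```

Requiring agreement from every resolver keeps a single flaky or filtering resolver from flagging a live domain, at the cost of missing names that one resolver still has cached.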

I just created something that scans for NXDOMAIN error domains and compares them with previous scan results on a weekly basis, so I will receive notifications about potentially expired domains. I don't mind helping with validating them and submitting pull requests to delete them, but perhaps I should submit them less frequently to avoid increasing your workload too much.
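
The week-over-week comparison could look roughly like the following. This is an illustrative sketch only; `compare_scans` and its result keys are hypothetical names, and each scan is assumed to be a plain collection of domain names that returned NXDOMAIN.

```python
from typing import Dict, Iterable, List


def compare_scans(previous: Iterable[str], current: Iterable[str]) -> Dict[str, List[str]]:
    """Split the current NXDOMAIN results by how they relate to the previous scan."""
    prev, cur = set(previous), set(current)
    return {
        "new": sorted(cur - prev),         # first seen as NXDOMAIN this scan
        "persisting": sorted(cur & prev),  # NXDOMAIN in both scans
        "recovered": sorted(prev - cur),   # resolved again since the last scan
    }
```

Only the "new" bucket would need a notification; "persisting" entries are the stronger candidates for a manual WHOIS check.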

weppos commented 1 month ago

I am in favor of removing these. @dnsguru, what's your opinion?

Technically, these are still registered (although in pendingDelete). I believe we should wait for a domain to be formally de-registered anyway. The length of the pending-delete period can vary greatly between TLDs (especially ccTLDs). Some of them don't have one, and some may even allow restoration during pending deletion.

IMHO it would be more effective and easy to script a check that confirms the removal the moment the domain is de-registered.

dnsguru commented 4 weeks ago

I am in favor of the removal on domains that are no longer registered.

We should factor into whatever automated removal the fuzzy amount of time that the removal or addition of entries (and subsequent updated PSL) takes as it cascades out into the many places it gets used.

Names that have been flagged by the registrar and placed on clientHold or serverHold status by the respective registrar or registry so that they are delisted from zone files are less "black and white" than expired or deleted domains.

Though there are occasional "friendly fire" situations where a domain might end up with such a status from being falsely or over-casually listed on realtime blackhole lists (RBL), more often than not a domain name with either of those statuses had some bad activity or violated the policies of the registrar or registry (or both).

The technical impact of those statuses is that the domains no longer resolve, but the removal of those statuses can immediately relist them in zones.

Some registrars or registries just intake RBL and throw stuff on hold, -then- investigate (vs. opposite order). This is in order to quickly mitigate, then dial it back if there was a mistake, so as to limit impact.

RBL maintainers sometimes have a "kill everyone, let god sort them out" approach, or get AI sensor logic wrong or place low review on addition reports from third parties and list something totally innocent, and when they do so there can be disruption to legitimate services.

If names were to sit in clientHold or serverHold for more than 45 days, it is probably OK to interpret this as no longer being viable to be listed in the PSL. That leaves some time for any friendly-fire remedy to occur.

Were we to automate removals, the elegant way to address the dynamics I have described would be to delete names that are deleted and have timing logic for hold scenarios.
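
The timing logic described above can be sketched as follows. This is only one possible reading: the 45-day threshold comes from the comment, while `removal_decision`, its parameters, and the "remove"/"wait"/"keep" labels are hypothetical, and the caller is assumed to already know when the hold status was first observed.

```python
from datetime import date, timedelta
from typing import Iterable, Optional

HOLD_GRACE = timedelta(days=45)  # threshold suggested in the discussion
HOLD_STATUSES = {"clienthold", "serverhold"}


def removal_decision(
    registered: bool,
    statuses: Iterable[str],
    hold_since: Optional[date],
    today: date,
) -> str:
    """Return "remove", "wait", or "keep" for a PSL private-section domain."""
    if not registered:
        return "remove"  # deleted names are the clear-cut case
    # Registries spell statuses differently ("server hold", "clientHold"),
    # so compare on a lowercased form without spaces.
    normalized = {s.replace(" ", "").lower() for s in statuses}
    if normalized & HOLD_STATUSES:
        if hold_since is not None and today - hold_since > HOLD_GRACE:
            return "remove"  # on hold for more than 45 days
        return "wait"  # leave time for a friendly-fire remedy
    return "keep"
```

A "wait" result would simply be re-evaluated on the next run, so a hold that gets lifted never leads to a removal.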

simon-friedberger commented 4 weeks ago

@dnsguru, @weppos do you have opinions on WHOIS APIs? Companies like

These are usually reasonably cheap but I'm not sure how much value they actually offer.

dnsguru commented 4 weeks ago

Best to not use 3rd party or paid services when https://lookup.icann.org exists

Currently, the registries' and registrars' registration data services are undergoing a big overhaul that will deprecate port 43 WHOIS in favor of a JSON-based system called RDAP, with a deployment deadline of August 21, 2025.
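
For a sense of what RDAP data looks like to a script, here is a small sketch that summarizes an RDAP domain response body. It assumes the response follows the standard RDAP JSON layout (an `ldhName` string, a `status` array, and an `events` array whose entries carry `eventAction` and `eventDate`); `summarize_rdap_domain` is a hypothetical helper, and fetching the body from a registry's RDAP server is left out.

```python
import json
from typing import Any, Dict


def summarize_rdap_domain(body: str) -> Dict[str, Any]:
    """Extract name, statuses, and expiration date from an RDAP domain response."""
    data = json.loads(body)
    expires = None
    for event in data.get("events", []):
        if event.get("eventAction") == "expiration":
            expires = event.get("eventDate")
    return {
        "name": data.get("ldhName"),
        "statuses": list(data.get("status", [])),
        "expires": expires,
    }
```

Because the fields are structured, no per-registry text parsing is needed, which is the main practical difference from port 43 WHOIS output.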

groundcat commented 3 weeks ago

do you have opinions on whois API

I'm using Linux's whois utility to obtain raw WHOIS data from registry servers, then parsing it into structured data with the Python python-whois library. This includes details like domain expiry dates and registry statuses. This approach is entirely free, without relying on any commercial WHOIS API intermediaries.

Here is an example of the output: https://github.com/groundcat/PSL-private-domains-checker/blob/main/data/nxdomain.csv

The CSV file lists all domains that returned an NXDOMAIN status, along with their expiry dates and registry statuses. The approach works reasonably well with most registries, although some have null values.
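
To illustrate the kind of parsing involved, here is a standard-library sketch that pulls statuses and an expiry date out of raw WHOIS text. It is not the python-whois implementation; the field names and "not found" phrases are taken only from the sample records earlier in this thread, so other registries will need more patterns, and `parse_whois` is a hypothetical name.

```python
from typing import Any, Dict

# Phrases seen in the sample records in this thread; other registries differ.
NOT_FOUND_MARKERS = ("was not found", "no object found", "no data found")
EXPIRY_KEYS = ("registry expiry date", "expiration time")


def parse_whois(raw: str) -> Dict[str, Any]:
    """Return whether a record was found, plus its statuses and expiry string."""
    result: Dict[str, Any] = {"found": True, "statuses": [], "expiry": None}
    lowered = raw.lower()
    if any(marker in lowered for marker in NOT_FOUND_MARKERS):
        result["found"] = False
        return result
    for line in raw.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        key, value = key.strip().lower(), value.strip()
        if key == "domain status":
            result["statuses"].append(value)
        elif key in EXPIRY_KEYS and result["expiry"] is None:
            result["expiry"] = value  # kept as text; date formats vary
    return result
```

Registries that use field names outside this short list would come back with an empty status list and no expiry, which may be one source of the null values mentioned above.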

I've added the script to this repository as I'm still refining it. It's possible to create GitHub Actions to run the script on a daily basis, which automatically updates the csv files.

simon-friedberger commented 3 weeks ago

According to the man page on my machine the whois utility uses https://www.iana.org/whois For ro.im that sends me to whois: whois.nic.im and that gives me "The domain ro.im was not found." which is also what the whois utility gives me. I expected this to be the same as the ICANN lookup at https://lookup.icann.org/. But the ICANN lookup for ro.im gives me:

No registry RDAP server was identified for this domain. Attempting lookup using WHOIS service.
Failed to perform lookup using WHOIS service: TLD_NOT_SUPPORTED.

So, at least for now, it seems we're better off using the whois utility than trying to use the ICANN lookup.
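
The referral path described above (ask IANA, then follow the server it names) can be sketched with plain sockets. This assumes the IANA response contains a `whois:` or `refer:` line naming the registry's server, as in the ro.im case; `whois_query` and `referral_server` are hypothetical helper names, and no retry or rate limiting is included.

```python
import socket
from typing import Optional


def whois_query(server: str, query: str, timeout: float = 10.0) -> str:
    """Send one query over port 43 and return the full text response."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break  # server closes the connection when done
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def referral_server(iana_response: str) -> Optional[str]:
    """Return the WHOIS server named in an IANA response, if any."""
    for line in iana_response.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("whois", "refer"):
            return value.strip() or None
    return None
```

A lookup would then chain the two calls: `referral_server(whois_query("whois.iana.org", "im"))` to find the registry server, followed by `whois_query(server, "ro.im")`.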

simon-friedberger commented 3 weeks ago

Bringing it back to the specific task here. Assuming that we only want to delete domains which are actually not registered, the only ones are ro.im and cn.vu, correct?

dnsguru commented 3 weeks ago

The TL;DR version of my response is : No match in whois / rdap + no SOA in DNS = 99% confidence in "it does not exist any more".

The longer version is below, but it includes an important note to inform any automation plans...

According to the man page on my machine the whois utility uses https://www.iana.org/whois For ro.im that sends me to whois: whois.nic.im and that gives me "The domain ro.im was not found." which is also what the whois utility gives me. I expected this to be the same as the ICANN lookup at https://lookup.icann.org/. But the ICANN lookup for ro.im gives me:

You followed the correct path. Start at the IANA and then drill down from there. One interesting note, should automation be contemplated for this... it needs some elegance based upon which of the sections is in focus (ICANN/PRIVATE), as the logic is nuanced. ccTLDs will often not have their 'official stubs' in their registry - so this is sometimes going to happen, but this is only IF it were in the upper section of the file (the #ICANN section). Were this in the upper section, I then will look at the website for registration services that IANA lists, and review what sub-spaces that they have available, looking for a match. Or in a lot of cases I know someone at the registry to ask. (Note: I want to give some massive respect to nominet here... they actually have added in their whois a status for subspaces like GOV.UK that actually says 'public suffix'.)

If this was an entry in the lower section (the #PRIVATE section), then its non-existence should be considered confirmed if the name also returns no answer to an SOA lookup in DNS, since an SOA answer would indicate that it still exists in DNS.
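
For #PRIVATE entries, the "no WHOIS/RDAP match plus no SOA" rule could be scripted roughly as below. The sketch assumes a DNS-over-HTTPS resolver that returns the common JSON format (an `Answer` list whose records carry a numeric `type`, where 6 is SOA); the default Cloudflare endpoint and all function names are illustrative choices, not something proposed in this thread.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict

SOA_TYPE = 6  # DNS record type number for SOA


def fetch_soa(name: str, endpoint: str = "https://cloudflare-dns.com/dns-query") -> Dict[str, Any]:
    """Query a JSON DNS-over-HTTPS endpoint for the SOA record of a name."""
    url = endpoint + "?" + urllib.parse.urlencode({"name": name, "type": "SOA"})
    request = urllib.request.Request(url, headers={"Accept": "application/dns-json"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def has_soa_answer(doh_json: Dict[str, Any]) -> bool:
    """True if the answer section contains an SOA record."""
    return any(rr.get("type") == SOA_TYPE for rr in doh_json.get("Answer", []))


def likely_gone(whois_found: bool, doh_json: Dict[str, Any]) -> bool:
    """Apply the rule: no registration record and no SOA answer."""
    return not whois_found and not has_soa_answer(doh_json)
```

Only the answer section is checked on purpose: an NXDOMAIN response usually carries the parent zone's SOA in its authority section, which says nothing about the name itself.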

also:

No registry RDAP server was identified for this domain. Attempting lookup using WHOIS service.
Failed to perform lookup using WHOIS service: TLD_NOT_SUPPORTED.

I should have mentioned that the lookup.ICANN.org is for gTLDs and not ccTLDs. There may be some exceptions on the ccTLDs being present, but because ccTLDs are not under ICANN, it is entirely an opt-in situation that has varied adoption.

So, at least for now, it seems we're better off using the whois utility than trying to use the ICANN lookup.

Yes, where there is a command line whois it will work, and the people who code those have done a lot of the work in connecting up the current resources where they are available for answers (as opposed to using third party whois re-gurgitators that are often providing stale or inaccurate info)... but as I mentioned, a lot of that will start to dry up after August of 2025 in the gTLD space due to the policy changes.

simon-friedberger commented 3 weeks ago

@groundcat I am closing this because I don't think there is anything else to do here but if you want to create a PR for some automation for this please do!

groundcat commented 2 weeks ago

@simon-friedberger Created PR #2014 :)