publicsuffix / list

The Public Suffix List
https://publicsuffix.org/
Mozilla Public License 2.0

removal PRs and the need for caution? #2120

Closed madcow05 closed 2 months ago

madcow05 commented 2 months ago

Hello and thanks to everyone who contributes to the PSL project, especially those who submit PRs for domains they aren't directly managing.

I've been checking the PSL project from time to time for work, and I've noticed a new trend where some removal requests might be moving a bit too quickly without fully ensuring the domain owners are aware or in agreement. For example, #2116 requested the removal of a domain that the owner later confirmed was still active. That domain was used for an internal networking tunnel, making it impossible for ANY external/third-party tool to assess its usage accurately. Despite having a _psl TXT record, which was overlooked, the domain was almost removed.

I’ve also observed a bit of chaos and inconsistency in recent removal PRs, with a lack of clear priority on which evidence should be considered more important. For instance, in PR #2116, the _psl TXT record was present but was seemingly overridden by other evidence.

This brings up a little concern: some domain owners might not be notified about these changes, especially if their contact information is outdated, leading to unintended removals that could impact user security.

In my view, it's safer to keep domains in the PSL unless there's clear evidence they should be removed. PSL is just a plain text file and it has no control over how ad blockers, spam filters, and security companies use it. If they’re using the PSL to whitelist domains in private sections, they’re using it incorrectly.

On the other hand, removing them without sufficient evidence or confirmation could actually lead to higher security risks than keeping them.

I fully support the effort to clean up the list, especially when there are obvious issues like unresponsive nameservers, but a more cautious approach might be needed in other cases.

ps. relying on Google site: searches to find subdomains seems like a poor approach since Google has recently stopped indexing new small sites and has even removed many smaller sites from its index.

I'm not a volunteer so take this as my two cents. I might be off base, but I thought it was worth bringing up. I’m sorry if my comment comes across as rude—that’s not my intention.

wdhdev commented 2 months ago

I completely understand your point here. I probably have been a bit excessive with the domain removal PRs; however, there are a few things I want to point out:

> For example in https://github.com/publicsuffix/list/pull/2116 requested the removal of a domain that was later confirmed by the owner to still be active. that domain was used for an internal networking tunnel, making it impossible for ANY external/third-party tools to assess its usage accurately

That is quite true. In PRs similar to this we have asked the owner to confirm whether removal is needed or not. If the owner is not responsive and we believe there is not enough evidence to keep the domain on the PSL, it will be removed (for example, #2063). In future I will start mentioning the original PR authors (assuming they are not ghost users) and possibly emailing the address on file.

> I’ve also observed a bit of chaos and inconsistency in recent removal PRs, with a lack of clear priority on which evidence should be considered more important. For instance, in PR https://github.com/publicsuffix/list/pull/2116, the _psl TXT record was present but was seemingly overridden by other evidence.

The existence of a _psl TXT record does not always mean the domain is still eligible for PSL status. For example, a domain on the PSL could lapse and be registered by someone else who wants the benefits of a PSL listing (e.g. with Cloudflare, Let's Encrypt, etc.). They might place a _psl TXT record in their zone to create the false impression that the domain is still owned by the original registrant.
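
To illustrate why the record is supporting evidence rather than proof, here is a minimal Python sketch. It assumes the _psl TXT value is a link to a pull request on publicsuffix/list, and that the caller already knows whether the domain was registered after the PSL submission. The function names, the PR number 1234, and the sample values are hypothetical, and no DNS lookup is performed.

```python
import re

# Assumed format of a _psl TXT value: a link to a pull request on
# publicsuffix/list. Anything else is ignored.
PSL_PR_URL = re.compile(r"^https://github\.com/publicsuffix/list/pull/(\d+)/?$")


def psl_txt_pr_numbers(txt_values):
    """Return the PR numbers referenced by well-formed _psl TXT values."""
    numbers = []
    for value in txt_values:
        # TXT values are often shown wrapped in double quotes; strip them.
        match = PSL_PR_URL.match(value.strip().strip('"'))
        if match:
            numbers.append(int(match.group(1)))
    return numbers


def txt_record_supports_entry(txt_values, original_pr, registered_after_submission):
    """Treat a matching record as evidence only if ownership looks unchanged.

    A new registrant of a lapsed domain can copy the old record, so a match
    is not trusted when the registration postdates the PSL submission.
    """
    if original_pr not in psl_txt_pr_numbers(txt_values):
        return False
    return not registered_after_submission
```

With these assumptions, the same TXT value counts for an unchanged registration and is set aside for a re-registered one, which then needs other evidence.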

> some domain owners might not be notified about these changes, especially if their contact information is outdated

Normally, assuming the domain hasn't expired or something similar, we will attempt to contact the owner. If the GitHub account or email address is no longer in use, we cannot really do anything about it, as it is the submitter's responsibility to keep that information updated.

> In my view, it's safer to keep domains in the PSL unless there's clear evidence they should be removed.

Most of the time, I only create removal PRs when there seems to be no activity on the domain whatsoever.

> PSL is just a plain text file and it has no control over how ad blockers, spam filters, and security companies use it. If they’re using the PSL to whitelist domains in private sections, they’re using it incorrectly.

Correct.

> ps. relying on Google site: searches to find subdomains seems like a poor approach since Google has recently stopped indexing new small sites and has even removed many smaller sites from its index.

I did not know that. However, it is just one small piece of evidence and we don't rely on it; it only helps support the removal case. I am aware that robots.txt or something else can prevent indexing, so it will never be the sole piece of evidence for removal.

> I'm not a volunteer so take this as my two cents. I might be off base, but I thought it was worth bringing up. I’m sorry if my comment comes across as rude; that’s not my intention.

This is a very valid concern, thanks for bringing it up. I'm not a maintainer, just a community member trying to help audit the PSL and reduce maintainers' workloads.

groundcat commented 2 months ago

I understand your concerns, and you’ve made some good points. However, as @wdhdev just mentioned, I believe that volunteers and maintainers are not relying on just one or a few indicators to decide on inclusion or removal, but rather on a combination of multiple factors. For example, the existence of a _psl TXT record on a domain belonging to a new registrant who re-registered an expired domain doesn't always justify its inclusion in the PSL. There could be instances where a domain is abused just to bypass third-party restrictions, which is not acceptable in my opinion. However, if the existing or new operator is maintaining the project or acted in good faith, that's okay.

I do agree that, before creating a removal PR, it is better to first reach out to the submitter or any other contact we can find, to the best of our ability. As for your concerns about the priority of indicators: based on what Simon mentioned in the psl-discuss email group (which might be a good place if you'd like to discuss further concerns), I think this is a reasonable approach to follow:

Indicators we have:

  • The domain registration is after PSL submission. This indicates that the owner might have changed.
  • The _psl DNS record is missing. This indicates that people are not following our guidance but we haven't enforced it so it's common.
  • The domain is offline.
  • The organization website is offline.
  • The contact e-mail is unreachable.
  • No subdomains have been discovered. This depends on the likelihood that we would discover them.
    • Search engines indicate no active sites in use.
    • CT logs indicate no active sites in use.
  • VirusTotal or similar indicates abuse.
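
As an illustration of the "no subdomains have been discovered" indicator, here is a small Python sketch that filters a list of certificate names (for example, exported by hand from a CT log search) down to those under a given suffix. The function name and the sample names are hypothetical, and nothing here queries a CT log. An empty result only means nothing was found in public data; a domain used purely for internal traffic would also come back empty.

```python
def names_under_suffix(cert_names, suffix):
    """Return the certificate names that are strict subdomains of `suffix`.

    Comparison is case-insensitive and ignores a trailing dot. Wildcard
    names such as "*.example.com" are kept as-is, since they still show
    that certificates were issued under the suffix.
    """
    suffix = suffix.lower().rstrip(".")
    found = set()
    for name in cert_names:
        name = name.lower().rstrip(".")
        # Require a label boundary so "notexample.com" does not match.
        if name.endswith("." + suffix):
            found.add(name)
    return sorted(found)


if __name__ == "__main__":
    sample = ["a.example.com", "EXAMPLE.com", "b.a.example.com.", "notexample.com"]
    print(names_under_suffix(sample, "example.com"))
```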

The most obvious to me would be:

  1. If we think your entry might be unused we send you an email and ask if you still need it.
  2. If we get no reply we try to find a different contact using the organization website.
  3. If no contact can be established and we cannot find any evidence of subdomain use we remove the entry.
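
The three steps above can be read as a small decision procedure. The Python sketch below is one possible reading, not an agreed policy: the class, field, and action names are hypothetical, and keeping an entry when subdomain use is found despite no contact is an interpretation of step 3.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    KEEP = "keep the entry"
    EMAIL_CONTACT = "email the listed contact"
    FIND_OTHER_CONTACT = "look for another contact via the organization website"
    FOLLOW_OWNER = "act on the owner's answer"
    REMOVE = "open a removal PR"


@dataclass
class EntryState:
    looks_unused: bool                 # the indicators suggest the entry may be unused
    contact_emailed: bool = False      # step 1 has been carried out
    contact_replied: bool = False
    other_contact_tried: bool = False  # step 2 has been carried out
    other_contact_replied: bool = False
    subdomain_evidence: bool = False   # any sign of subdomain use was found


def next_action(state):
    """Return the next step for an entry under the three-step reading."""
    if not state.looks_unused:
        return Action.KEEP
    if not state.contact_emailed:
        return Action.EMAIL_CONTACT        # step 1
    if state.contact_replied or state.other_contact_replied:
        return Action.FOLLOW_OWNER         # the owner decides
    if not state.other_contact_tried:
        return Action.FIND_OTHER_CONTACT   # step 2
    if state.subdomain_evidence:
        return Action.KEEP                 # step 3 requires no evidence of use
    return Action.REMOVE                   # step 3
```

Removal is only reached after both contact attempts have failed and no subdomain use has been found, which matches the cautious ordering described in this thread.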

dnsguru commented 2 months ago

It might be worth checking whether the CENTR 'Signs of Life' project (https://github.com/CENTRprojects/Signs-of-life) has elements that could be put towards 'debris removals'.

simon-friedberger commented 2 months ago

> This brings up a little concern: some domain owners might not be notified about these changes, especially if their contact information is outdated, leading to unintended removals that could impact user security.

I totally agree with this. That is why we added the following section to the PR template:

> The submitter acknowledges that it is their responsibility to maintain the domains within their section. This includes removing names which are no longer used, retaining the _psl DNS entry, responding to e-mails to the supplied address. Failure to maintain entries may result in removal of individual entries or the entire section.

And in comments:

<!-- Submitter will maintain domains in good standing or may lose section.

The ongoing trust of the PSL requires it to be free of outdated or problematic entries. In making this pull request, there is a commitment by the submitter that they are going to review and maintain their relevant section. By submitting an entry, the requestor acknowledges that their entry and section may be removed if the domain does not maintain the respective _PSL entries in DNS, any domain(s) within their section fail to resolve in DNS, the domain does not get renewed, expires or is otherwise unreachable. Submitter further identifies that it is their responsibility to review their submitted section within the PSL, submitting updates or removals as their domain(s) may change over time. It is also the responsibility of the submitter to provide (and keep up to date) a reachable email address within the section, and to maintain that address as it may change over time, so that they receive notices. -->

I also agree that we should not remove domains which still have a working _psl entry.

I am trying to be cautious about removing old entries but thank you for the concrete examples of reasons why we might not be able to test the usability of a domain!