publicsuffix / list

The Public Suffix List
https://publicsuffix.org/
Mozilla Public License 2.0

Feature Request: Authority Syntax #2002

Open aph3rson opened 2 weeks ago

aph3rson commented 2 weeks ago

Hello, PSL community,

We (Amazon) are aware that we take up a sizable chunk of the PSL. While we have taken strides internally to emit only the suffixes absolutely-necessary to the PSL, we recognize that we are still the largest private entity on the PSL today. Part of this is related to our DNS infrastructure, and part of it to PSL syntax.

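Some challenges we’ve faced so far include:

  • we can irrefutably prove ownership of a higher-order domain, but still have to perform DNS verification on child domains (causing additional overhead for owners of PSL-onboarded services).
  • syntax for the PSL prevents us from targeting all but the immediate or second-order (wildcard) children of a given suffix - there’s no infinite-wildcard syntax, and wildcards still need to be in the left-most label.
  • changing DNS infrastructure for new/existing services is backwards-incompatible, and would break either the PSL or its consumers in painful ways.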

Our team has a solution to propose on this, which we’ve dubbed the “authority” syntax. This would function almost like an #include or import statement in modern programming languages, and allows for those with demonstrated control over a parent domain to define their child public suffixes (either direct or indirect). The idea here being that the load is shifted off of the PSL maintainers for these large PSL entities, and onto the entities themselves/the suite of PSL libraries in existence today.

We’d originally defined this in an IETF-style Internet Draft, but those aren’t as conducive to a discussion on GitHub and there isn’t precedent for the PSL being governed by such documents. The examples and considerations below capture the technical detail within said prior Internet Draft, though.

A few top-level TL;DRs:

Example

We’ll use the following minimal PSL as an example:

// Aardvark Corp, LLC
// Submitted by: Aardvark Security <security@aardvark-corp.example>

books.aardvark-corp.example
!login.books.aardvark-corp.example

// Badger Technologies
// Submitted by: Big Badger <big.badger@badger-tech.example>

fsrv.af-1.badger-tech.example
view.af-1.badger-tech.example
fsrv.as-1.badger-tech.example
view.as-1.badger-tech.example
fsrv.as-2.badger-tech.example
view.as-2.badger-tech.example
fsrv.eu-1.badger-tech.example
view.eu-1.badger-tech.example
fsrv.na-1.badger-tech.example
view.na-1.badger-tech.example
fsrv.na-2.badger-tech.example
view.na-2.badger-tech.example
fsrv.sa-1.badger-tech.example
view.sa-1.badger-tech.example
*.badger.example

// Coyote Partners
// Submitted by: William Coyote <william@team-coyote.example>

clients.team-coyote.example
mail.team-coyote.example

In the above example, Badger Technologies provides a significant number of entries on the PSL, and may require additional suffixes (given their domain architecture). Rather than adding each suffix directly onto the list, an authority record is added to the PSL, with a line beginning with @. This syntax is similar to other usage of special characters in the PSL (e.g. ! and *), and is not expected to impact list sort or functionality. An example of this syntax is below:

// Aardvark Corp, LLC
// Submitted by: Aardvark Security <security@aardvark-corp.example>

books.aardvark-corp.example
!login.books.aardvark-corp.example

// Badger Technologies
// Submitted by: Big Badger <big.badger@badger-tech.example>

@badger-tech.example
*.badger.example

// Coyote Partners
// Submitted by: William Coyote <william@team-coyote.example>

clients.team-coyote.example
mail.team-coyote.example

This line indicates that suffixes in this portion of the list are provided by the owners of badger-tech.example. To fetch these suffixes, a PSL consumer would need to pull a well-known URI (RFC5785), such as https://badger-tech.example/.well-known/public_suffix_list.dat (the file name is based on the existing PSL file). This file might contain the following content, which uses otherwise-identical syntax to the standard PSL:

// Badger Views
view.af-1.badger-tech.example
view.as-1.badger-tech.example
view.as-2.badger-tech.example
view.eu-1.badger-tech.example
view.na-1.badger-tech.example
view.na-2.badger-tech.example
view.sa-1.badger-tech.example

// Badger File Service
fsrv.af-1.badger-tech.example
fsrv.as-1.badger-tech.example
fsrv.as-2.badger-tech.example
fsrv.eu-1.badger-tech.example
fsrv.na-1.badger-tech.example
fsrv.na-2.badger-tech.example
fsrv.sa-1.badger-tech.example
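
As a rough illustration of the expansion step described above, the following Python sketch replaces each @ line with the rules from that authority's file. It is only a sketch under stated assumptions: the function name expand_authorities and the caller-supplied fetch callable are hypothetical, no network access is performed, and it does not enforce the isolation or redirect rules raised under Considerations.

```python
from typing import Callable


def expand_authorities(psl_text: str, fetch: Callable[[str], str]) -> list[str]:
    """Return PSL rules with each '@domain' line replaced by that authority's rules.

    `fetch` is a hypothetical callable supplied by the caller: it takes the
    authority domain and returns the text of that authority's file.
    """
    rules: list[str] = []
    for raw in psl_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("//"):
            continue  # blank lines and comments carry no rules
        if line.startswith("@"):
            authority = line[1:]
            # Inline the authority's own rules, skipping its comments and blanks.
            for sub in fetch(authority).splitlines():
                sub = sub.strip()
                if sub and not sub.startswith("//"):
                    rules.append(sub)
        else:
            rules.append(line)
    return rules
```

Passing the fetcher in as a parameter keeps the sketch testable offline and leaves retrieval policy (caching, timeouts, redirects) to the caller.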

Considerations / Open Questions

As with any technical proposal, there are a number of considerations made by the authors that discussion from the PSL community would be helpful on. Some of our open questions on these points are below.

Domain Verification

When adding a new suffix authority record to the PSL, the same DNS verification process associated with current PSL modifications is expected. However, a suffix authority adding records to their own authority file MAY implement their own verification process for entries added - suffix authorities are not required to publish/maintain DNS verification records for the suffixes in their own authority file.

Authority Fetching

The authority file would be fetched from a .well-known file (RFC5785) for the authority’s domain. For example, for @badger-tech.example, the authority file would be located at https://badger-tech.example/.well-known/public_suffix_list.dat.
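
A minimal Python sketch of the mapping from an authority record to its file location, assuming the path given in this proposal. The names WELL_KNOWN_PATH and authority_url are illustrative only, and the lowercasing and trailing-dot handling are assumptions of this sketch rather than part of the proposal.

```python
# Path taken from the proposal text; the constant name is illustrative.
WELL_KNOWN_PATH = "/.well-known/public_suffix_list.dat"


def authority_url(record: str) -> str:
    """Map an authority record such as '@badger-tech.example' to its file URL."""
    if not record.startswith("@"):
        raise ValueError(f"not an authority record: {record!r}")
    # Normalising case and a trailing dot is an assumption of this sketch.
    domain = record[1:].strip().lower().rstrip(".")
    if not domain:
        raise ValueError("authority record has no domain")
    return f"https://{domain}{WELL_KNOWN_PATH}"
```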

Authority Isolation

It should be noted that in no case should a suffix authority be allowed to add suffixes to the PSL for domains that are not their children. Allowing such behavior would permit a suffix authority on the PSL to influence the PSL behavior for domains not under their control and potentially influence PSL-oriented behavior depended on by other Internet entities. If an authority file contains a suffix which is not its child, that suffix in the authority file MUST be ignored.
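
The isolation rule can be sketched in Python as a filter over the rules read from an authority file. This is an illustration only: the helper names is_child_rule and isolate are hypothetical, and treating a wildcard such as *.badger-tech.example as acceptable (because it only covers children) is an assumption of this sketch.

```python
def is_child_rule(rule: str, authority: str) -> bool:
    """True if `rule` names something strictly below `authority`."""
    # Exception rules start with '!'; the name after it is what matters here.
    name = rule.strip().lstrip("!").lower().rstrip(".")
    authority = authority.lower().rstrip(".")
    # Match on a label boundary, so 'evil-badger-tech.example' is not a child
    # and the authority's own name is not a child of itself.
    return name.endswith("." + authority)


def isolate(rules: list[str], authority: str) -> list[str]:
    """Keep only rules under the authority; everything else is ignored."""
    return [rule for rule in rules if is_child_rule(rule, authority)]
```

Comparing with a leading dot is the important detail: a plain string-suffix check would let a look-alike domain slip through.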

Client Anonymity

If desired, PSL clients MAY choose to add a configuration option to permit/deny interaction with suffix authorities to protect client anonymity. PSL client maintainers should weigh this carefully, as use of this option will cause the library to operate on an “incomplete” version of the PSL. Implementation of these options is left to individual maintainers of PSL clients and/or libraries which may allow for specifying a custom PSL artifact.

Authority Redirection

To prevent cases of a malfunctioning/malicious suffix authority from directing traffic to a destination they do not operate, redirects are only permitted if the redirect target is within the suffix authority’s domain, or a child domain thereof. Responses from a suffix authority redirecting to an HTTPS server outside of their control SHOULD be rejected.
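
A short Python sketch of the redirect check, using only the standard library. The function name redirect_allowed is hypothetical. The sketch assumes the caller has already resolved any relative Location header into an absolute URL, and it additionally requires the target to be HTTPS, which is an assumption rather than something the text above states outright.

```python
from urllib.parse import urlsplit


def redirect_allowed(authority: str, location: str) -> bool:
    """True if `location` is an HTTPS URL on the authority's domain or a child."""
    target = urlsplit(location)
    # urlsplit lowercases the hostname; it is None when no host is present.
    host = (target.hostname or "").rstrip(".")
    authority = authority.lower().rstrip(".")
    if target.scheme != "https" or not host:
        return False
    # Label-boundary comparison, so look-alike hosts are rejected.
    return host == authority or host.endswith("." + authority)
```

A client following redirects would call this on every hop and stop at the first target that fails the check.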

Logistical Considerations

It should be noted that this could be used to reduce the size of the source for the PSL, and upkeep effort for the PSL maintainers. However, the size of PSL artifacts will likely remain unchanged, and may even increase. The process of collecting changes from authority files may also require changes to the automation currently used within the PSL.

Migration Processes

As PSL syntax hasn’t changed for a significant amount of time (/ever?), there may be some migration/onboarding work necessary here.

dnsguru commented 2 weeks ago

wow that is a lot. I think you're seeking to tap maple syrup from a pine 2x4 here.

I mean it is a smart idea, it sounds sensible, and I get the objective. The concept, no matter how reasonable it may sound, comes front-loaded with obligations on the downline consumers of the PSL to implement something. And I think that is where it subsequently fails.

Some may embrace this, some may not. It is a really, really diverse set of consumers/integrators with varying levels of engagement/set-and-forget use of the PSL from the GitHub repo, and the maintainers have no dominion over those parties or what outcomes to expect from them.

Some of the past initiatives akin to this, to introduce things beyond a text file, have been glorious volunteer cycle drains that netted out to manifest as disposable effort for the volunteers. The challenge has been that the PSL consumer space seems to be lacking a desire to evolve.

A portion of the consumer space might engage and others won't. The main concern, which should be paramount to us all, is to not introduce fragmentation. So this is why we keep the status quo, and do so to ensure the lowest common denominator is consistent for all consumers.

I'd look, as a parallel of your concept, to maybe server-side includes or something at the time of PSL generation as a means to perhaps accomplish what you're proposing: things that might generate these sections into the PSL on some interval.

There is some dev currently happening on automation - if Amazon wanted to put resourcing towards aiding in our backlog debt as we get that all organized it might be helpful to look at some of the DNS validation script work being done at the time of pull request reviews, and see if that library could be extended to do a subsequent poll in the presence of some commented text. We need to get that all stable first, and focus on not breaking the status quo, but there is some thought going towards generating the list in other formats on the undocumented roadmap.

Meanwhile, there is gratitude for the processes put in place to not wrecking-ball our (I am being generous when I say) modest volunteer resources, and recognition that the desire is to net out to something smarter and better.

danderson commented 2 weeks ago

I defer to @dnsguru and other maintainers on the strategic side of what evolution is even possible.

But, as I'm working on some PSL tooling right now, on a technical level here's some short/medium term things we could do to ease both Amazon's and PSL maintainers' burden. My options aren't particularly elegant or web-scale, but pragmatically they're something I could commit to implementing, if they're acceptable to PSL maintainers. Certainly I'm sure I can do these quicker than defining a new protocol, file format, and getting the world to adopt both :)

Support (hardcoded) recursive ownership validation

Looking at the block of Amazon-managed domains in the PSL, the large majority fall within a few top-level domains, whose ownership is either well established, or could be established by some TBD one time process. Assuming that's done, we could change the parser/validator code I'm writing as follows:

Modulo working out a one-time verification, this reduces the cost of reviewing Amazon's PSL changes to ~nothing, and reduces Amazon's per-new-suffix burden to ~nothing. There is still some burden when validating new parent domains and new authorized PR senders, but as I said I think I can get that burden down to the equivalent of clicking approve on a trivial PR.

I can't promise any timeline for doing this, since I'm a volunteer and I'm still getting the very basic machinery for the automation set up, and writing basic validation passes. But this is all squarely in the ballpark of the general kind of automation I would like to end up with, even if I wasn't thinking of this specific shape until this thread.

Import external suffixes as a cronjob

This is effectively the parallel option @dnsguru proposed: Amazon gives us a URL of the suffixes it wants for its domains, and a robot periodically pulls that and merges it into the PSL - either as a PR that still requires human signoff, or fully hands off if the merge passes tests+lints. The tooling I'm building is explicitly aiming to enable machine-driven edits without causing spurious changes or requiring a wholesale format change.
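
To make the periodic-import idea a little more concrete, here is a hedged Python sketch of the diffing step such a robot might run for one owner's block. It is not the tooling mentioned above: plan_import and the max_added threshold are hypothetical, and the size guard simply mirrors the kind of "are you adding too many suffixes?" safety check discussed in this thread.

```python
def plan_import(current: set[str], fetched: set[str], max_added: int = 100):
    """Compute the edits a periodic import would make to one owner's block.

    `current` holds the suffixes already in the PSL for this owner and
    `fetched` the suffixes published at the owner's URL. Both the function
    and the `max_added` limit are illustrative, not existing PSL tooling.
    """
    added = sorted(fetched - current)
    removed = sorted(current - fetched)
    if len(added) > max_added:
        # Leave unusually large changes for a human to review.
        raise ValueError(f"refusing to add {len(added)} suffixes (limit {max_added})")
    return added, removed
```

The returned lists could feed either a pull request for human signoff or an automatic merge, depending on which of the two modes described above is chosen.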

Right now machine-edits is at the bottom of my TODO, behind "get the basics set up", "write/port/improve a bunch of lints and validations", and so on. So this would be more of a medium-term thing, but again if this is acceptable to the PSL maintainers in principle, that's definitely something I'm willing to build towards, and we can work out the details when I have code in hand to look at.

Both of these proposals don't require downstream consumers to change anything, and (hopefully) provide incremental improvement along the way. I freely admit it's not particularly elegant or scalable, and it doesn't fix all the pain points that were mentioned... But I'm very sure I can get the above working with minimal surprises, and afaict it would incrementally reduce the workload of maintainers on both sides of this bug.

It's also all a logical continuation of my existing automation todo list, so ~95% of work needed for the above is stuff I plan to implement regardless of what we do about this specific bug (again, assuming my plans and implementation are acceptable to PSL maintainers, my plans are just my plans to try and help, not an authoritative roadmap).

dnsguru commented 2 weeks ago

We are exploring a comment syntax that would allow for expression of an abuse contact and rdap/whois to be present, and a possible idea for how one might express a "go get more sub-stuff" might be to look at that syntax as a means to express what Ian is attempting to have occur in a more backward-compatible manner...

One possible friendly amendment might be to alter the syntax suggested from lines that would start with an @ to instead be a specifically-formed comment prefix like // ++.

Example:

// Badger Technologies 
// Submitted by: Big Badger <big.badger@badger-tech.example> 

@badger-tech.example
*.badger.example

Might instead be articulated as

// Badger Technologies 
// Submitted by: Big Badger <big.badger@badger-tech.example> 
// ++badger-tech.example

*.badger.example

Of course, I think convincing consumers to do anything more than just read in a text file is a large issue. As is building out whatever retrieval subroutines are needed, with guardrails to keep subitem retrieval within the namespace scope.

This all assumes that consumers might read and do, or ignore a comment line.
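
The difference between the two spellings can be illustrated with a small Python sketch. Both functions are hypothetical: find_authorities recognises either form, while legacy_rules models a naive parser that knows neither and simply keeps every non-comment line. Real PSL libraries may behave differently.

```python
def find_authorities(psl_text: str) -> list[str]:
    """Collect authority domains written as '@domain' or '// ++domain'."""
    found = []
    for raw in psl_text.splitlines():
        line = raw.strip()
        if line.startswith("@"):
            found.append(line[1:].strip())
        elif line.startswith("//"):
            comment = line[2:].strip()
            if comment.startswith("++"):
                found.append(comment[2:].strip())
    return found


def legacy_rules(psl_text: str) -> list[str]:
    """Model a naive parser unaware of either syntax: keep non-comment lines."""
    kept = []
    for raw in psl_text.splitlines():
        line = raw.strip()
        if line and not line.startswith("//"):
            kept.append(line)
    return kept
```

Under this model the comment form disappears silently for an unaware parser, while the @ form survives as an odd-looking rule line that the parser would have to reject or mishandle.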

Rather than being "hot-cuppa-no", wanted to theorize some "maybe" ideas.

All of this, of course, still assumes there were resources beyond the existing ones to do anything.

You wouldn't happen to know a large internet company that has over a trillion dollar valuation that could put forth more than just ideas for unpaid volunteers to solve for them, would you?

simon-friedberger commented 2 weeks ago

@aph3rson Can we clarify the motivation a bit, please?

  • we can irrefutably prove ownership of a higher-order domain, but still have to perform DNS verification on child domains (causing additional overhead for owners of PSL-onboarded services).

This seems to be an organizational issue and changing the rules for validation as @danderson suggests might be a solution here and would not require a change to the PSL format. Does that sound correct?

Maybe I missed it but I don't see how the "validate at the top" solution logically applies to multiple registries? So, Badger Technologies says they are the owner of badger-tech.example and want to decide the PSL entries with a @badger-tech.example entry or such. Then, their customer at cust1.badger-tech.example wants to have that added to the PSL but cust2.badger-tech.example does not want to be on the PSL, how do they do it?

  • syntax for the PSL prevents us from targeting all but the immediate or second-order (wildcard) children of a given suffix - there’s no infinite-wildcard syntax, and wildcards still need to be in the left-most label.

Is the concern here just the size of the PSL? Or would you like to add and remove entries without changing the PSL? Single-level wildcards haven't always been a rule and I wouldn't mind changing that but as @dnsguru says, we don't know what the downstream consumers do.

Looking at for example this section

emrappui-prod.cn-north-1.amazonaws.com.cn
emrnotebooks-prod.cn-north-1.amazonaws.com.cn
emrstudio-prod.cn-north-1.amazonaws.com.cn
emrappui-prod.cn-northwest-1.amazonaws.com.cn
emrnotebooks-prod.cn-northwest-1.amazonaws.com.cn
emrstudio-prod.cn-northwest-1.amazonaws.com.cn
emrappui-prod.af-south-1.amazonaws.com
emrnotebooks-prod.af-south-1.amazonaws.com
emrstudio-prod.af-south-1.amazonaws.com
emrappui-prod.ap-east-1.amazonaws.com
emrnotebooks-prod.ap-east-1.amazonaws.com
emrstudio-prod.ap-east-1.amazonaws.com
emrappui-prod.ap-northeast-1.amazonaws.com
emrnotebooks-prod.ap-northeast-1.amazonaws.com
emrstudio-prod.ap-northeast-1.amazonaws.com
emrappui-prod.ap-northeast-2.amazonaws.com
emrnotebooks-prod.ap-northeast-2.amazonaws.com
emrstudio-prod.ap-northeast-2.amazonaws.com
emrappui-prod.ap-northeast-3.amazonaws.com
emrnotebooks-prod.ap-northeast-3.amazonaws.com
emrstudio-prod.ap-northeast-3.amazonaws.com
emrappui-prod.ap-south-1.amazonaws.com
emrnotebooks-prod.ap-south-1.amazonaws.com
emrstudio-prod.ap-south-1.amazonaws.com
emrappui-prod.ap-south-2.amazonaws.com
emrnotebooks-prod.ap-south-2.amazonaws.com
emrstudio-prod.ap-south-2.amazonaws.com
emrappui-prod.ap-southeast-1.amazonaws.com
emrnotebooks-prod.ap-southeast-1.amazonaws.com
emrstudio-prod.ap-southeast-1.amazonaws.com
emrappui-prod.ap-southeast-2.amazonaws.com
emrnotebooks-prod.ap-southeast-2.amazonaws.com
emrstudio-prod.ap-southeast-2.amazonaws.com
emrappui-prod.ap-southeast-3.amazonaws.com
emrnotebooks-prod.ap-southeast-3.amazonaws.com
emrstudio-prod.ap-southeast-3.amazonaws.com
emrappui-prod.ap-southeast-4.amazonaws.com
emrnotebooks-prod.ap-southeast-4.amazonaws.com
emrstudio-prod.ap-southeast-4.amazonaws.com
emrappui-prod.ca-central-1.amazonaws.com
emrnotebooks-prod.ca-central-1.amazonaws.com
emrstudio-prod.ca-central-1.amazonaws.com
emrappui-prod.ca-west-1.amazonaws.com
emrnotebooks-prod.ca-west-1.amazonaws.com
emrstudio-prod.ca-west-1.amazonaws.com
emrappui-prod.eu-central-1.amazonaws.com
emrnotebooks-prod.eu-central-1.amazonaws.com
emrstudio-prod.eu-central-1.amazonaws.com
emrappui-prod.eu-central-2.amazonaws.com
emrnotebooks-prod.eu-central-2.amazonaws.com
emrstudio-prod.eu-central-2.amazonaws.com
emrappui-prod.eu-north-1.amazonaws.com
emrnotebooks-prod.eu-north-1.amazonaws.com
emrstudio-prod.eu-north-1.amazonaws.com
emrappui-prod.eu-south-1.amazonaws.com
emrnotebooks-prod.eu-south-1.amazonaws.com
emrstudio-prod.eu-south-1.amazonaws.com
emrappui-prod.eu-south-2.amazonaws.com
emrnotebooks-prod.eu-south-2.amazonaws.com
emrstudio-prod.eu-south-2.amazonaws.com
emrappui-prod.eu-west-1.amazonaws.com
emrnotebooks-prod.eu-west-1.amazonaws.com
emrstudio-prod.eu-west-1.amazonaws.com
emrappui-prod.eu-west-2.amazonaws.com
emrnotebooks-prod.eu-west-2.amazonaws.com
emrstudio-prod.eu-west-2.amazonaws.com
emrappui-prod.eu-west-3.amazonaws.com
emrnotebooks-prod.eu-west-3.amazonaws.com
emrstudio-prod.eu-west-3.amazonaws.com
emrappui-prod.il-central-1.amazonaws.com
emrnotebooks-prod.il-central-1.amazonaws.com
emrstudio-prod.il-central-1.amazonaws.com
emrappui-prod.me-central-1.amazonaws.com
emrnotebooks-prod.me-central-1.amazonaws.com
emrstudio-prod.me-central-1.amazonaws.com
emrappui-prod.me-south-1.amazonaws.com
emrnotebooks-prod.me-south-1.amazonaws.com
emrstudio-prod.me-south-1.amazonaws.com
emrappui-prod.sa-east-1.amazonaws.com
emrnotebooks-prod.sa-east-1.amazonaws.com
emrstudio-prod.sa-east-1.amazonaws.com
emrappui-prod.us-east-1.amazonaws.com
emrnotebooks-prod.us-east-1.amazonaws.com
emrstudio-prod.us-east-1.amazonaws.com
emrappui-prod.us-east-2.amazonaws.com
emrnotebooks-prod.us-east-2.amazonaws.com
emrstudio-prod.us-east-2.amazonaws.com
emrappui-prod.us-gov-east-1.amazonaws.com
emrnotebooks-prod.us-gov-east-1.amazonaws.com
emrstudio-prod.us-gov-east-1.amazonaws.com
emrappui-prod.us-gov-west-1.amazonaws.com
emrnotebooks-prod.us-gov-west-1.amazonaws.com
emrstudio-prod.us-gov-west-1.amazonaws.com
emrappui-prod.us-west-1.amazonaws.com
emrnotebooks-prod.us-west-1.amazonaws.com
emrstudio-prod.us-west-1.amazonaws.com
emrappui-prod.us-west-2.amazonaws.com
emrnotebooks-prod.us-west-2.amazonaws.com
emrstudio-prod.us-west-2.amazonaws.com

You could probably cut that down to a third or so already by using

*.il-central-1.amazonaws.com
*.me-central-1.amazonaws.com
*.me-south-1.amazonaws.com
*.sa-east-1.amazonaws.com
*.us-east-1.amazonaws.com
...

Is there a reason why you're not doing that?

  • changing DNS infrastructure for new/existing services is backwards-incompatible, and would break either the PSL or its consumers in painful ways.

I'm not sure I understand what the problem is here. While we have plans to automatically check the _psl entries in DNS and remove entries, we haven't been doing that so far so DNS changes shouldn't really impact anyone.

dnsguru commented 1 week ago

@aph3rson it seems like this is a 'flying submarine' request, i.e. seeking something beyond the capability of a text file.

We have a lot of 'set and forget' participant requestors that are super casual about their listings once their PR gets merged, an example being https://github.com/publicsuffix/list/pull/1401#issuecomment-2193894223 where the requestor let a name lapse, another party picked up the name, and there might be some security implications.

It seems like part of the functionality sought in your issue is that the listed @domain.example would somehow be treated as administratively dynamic, at the whim of the domain administrator.

When looking at the consequence of a lapsed name, using the commented Pull Request as an example, it seems as though introducing a universe of dynamically generated entries introduces significant security issues that would need to be thought through.

aph3rson commented 3 days ago

Appreciate the response from the community on this so far. I wanted to drill in to a few points that have been raised:


@dnsguru:

comes front-loaded with obligations on the downline consumers of the PSL to implement something.

Ideally, this would not be the case. I fully agree that backwards-compatibility should be the most important aspect of the PSL, and we'd like to preserve the behavior of any existing PSL clients at present.

This is part of the reason we suggested having the "resolution" performed within the existing GitHub Actions workflow that publishes the PSL artifact to publicsuffix.org. Existing clients could continue to pull the same PSL they're expecting, and clients interested in doing their own resolution could pull the artifact from elsewhere (e.g. artifacts from a GitHub release).

generate these sections into the PSL on some interval.

The biggest issue for us has been DNS verification. At any given time, we have a good idea which suffixes need to be on the PSL for a given service. We can assert our ownership over a higher-order domain that encompasses those services. However, the mechanisms to emit DNS verification records require significant interaction from our service teams, which makes automating this difficult.

if Amazon wanted to put resourcing towards aiding in our backlog debt as we get that all organized it might be helpful to look at some of the DNS validation script work being done at the time of pull request reviews

We can look into whether we can provide some support here.

One possible friendly amendment might be to alter the syntax suggested from lines that would start with an @ to instead be a specifically-formed comment prefix like // ++

I'm not personally in favor of a specialized comment syntax here, as the failure mode of not being able to parse that is "the comment is silently ignored." With a new type of prefix character, presumably a PSL library would know to fail (loudly) if an unrecognized syntax was seen.

an example being #1401 (comment) where the requestor let a name lapse, another party picked up the name, and there might be some security implications.

A very good point. Barring some kind of cryptographic signature mechanism on the authority’s list of suffixes (or perhaps some level of certificate pinning?), it may be tough to determine if the same party controls that domain at any given time.

This might be closer to the discussion of “what to do with stale entries?,” though, as I think this proposed record type is impacted in the same way as all other private members of the PSL.


@danderson:

Looking at the block of Amazon-managed domains in the PSL, the large majority fall within a few top-level domains, whose ownership is either well established, or could be established by some TBD one time process.

This is correct. The majority of our services operate under a domain that is per-partition (e.g. amazonaws.com, amazonaws.com.cn, etc.).

A somewhat-limited number of services have their own domain for customer resources, e.g. amazoncognito.com, awsapprunner.com, and so on. We generally find that these services can have well-wildcarded suffixes, whereas services using a per-partition zone are not able to do this.

A list of github users who are permitted to change suffixes

In #1605, we talked about attribution of our commits - mainly, that the submissions will come from a specific org/repository, and commits will be made by members of that org. We manage permissions on that repository closely. Perhaps that might be a better option than a static list of users?

enough safety checks like "are you adding two million suffixes to the PSL?"

We aim to avoid this if at all possible. I can say that the automatic addition of suffixes for new regions might cause the list to grow faster, but not at the two-million-suffixes-at-a-time rate.

Both of these proposals don't require downstream consumers to change anything

Our thought is that some consumers might want to do the resolution (or merging) themselves - in such cases, we figured the un-merged artifact would be helpful.


@simon-friedberger:

This seems to be an organizational issue and changing the rules for validation as @danderson suggests might be a solution here and would not require a change to the PSL format.

That's somewhat-correct. The major stipulation is that, sans DNS verification artifacts, we wouldn't have a lot to hand to the PSL maintainers when we submit our changes. It also increases the work associated with PSL maintainers when these changes (within an existing "authority") are proposed to the PSL.

Then, their customer at cust1.badger-tech.example wants to have that added to the PSL but cust2.badger-tech.example does not want to be on the PSL, how do they do it?

This is a good point. I don’t know how this might be handled at the moment. It would likely require a level of arrangement between BadgerTech and their customer, in this scenario. (In our example, I can’t think of a situation where we’d hand public-suffix-control to a customer, but the use-case might exist elsewhere.)

Is the concern here just the size of the PSL? Or would you like to add and remove entries without changing the PSL?

Yes, both. Specifically, we’re concerned about the size of the PSL source (not necessarily artifact). Many in the PSL community might look at the source on GitHub, and rely on the artifact to be pulled into a library/browser/other consumer.

Looking at for example this section

emrappui-prod.cn-north-1.amazonaws.com.cn emrnotebooks-prod.cn-north-1.amazonaws.com.cn

...

We cannot cut those suffixes down as such. The suffixes provided are specific to the EMR service, and the regional suffixes are shared with many other AWS services in said region. We cannot announce all children of those as public suffixes, as it may cause issues in a separate, unrelated service. This was discussed in #1605: many of the AWS services fall into one of those zones, and EMR is one (as are API Gateway, Cloud9, some portions of S3, and other services).
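
The concern can be illustrated with a small Python sketch of PSL-style rule matching, in which * stands for exactly one label. The function wildcard_matches is a simplified illustration rather than a full PSL implementation (no exception rules, no longest-match selection), and other-service is a made-up hostname label.

```python
def wildcard_matches(rule: str, name: str) -> bool:
    """True if `name` is matched by a PSL rule, where '*' stands for one label."""
    rule_labels = rule.lower().split(".")
    name_labels = name.lower().split(".")
    if len(name_labels) < len(rule_labels):
        return False
    # Compare the rule against the trailing labels of the name.
    tail = name_labels[-len(rule_labels):]
    return all(r == "*" or r == n for r, n in zip(rule_labels, tail))


# A regional wildcard covers the EMR name, but also any other label there.
print(wildcard_matches("*.us-east-1.amazonaws.com",
                       "emrstudio-prod.us-east-1.amazonaws.com"))
print(wildcard_matches("*.us-east-1.amazonaws.com",
                       "other-service.us-east-1.amazonaws.com"))
```

Both calls report a match, which is the problem: the wildcard would turn every direct child of the regional zone into a public suffix, not only the EMR ones.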

we haven't been doing that so far so DNS changes shouldn't really impact anyone.

This isn’t referring to the PSL’s DNS verification process at this point. Previously, there was a recommendation that AWS should reorganize their DNS records for in-scope services to align more closely with PSL best practices (e.g. to support wildcarding). Reorganization in that fashion would be backwards-incompatible for AWS customers, and would break PSL consumers who try to use AWS resources.