publicsuffix / list

The Public Suffix List
https://publicsuffix.org/
Mozilla Public License 2.0
2.08k stars 1.23k forks source link

Update Autopull from ICANN JSON to address contract renewal noise #1806

Closed dnsguru closed 1 year ago

dnsguru commented 1 year ago

I have a (hopefully) fast and easy update to the autopull @cpu

Please See PR #1805 and look at the code change being automated.

The very first of 10 year contract renewals between registry operators and ICANN are being reflected in the json as a reset contract date, and this is causing autopull noise. Can the autopull have some logic that ignores a contract date change if there is no change in the sponsoring entity name?

We are now at one decade from some of the first contracts being signed between ICANN and the registry operators that applied for and successfully received delegations to operate registries that they applied for in the 2012 round of new gTLDs. So we'll have a lot of noise come through as all of these will have date changes as they renew contracts.

The 99.8% majority, but not all, of the new TLDs from the ICANN 2012 new gTLD round (give or take .web and/or some others that are taking longer to sort out stuff) have been delegated.

We'd want to keep the autopull going as it has been helpful for quickly catching changes in sponsoring organization per TLD or where there are applicants sunsetting their aspirations, but it is believed that at whatever time in the future that the next next round opens and delivers TLDs, that the current automation process would be able to handle it.

As a background, we used the contracting date to front-load the addition of TLDs before they were being added to the root by the IANA. The contracting date would precede whatever pre-delegation testing period that would occur before the TLD would be added to the root zone.

To address the time-lag for browsers' (and others') incorporation of the PSL as a static resource within code creating a situation where a gTLD would get added to the root but work on some browsers and not others based upon the timing of the last snapshot taken, there was a choice to use the date which the registry operator and ICANN had entered into a contract as a 'trajectory signal' - that the TLD would be being added to the root within a reasonably short timeframe.

We would still want that to be happening, but just want to make this little tweak on the date changes to mute a flood of autopulls that would have no practical functional value.

cpu commented 1 year ago

Just wanting to note that I saw the tag here and will take a look when time permits. Thanks!

dnsguru commented 1 year ago

Thanks Daniel - hopefully this is trivial to ignore the DateOfContractSignature string if it is the only change in list/tools/newgtlds.go per entry.

I will leave the #1805 in an open state so that testing is possible, but I suspect we'll get daily nudges

dnsguru commented 1 year ago

ALTERNATIVE: If the if adding in logic on the commented org changing or not TLD is complex, there is a brute force option here - I wonder if we just pull out the contract date altogether on these. Will make for one huge pull request because all nTLD will need an update to remove the dates across all the nTLDs, but subsequently we won't have week to week noise as the 2012-round nTLDs re-contract.

We'd included them in order for folks to track when they were introduced to the contracting while awaiting the full roster of 2012-round TLDs to hit the root zone due to the Root Zone Operators wanting a cadence not to exceed 20/week... we're well past all of the considerations and future proofing stuff that were cautiously added, and likely not to see more TLDs added until the next round might open up.

dnsguru commented 1 year ago

ADDITIONALLY: I have contacted the CTO's office at ICANN to request that the contract renewal date be a separate field in the json file.

IF they make that change, we need do nothing but close the open pull request(s) that were date-only and the automation can remain as-is.

cpu commented 1 year ago

hopefully this is trivial to ignore the DateOfContractSignature string if it is the only change in list/tools/newgtlds.go per entry.

This is not impossible, but also not trivial for reason I previously called out in this comment: The existing tooling blindly replaces the whole section each time it runs. To be able to say "is the only thing that changed the DateOfContractSignature" it would need to actually parse the old content first.

Further, using github.com/weppos/publicsuffix-go to parse the PSL DAT as I suggested in #1325 won't help with this particular request because its representation (sensibly) does not include comments.

I wonder if we just pull out the contract date altogether on these.

Assuming I understand you correctly this would probably only require a small tweak to pslEntry.Comment to not render the contract date.

As a meta-comment: is there consensus from the other maintainers (is that just @weppos these days?) about the nature of this problem and the proposed solutions?

Perhaps I'm underestimating the scale of the "noise" but it seems relatively trivial to review the pull requests that are only updating the date of contract signature. That requires no code changes, no review of the new code, and would keep the contract date information available and accessible to consumers.

dnsguru commented 1 year ago

noise ...underestimating...

The noise I am hoping we avoid is where we will get pull resuests generated each time a contract renews, and there is at or about 1000 to contend with over the next three years.

I made the spec for the date inclusion initially in collaboration with ICANN for our automation. We had the objective of using these commented dates as "carbon rings" in measuring the lag time before a browser or other integrator updated their snapshot, because there were TLDs launching with some consistent cadence and frequency.

Now that the majority of nTLD are processed from the 2012 round, until subsequent TLD round(s), the date is likely not practically beneficial.

So maybe just removing the date from being included in the comment might work as a solution.

Perhaps just commenting the code that includes the date in the header for now, and we revisit uncommenting it if and when such a subsequent ICANN round opens and starts delegations.

We will have the versioning record that the github repo currently provides as date-stamp / forensic assistance in troubleshooting propogation delay.

There are no known or estimated uses of the date where there would be adverse affectation, and we cite the URL for the source json file to be retrieved from ICANN directly.

Because this information is in the commented information, I believe the only impact is that I get to update the wiki documentation.

I am glad to trade that labor for having to merge tons of comment-only pull requests to keep up with this.

is that just @weppos https://github.com/weppos these days?

Yeah, its down to me and Weppos, and I am doing the majority of it since 2020, but we have a few folks from Mozilla and Google hopefully stepping up to help, as well as some security review from Cleandns.com, sorbl/spamhaus upon request, and some other automation to reduce the review steps that can reasonably be automated.

dnsguru commented 1 year ago

What about just setting the contract date to the string "in contract " on line 121?

Or a blank string ""

dnsguru commented 1 year ago

@cpu I made a 'caveman' pull request - which would result in just putting the word "Contracted" in place of the contractdate field.

Temporary solution while I have a request in to ICANN.org technical services to see about them using a different field for renewals.

weppos commented 1 year ago

Alternative implementation https://github.com/publicsuffix/list/pull/1812

weppos commented 1 year ago

I reviewed both PRs and I'm inclined to prefer https://github.com/publicsuffix/list/pull/1812 approach to skip the value altogether. I can go ahead, merge it and monitor the processing.

Let me know if you are good with it @dnsguru .

@cpu thanks for finding the time to write the PR. I know you're not actively involved with LE anymore and you probably have better things to spend your time with. I value your time, and I really appreciate you invested some of it to help here this time. 🙏

dnsguru commented 1 year ago

In scrolling through and looking at the effect of this change and how the new format of entries appear in the context of everything else in the PSL, I wonder if removing the date was the right choice vs replacing it with the URL at IANA per TLD?

// aaa : 2015-02-26 American Automobile Association, Inc.
aaa
// aaa : American Automobile Association, Inc.
// https://www.iana.org/domains/root/db/aaa.html
aaa

Using existing variables, we could generate it into another comment line, and the result would be more consistent with the overall format, and be in accordance with the guidelines.

dnsguru commented 1 year ago

Also, I would like us to consider and implement the change in automation that adds the labelling that I'd proposed in PR #1815

weppos commented 1 year ago

@dnsguru to be clear, what you are suggesting regarding the template is:

  1. Remove the date (as already done)
  2. Add a second line to the template with the IANA url

Is it correct? if my understanding is correct, https://github.com/publicsuffix/list/pull/1816 could be merged in the meanwhile as it does implement (1). I can then provide a follow up PR for (2).

Also, I would like us to consider and implement the change in automation that adds the labelling that I'd proposed in PR #1815

It must be me, but I've read it twice and I'm not sure I understood the second part of the request:

Did not see a way to also automate the project as well in the automation handler, but this will save volunteers steps in labelling when reviewing.

Is that part of the PR? Or would it be a "feature request" for a change. I guess I'm confused as that PR bring some changes, and then also brings requests for more changes.

cpu commented 1 year ago
  1. Add a second line to the template with the IANA url

https://github.com/publicsuffix/list/pull/1817

cpu commented 1 year ago

I think in the future it'd be better to revert commits like https://github.com/publicsuffix/list/commit/d751aa53683cb2dbfa3f6c1cd6396d88c937dd90 instead of commenting out the code in follow-ups: https://github.com/publicsuffix/list/commit/46fd989374ae9be670d998cb6d9dccf3b0985a9f

It would also be prudent to run the unit tests for your changes (especially if avoiding pull requests). They would have caught this error before it was merged.

dnsguru commented 1 year ago

100% agree - I had meant to be doing that as a PR and did a direct commit erroneusly.

I left the desired result comments in there as well, but I have better perl/PHP/Python skills than with go, and wanted to ask if you might fix that commented line.

-Jothan

On Thu, Aug 3, 2023, 6:13 AM Daniel McCarney @.***> wrote:

I think in the future it'd be better to revert commits like d751aa5 https://github.com/publicsuffix/list/commit/d751aa53683cb2dbfa3f6c1cd6396d88c937dd90 instead of commenting out the code in follow-ups: 46fd989 https://github.com/publicsuffix/list/commit/46fd989374ae9be670d998cb6d9dccf3b0985a9f

It would also be prudent to run the unit tests for your changes (especially if avoiding pull requests). They would have caught this error before it was merged.

— Reply to this email directly, view it on GitHub https://github.com/publicsuffix/list/issues/1806#issuecomment-1663962182, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACQTJKLAOKMO67ATXMTQ7LXTOPWXANCNFSM6AAAAAA2RZEKUY . You are receiving this because you were mentioned.Message ID: @.***>

cpu commented 1 year ago

Turns out a PR might not have caught the test failures anyway :sweat: The workflow was only being triggered by push events, fix in https://github.com/publicsuffix/list/pull/1818.

dnsguru commented 1 year ago

You rock, once again!

On Thu, Aug 3, 2023, 6:53 AM Daniel McCarney @.***> wrote:

Turns out a PR might not have caught the test failures anyway 😓 The workflow was only being triggered by push events, fix in #1818 https://github.com/publicsuffix/list/pull/1818.

— Reply to this email directly, view it on GitHub https://github.com/publicsuffix/list/issues/1806#issuecomment-1664025662, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACQTJLO2FNPTMVVMRBMIJTXTOUMRANCNFSM6AAAAAA2RZEKUY . You are receiving this because you were mentioned.Message ID: @.***>

dnsguru commented 1 year ago

@mozfreddyb This is a change to the comment lines for the nTLDs and the formatting that happens from the automation. It is anticipated that the impact for most all integrators of the PSL is going to be nil

dnsguru commented 1 year ago

closing this as I believe we have sorted out the concern about the impacts on the automation.