jenkins-infra / helpdesk

Open your Infrastructure related issues here for the Jenkins project
https://github.com/jenkins-infra/helpdesk/issues/new/choose

Renew update center certificate (crawler and update-center builds failing on trusted.ci.jenkins.io) #2950

Closed dduportal closed 2 years ago

dduportal commented 2 years ago

Service(s)

trusted.ci.jenkins.io updates.jenkins.io

Summary

The certificate of the update center expires soon (14th of June 2022) and must be renewed for 1 year. The upcoming expiration blocks updates of the update-center JSON and of the tools metadata ("crawler").

[initial message] The pipeline job crawler on trusted.ci.jenkins.io has been failing on its principal branch for 5 days, which means that the tools metadata have not been published for those 5 days.

It seems that there are no HTML files generated in the target/ directory while there should be.

Reproduction steps

No response

dduportal commented 2 years ago

As underlined by @daniel-beck (thanks!), the error is caused by a safety mechanism in the groovy script executed by this pipeline, which fails when the certificate is close to expiring:

[2022-05-21T12:20:29.237Z] Caught: java.io.IOException: Failed to create one or more output files.
[2022-05-21T12:20:29.237Z] java.io.IOException: Failed to create one or more output files.
[2022-05-21T12:20:29.237Z]  at dotnetSdk.addFailure(dotnetSdk.groovy:41)
[2022-05-21T12:20:29.237Z]  at dotnetSdk.run(dotnetSdk.groovy:61)
[2022-05-21T12:20:29.237Z]  at runner$_run_closure1.doCall(runner.groovy:13)
[2022-05-21T12:20:29.237Z]  at runner.run(runner.groovy:10)
[2022-05-21T12:20:29.237Z]  Suppressed: java.security.cert.CertificateExpiredException: NotAfter: Tue Jun 14 10:42:00 UTC 2022
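
To illustrate the kind of check that produces this failure, here is a minimal Python sketch. It is not the crawler's actual Groovy code: the function names and the 30-day warning window are assumptions made for the example, and only the two timestamps come from the log above.

```python
from datetime import datetime, timedelta, timezone


def parse_not_after(text):
    """Parse a date such as 'Tue Jun 14 10:42:00 UTC 2022' (UTC only)."""
    parsed = datetime.strptime(text, "%a %b %d %H:%M:%S UTC %Y")
    return parsed.replace(tzinfo=timezone.utc)


def expires_within(not_after, now, window):
    """Return True when the certificate expires within `window` of `now`."""
    return not_after - now <= window


# Timestamps taken from the log above.
not_after = parse_not_after("Tue Jun 14 10:42:00 UTC 2022")
build_time = datetime(2022, 5, 21, 12, 20, 29, tzinfo=timezone.utc)

# Hypothetical threshold: the real value used by the script is not stated here.
WARNING_WINDOW = timedelta(days=30)

if expires_within(not_after, build_time, WARNING_WINDOW):
    print("Certificate expires soon: refusing to publish signed files")
```

Failing early, before the certificate actually expires, leaves time to rotate it while clients can still verify the files that are already published.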

dduportal commented 2 years ago

Hello @daniel-beck @olblak @timja, could you help us understand this whole "certificate almost expired and must be rotated" situation? It seems to be a certificate somewhere in the update center code, but it's not clear which one, what it does, or how to rotate it.

I searched past issues, but I honestly do not understand the components involved.

I don't see anything related to the certificate in https://github.com/jenkins-infra/crawler.

What did I miss?

timja commented 2 years ago

It’ll be the update center signing certificate

smerle33 commented 2 years ago

This certificate also impacts the update-center job on trusted.ci. This is now a TOP priority for the infra team.

dduportal commented 2 years ago

A bit more context to help us understand which certificate is which:

dduportal commented 2 years ago

So, as per https://github.com/jenkins-infra/update-center2/tree/master/resources/certificates#jenkins-update-center-root-ca-2's readme, only @oleg-nenashev @olblak and @kohsuke have the key of the (2021) new CA: without their help we won't be able to generate a new UC certificate.

dduportal commented 2 years ago

Thanks to the help of @olblak, who generated a new update center certificate and uploaded the credentials to trusted, both builds are working again as expected.

ToDo list before closing:

dduportal commented 2 years ago

Filling the "plugin releases gap"

Following the (private link) procedure at https://github.com/jenkins-infra/runbooks/tree/main/updates, we got the list of plugins released in the past 72 hours (the update-center job has been failing for ~24 hours):

{"releases":[{"name":"azure-vm-agents","version":"815.vf2f07da070ee"},{"name":"cas-plugin","version":"1.6.2"},{"name":"checkmarx-ast-scanner","version":"2.0.11-274.va_d38ce3e7a_35"},{"name":"codescene","version":"1.5.7"},{"name":"dark-theme","version":"185.v276b_5a_8966a_e"},{"name":"ecutest","version":"2.34"},{"name":"eggplant-runner","version":"0.0.1.108.v32f1564b_19d0"},{"name":"elastic-axis","version":"1.6.0"},{"name":"influxdb","version":"3.2.1"},{"name":"jenkins-multijob-plugin","version":"611.v9d3180d752e6"},{"name":"jenkinsci-appspider-plugin","version":"1.0.15"},{"name":"jobConfigHistory","version":"1146.v94c2521f9213"},{"name":"junit","version":"1119.va_a_5e9068da_d7"},{"name":"junit-attachments","version":"101.v82f494a_00e9e"},{"name":"kubernetes","version":"3600.v144b_cd192ca_a_"},{"name":"opentelemetry","version":"2.7.1-rc2"},{"name":"report-info","version":"1.2"},{"name":"rest-list-parameter","version":"1.6.0"},{"name":"robot","version":"3.2.0"},{"name":"role-strategy","version":"488.v0634ce149b_8c"},{"name":"saml","version":"2.298.vc7a_2b_3958628"},{"name":"schedule-build","version":"301.vfdc555a_b_cf81"},{"name":"theme-manager","version":"1.4"},{"name":"theme-manager","version":"1.3"}]}

Still following the runbook procedure, the "sync script" on the update center VM was executed to make sure that all of these plugin releases are synchronized to the reference mirror, and are available through the get.jenkins.io URL.

Sanity checking: https://get.jenkins.io/plugins/azure-vm-agents/815.vf2f07da070ee/azure-vm-agents.hpi was HTTP/404 right before this operation, and is now available.
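
The sanity check above can be repeated for every release in the list. The following Python sketch is only an illustration: the URL pattern is generalised from the single azure-vm-agents example (an assumption, some plugins may use a different file name or extension), and the helper names `hpi_url` and `is_available` are hypothetical.

```python
import json
import urllib.error
import urllib.request


def hpi_url(name, version, base="https://get.jenkins.io/plugins"):
    """Build a download URL following the pattern of the example above."""
    return f"{base}/{name}/{version}/{name}.hpi"


def is_available(url, timeout=10):
    """Return False on an HTTP error such as 404, True otherwise."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return False


# Two entries copied from the release list above, to keep the example short.
releases_json = (
    '{"releases":['
    '{"name":"azure-vm-agents","version":"815.vf2f07da070ee"},'
    '{"name":"junit","version":"1119.va_a_5e9068da_d7"}]}'
)

urls = [
    hpi_url(release["name"], release["version"])
    for release in json.loads(releases_json)["releases"]
]

for url in urls:
    print(url)  # pass each URL to is_available() to perform the actual check
```

The network call is kept in a separate function so that the URL list can be reviewed before any request is sent.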

Reason: the update-center job has been succeeding since 07:24 UTC today, but each run only covers the past few hours, hence the gap.

dduportal commented 2 years ago

Email + IRC notification done

dduportal commented 2 years ago

Old secrets.zip credential removed from trusted.ci (I kept a local encrypted backup)

dduportal commented 2 years ago

Closing: the event is in the jenkins-infra-team calendar

szjozsef commented 2 years ago

Some files are still signed with the old (expired) certificate:

signature check failed for http://updates.jenkins.io/updates/hudson.plugins.groovy.GroovyInstaller.json
ERROR: Signature verification failed in downloadable 'hudson.plugins.groovy.GroovyInstaller' (show details)
java.security.cert.CertificateExpiredException: NotAfter: Tue Jun 14 13:42:00 EEST 2022

also

signature check failed for https://updates.jenkins.io/updates/org.jenkinsci.plugins.scriptler.CentralScriptJsonCatalog.json
ERROR: Signature verification failed in downloadable 'org.jenkinsci.plugins.scriptler.CentralScriptJsonCatalog' (show details)
java.security.cert.CertificateExpiredException: NotAfter: Tue Jun 14 13:42:00 EEST 2022

daniel-beck commented 2 years ago

The problem there is that these haven't been updated in a while (3 and 11 months respectively). Nothing to do with the cert, just a regular crawler failure.

Some other stuff was last updated 2017 🤷

dduportal commented 2 years ago

The problem there is that these haven't been updated in a while (3 and 11 months respectively). Nothing to do with the cert, just a regular crawler failure.

Some other stuff was last updated 2017 🤷

Would it be a problem if we find a way to regenerate these?

timja commented 2 years ago

Shouldn't be. The crawler often depends on external websites, which change their markup, so it can break easily...

dduportal commented 2 years ago

🤔 can we, instead, re-sign them one time? (still a high level question, haven't checked how to do it technically yet)

timja commented 2 years ago

🤔 can we, instead, re-sign them one time? (still a high level question, haven't checked how to do it technically yet)

It's been done before: that's what KK did the last time a bunch of them expired and he didn't have time to fix the scripts. I think he either replayed the jobs or hacked the scripts to re-sign the existing ones in some way.

dduportal commented 2 years ago

Thanks folks. Keeping this issue open, adding to the current milestone so we'll track it.

timja commented 2 years ago

FYI every Jenkins instance will be complaining about this when it tries to check for updates

(It seems plugin updates still work at least)

lemeurherve commented 2 years ago

@dduportal and I think we found the issue. Running the trusted.ci job on groovy.groovy only (as it's one of the files whose signature fails), we noticed an error when it tries to fetch the HTML to retrieve the data:

12:00:44 + export JENKINS_SIGNER=-key /update-center.key -certificate /update-center.cert -root-certificate ****/jenkins-update-center-root-ca.crt
12:00:44 + groovy -Dgrape.config=./grapeConfig.xml ./lib/runner.groovy groovy.groovy
12:02:22 loading dependencies...done
12:02:22 Caught: com.gargoylesoftware.htmlunit.ScriptException: ReferenceError: "fetch" is not defined. (https://groovy.jfrog.io/ui/externals/systemjs/dist/s.min.js#1)
12:02:22 ======= EXCEPTION START ========
12:02:22 EcmaError: lineNumber=[1] column=[0] lineSource=[] name=[ReferenceError] sourceName=[https://groovy.jfrog.io/ui/externals/systemjs/dist/s.min.js] message=[ReferenceError: "fetch" is not defined. (https://groovy.jfrog.io/ui/externals/systemjs/dist/s.min.js#1)]
12:02:22 com.gargoylesoftware.htmlunit.ScriptException: ReferenceError: "fetch" is not defined. (https://groovy.jfrog.io/ui/externals/systemjs/dist/s.min.js#1)
12:02:22 at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:883)

So, for each of these errors, the script can neither generate nor sign the JSON, and thus there is no new version of these files to rsync.

daniel-beck commented 2 years ago

every Jenkins instance

… with affected plugins installed?

dduportal commented 2 years ago

We split the work: whoever is faster wins :)

timja commented 2 years ago

… with affected plugins installed?

Not sure, weekly.ci.jenkins.io is affected and that doesn't have many plugins. (groovy plugin I think is there for some reason)

lemeurherve commented 2 years ago

every Jenkins instance

… with affected plugins installed?

By looking at the file dates in /var/www/updates.jenkins.io/updates, we can see the affected plugins:

-rw-rw-r-- 1 www-data www-data    8940 Jun  2  2017 hudson.plugins.flyway.FlywayInstaller.json.html
-rw-rw-r-- 1 www-data www-data   25462 Jun 25  2018 org.jenkinsci.plugins.perlinstaller.PerlInstaller.json.html
-rw-rw-r-- 1 www-data www-data   23641 Jul  5  2021 hudson.plugins.groovy.GroovyInstaller.json.html
-rw-rw-r-- 1 www-data www-data    8752 Jul 27  2021 io.jenkins.plugins.codeql.CodeQLInstaller.json.html
-rw-rw-r-- 1 www-data www-data   34669 Mar 12 12:24 org.jenkinsci.plugins.scriptler.CentralScriptJsonCatalog.json.html
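
A listing like the one above can also be produced programmatically. The Python sketch below is an illustration only: the `stale_files` helper, the `*.json.html` pattern and the 30-day threshold are assumptions chosen for the example, not part of the update center tooling.

```python
import time
from pathlib import Path


def stale_files(directory, max_age_days, pattern="*.json.html", now=None):
    """Return files matching `pattern` whose mtime is older than `max_age_days`,
    oldest first."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    candidates = [
        path
        for path in Path(directory).glob(pattern)
        if path.stat().st_mtime < cutoff
    ]
    return sorted(candidates, key=lambda path: path.stat().st_mtime)


if __name__ == "__main__":
    # Directory taken from the comment above; 30 days is a hypothetical threshold.
    for path in stale_files("/var/www/updates.jenkins.io/updates", 30):
        print(path.name)
```

Any file reported this way has not been regenerated recently, which suggests that its crawler script is failing.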

@dduportal has prepared a signer.groovy script to sign them.

lemeurherve commented 2 years ago

We pushed the newly signed files, which resolved the error message in Jenkins instances.

lemeurherve commented 2 years ago

Long term solution: fix all failing groovy scripts in https://github.com/jenkins-infra/crawler

dduportal commented 2 years ago

We used the script from https://github.com/jenkins-infra/crawler/pull/118 to manually regenerate all the tools metadata.

In the future, this script might be called on trusted.ci with a "replay" job.

dduportal commented 2 years ago

Fix for the "groovy" tools installer: https://github.com/jenkins-infra/crawler/pull/117 . That should update the current list (blocked to 3.0.8 to 3.0.11): https://github.com/jenkins-infra/crawler/pull/117

dduportal commented 2 years ago

Closing the incident: as per our testing, all metadata files are now signed with the latest certificate (and groovy was fixed).

Please feel free to reopen with details if you encounter any other error.