sigstore / root-signing-staging

Staging TUF repository for Sigstore trust root
https://tuf-repo-cdn.sigstage.dev/
Apache License 2.0

CDN cache invalidation issue #84

Closed jku closed 3 months ago

jku commented 3 months ago

After uploading content to GCS we run:

    gcloud compute url-maps invalidate-cdn-cache tuf-repo-cdn-lb --path "/*" --async

There are two issues that seem to lead to https://github.com/sigstore/root-signing-staging/issues/83:

  1. We should likely not use --async: we are explicitly trying to run tests afterwards that assume the CDN serves the new files.
  2. Originally I thought --async is also hiding an authentication error: the log just says this:

    Invalidation pending for [https://www.googleapis.com/compute/v1/projects/projectsigstore-staging/global/urlMaps/tuf-repo-cdn-lb]
    Monitor its progress at [https://www.googleapis.com/compute/v1/projects/projectsigstore-staging/global/operations/operation-1711626671146-614b723b8c66d-193e686c-73761af4]

    Seeing these requires authentication: I can see the first link's content with my sigstore.dev account but not the second, so I can't tell whether there's a log or not.

    Production does not have this issue because, I believe, it does not test the published repository at all.
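
A minimal sketch of point 1, assuming the publish step is a shell script. Without --async, gcloud waits for the invalidation operation to finish before returning, so the tests only start once it is done. The function name invalidate_cdn_and_wait and the GCLOUD override variable are hypothetical; the url-map name and path are the ones from the command above.

```shell
#!/bin/sh
# Sketch only: invalidate the CDN cache and block until the operation is done.
# Leaving out --async makes gcloud wait for the invalidation to complete.
# GCLOUD is a hypothetical override so the function can be tried without gcloud.
invalidate_cdn_and_wait() {
    url_map="${1:-tuf-repo-cdn-lb}"
    "${GCLOUD:-gcloud}" compute url-maps invalidate-cdn-cache "$url_map" --path "/*"
}
```

The workflow would then call invalidate_cdn_and_wait and run the tests only if it exits successfully.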

jku commented 3 months ago

Ok, found my way to the logs with authentication:

{
  "description": "/*",
  "endTime": "2024-03-28T05:01:19.296-07:00",
  "id": "4938490834697381696",
  "insertTime": "2024-03-28T04:51:11.617-07:00",
  "kind": "compute#operation",
  "name": "operation-1711626671146-614b723b8c66d-193e686c-73761af4",
  "operationType": "invalidateCache",
  "progress": 100,
  "selfLink": "https://www.googleapis.com/compute/v1/projects/projectsigstore-staging/global/operations/operation-1711626671146-614b723b8c66d-193e686c-73761af4",
  "startTime": "2024-03-28T04:51:11.618-07:00",
  "status": "DONE",
  "targetId": "8429254431158848030",
  "targetLink": "https://www.googleapis.com/compute/v1/projects/projectsigstore-staging/global/urlMaps/tuf-repo-cdn-lb",
  "user": "tuf-gha@projectsigstore-staging.iam.gserviceaccount.com"
}

startTime: 04:51:11, endTime: 05:01:19

That's ten minutes...
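
For reference, the duration can be computed from the insertTime and endTime fields in the JSON above. This is a rough sketch that assumes GNU date (the -d option); elapsed_seconds is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: seconds elapsed between two RFC 3339 timestamps.
# Assumes GNU date (-d); fractional seconds are dropped before parsing.
elapsed_seconds() {
    start=$(printf '%s\n' "$1" | sed -E 's/\.[0-9]+//')
    end=$(printf '%s\n' "$2" | sed -E 's/\.[0-9]+//')
    echo $(( $(date -d "$end" +%s) - $(date -d "$start" +%s) ))
}

# insertTime and endTime from the operation log; prints 608
elapsed_seconds "2024-03-28T04:51:11.617-07:00" "2024-03-28T05:01:19.296-07:00"
```

608 seconds is a little over ten minutes.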

jku commented 3 months ago

Notes on potential configuration to look at:

jku commented 3 months ago

Oh, and one more observation: while we can't expect every TUF client to support ETag when downloading timestamp.json from the CDN, I think the CDN could use ETag when fetching the content from GCS (I don't know if it does; probably not).

GCS does support ETag; I don't yet know if the CDN uses it:

   $ curl --head --header 'If-None-Match: "34cb3234238bfdda662e88bec78df231"' https://storage.googleapis.com/tuf-root-staging/timestamp.json | head -1
   HTTP/2 304
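
The same conditional request could be repeated against the CDN. This is a rough sketch that assumes timestamp.json is served at https://tuf-repo-cdn.sigstage.dev/timestamp.json with the same ETag value; status_of and conditional_status are hypothetical helper names. It would only show whether the CDN answers conditional requests from clients, not whether the CDN itself revalidates against GCS.

```shell
#!/bin/sh
# Sketch only: print the status code from an HTTP response header block on stdin.
status_of() {
    head -n 1 | tr -d '\r' | awk '{ print $2 }'
}

# Hypothetical helper: send a conditional HEAD request and print the status code.
# $1 = URL, $2 = ETag value including its double quotes.
conditional_status() {
    curl --silent --head --header "If-None-Match: $2" "$1" | status_of
}

# Usage (needs network access; the CDN path is an assumption):
#   conditional_status https://tuf-repo-cdn.sigstage.dev/timestamp.json '"34cb3234238bfdda662e88bec78df231"'
```

A 304 would mean the CDN honors If-None-Match for clients; a 200 would mean it returns the full response regardless.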

haydentherapper commented 3 months ago

As another data point, the CDN should not be caching timestamp.json (and also root.json, targets, snapshot, and registry.npmjs.org.json, the non-versioned files). I don't recall why this was not sufficient without needing to invalidate the cache. Perhaps it was the newer versioned files not being immediately available without invalidation?

Edit: Ah I think I found why. It's because we had an alert on expiring metadata that we wanted to resolve immediately, as discussed in https://github.com/sigstore/public-good-instance/pull/1683#issuecomment-1736433870 and https://github.com/sigstore/public-good-instance/issues/1684.

jku commented 3 months ago

Perhaps it was the newer versioned files not being immediately available without invalidation?

That does make sense -- basically we either

jku commented 3 months ago

Ah I think I found why. It's because we had an alert on expiring metadata that we wanted to resolve immediately, as discussed in sigstore/public-good-instance#1683 (comment) and sigstore/public-good-instance#1684.

I think this should not be very relevant anymore unless the 604800 TTL actually applies to 404s (which feels like it can't be true)