theupdateframework / python-tuf

Python reference implementation of The Update Framework (TUF)
https://theupdateframework.com/
Apache License 2.0

possible blog post: Caching TUF metadata #2605

Open jku opened 2 months ago

jku commented 2 months ago

I've spent far too long in the past week looking at CDN logs... I collected some notes from this and wrote a first draft of a blog post or something. Copy-pasting here so I don't lose it.

TUF implementation details: Caching and content delivery networks

TUF metadata can be cached at various places during its lifetime; this post describes the useful ways of caching it. The write-up assumes that the "consistent snapshot" feature of TUF is used, which should be true for all reasonable implementations.
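With consistent snapshots, every file a client downloads except timestamp.json has a name that embeds its version or content hash, and that is what makes most of the caching discussed here safe. A minimal sketch of the naming scheme; `metadata_filename` and `target_filename` are hypothetical helpers for illustration, not part of the python-tuf API:

```python
import posixpath
from typing import Optional


def metadata_filename(role: str, version: Optional[int] = None) -> str:
    """Return the file name a client requests for `role` (hypothetical helper).

    With consistent snapshots, timestamp.json is the only metadata file
    fetched under a fixed name; every other role is VERSION.ROLE.json.
    """
    if role == "timestamp":
        return "timestamp.json"
    if version is None:
        raise ValueError(f"{role} metadata is versioned: a version is required")
    return f"{version}.{role}.json"


def target_filename(sha256_hex: str, target_path: str) -> str:
    """Return the hash-prefixed path of an artifact (hypothetical helper).

    The hash prefix goes on the last path component: dir/HASH.name.
    """
    directory, name = posixpath.split(target_path)
    return posixpath.join(directory, f"{sha256_hex}.{name}")


print(metadata_filename("root", 3))            # 3.root.json
print(target_filename("ab12", "dir/app.tgz"))  # dir/ab12.app.tgz
```

Because a versioned or hash-prefixed name never refers to different content, such files can be cached for a long time; timestamp.json is the exception.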

Client metadata cache

A TUF client stores downloaded metadata in an application cache as part of the TUF Client Workflow. Note that caching metadata is subtly different from caching artifacts: an artifact cache is a "pure" cache and can be purged at any time without side effects (other than possibly having to re-download). Purging the metadata cache is also possible without service loss, but it does have minor security implications: some rollback attack protection is lost, because the client no longer has the previously trusted metadata versions to compare new metadata against.
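The rollback-protection point can be made concrete. A minimal sketch, assuming a hypothetical `check_rollback` helper that compares a newly downloaded version against the version found in the local metadata cache; the real client workflow performs more checks than this one:

```python
from typing import Optional


class RollbackError(Exception):
    """Raised when new metadata is older than the locally trusted copy."""


def check_rollback(role: str, new_version: int, trusted_version: Optional[int]) -> None:
    """Reject metadata whose version is lower than the cached, trusted one.

    `trusted_version` comes from the client metadata cache (hypothetical
    interface). After the cache is purged it is None, so an older but still
    validly signed and unexpired version can no longer be detected here.
    """
    if trusted_version is None:
        return  # nothing cached: no baseline to compare against
    if new_version < trusted_version:
        raise RollbackError(
            f"{role}: version {new_version} is older than trusted {trusted_version}"
        )


check_rollback("snapshot", 8, 7)     # newer than cached: accepted
check_rollback("snapshot", 3, None)  # cache purged: accepted, protection lost
```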

Client HTTP cache

In addition to the actual metadata, a client could cache the ETag header of a timestamp.json response and send it in an If-None-Match header in subsequent requests. This is not useful for other metadata or artifacts: with consistent snapshots their file names are versioned or hash-prefixed, so their content should never change.

There is a minor information leak if this is done (the server could now respond maliciously to only some clients, based on the content of the If-None-Match header). Current client implementations are not known to cache ETags.
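For illustration only, a minimal sketch of the ETag idea using the Python standard library; the function names are hypothetical and, as the paragraph above notes, current clients are not known to do this. A 304 response only means the bytes are unchanged: they must still go through normal TUF verification, including the expiry check.

```python
import urllib.error
import urllib.request
from typing import Dict, Optional, Tuple


def conditional_headers(filename: str, cached_etag: Optional[str]) -> Dict[str, str]:
    """Only timestamp.json changes under a fixed name, so only it gets a validator."""
    if filename == "timestamp.json" and cached_etag:
        return {"If-None-Match": cached_etag}
    return {}


def fetch_timestamp(
    url: str, cached_body: Optional[bytes], cached_etag: Optional[str]
) -> Tuple[bytes, Optional[str]]:
    """Download timestamp.json, reusing the cached bytes on 304 Not Modified."""
    # Only send a validator if there are cached bytes to fall back on.
    etag = cached_etag if cached_body is not None else None
    request = urllib.request.Request(url, headers=conditional_headers("timestamp.json", etag))
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.read(), response.headers.get("ETag")
    except urllib.error.HTTPError as err:
        # urllib reports 304 as an HTTPError; treat it as "use the cached copy".
        if err.code == 304 and cached_body is not None:
            return cached_body, cached_etag
        raise
```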

Content Delivery Network caching

One could imagine that caching something as simple as TUF metadata in a Content Delivery Network (CDN) is a trivial feat, but it turns out there are several pitfalls.

These are some of the lessons that have been learned while maintaining TUF repositories:

  • Uploading a new repository version to backend storage should be atomic (the metadata versions on the storage backend should always be consistent). If this is not technically possible, snapshot and all targets metadata should be uploaded before root and timestamp: this minimizes the window of potentially inconsistent metadata.
  • CDN frontends may cache versioned metadata responses (root, snapshot, targets) with long lifetimes.
  • CDN frontends should avoid serving stale content: TUF requires that even 404 responses are not stale.
  • There are two valid alternatives for caching other responses:
    1. The CDN frontend may use "negative caching" (caching failure codes) and may cache timestamp metadata responses if it is able to invalidate the cache immediately when a new repository version is uploaded to the storage backend.
    2. The CDN frontend should not cache timestamp metadata responses or use "negative caching" if it is unable to invalidate the cache when new data is uploaded.

At first glance it may seem like the above advice is overly cautious and that failures would be rare. In practice, testing and alerting systems in particular have consistently found failing combinations of mistakenly cached content.

JustinCappos commented 2 months ago

This seems like an interesting blog post.

I'm curious why updating targets / targets metadata and then snapshot doesn't work? My understanding is that users will not know how to reach new targets until they have the new targets metadata, which they will not know exists until they get a new snapshot file.

jku commented 2 months ago

I'm curious why updating targets / targets metadata and then snapshot doesn't work?

Everything after timestamp does work, both in terms of upload order and pretty much all of the caching details: the issue has practically always been with N.root.json and timestamp.json (since timestamp determines the snapshot and targets versions, the actual failure may of course appear while loading other metadata).
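When atomic upload is not possible, the fallback order from the draft post (snapshot and all targets metadata before root and timestamp) can be expressed as a sort key. A minimal sketch, assuming a hypothetical repository layout with artifacts under `targets/` and metadata under `metadata/`. It also assumes that artifacts go first and that root goes just before timestamp, which the draft does not spell out; during a key rotation either order of root and timestamp still leaves a short window, which is why atomic upload is preferred.

```python
import re
from typing import Iterable, List


def upload_rank(path: str) -> int:
    """Lower ranks are uploaded first (hypothetical layout: metadata/, targets/)."""
    directory, _, name = path.rpartition("/")
    if directory != "metadata":
        return 0  # artifacts: hash-prefixed, unreachable until metadata refers to them
    if name == "timestamp.json":
        return 3  # last: publishing timestamp makes the new version visible
    if re.fullmatch(r"\d+\.root\.json", name):
        return 2  # new root just before timestamp
    return 1  # snapshot and all targets metadata, including delegated roles


def upload_order(paths: Iterable[str]) -> List[str]:
    # sorted() is stable, so files with equal rank keep their input order.
    return sorted(paths, key=upload_rank)


print(upload_order([
    "metadata/timestamp.json",
    "metadata/4.root.json",
    "metadata/9.snapshot.json",
    "targets/ab12.app.tgz",
    "metadata/9.targets.json",
]))
```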

What this means is that the issues only appear to real users during key changes in root (when a user may end up with an old root that contains the old keys but newer other metadata signed with the new keys, or vice versa). In testing and alerting systems the problems appear more easily (e.g. if the system expects specific metadata versions).
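The CDN rules from the draft post could be expressed as a per-response Cache-Control policy set at the origin. A minimal sketch, assuming the CDN honors origin Cache-Control headers (many CDNs use their own configuration instead) and a hypothetical `can_invalidate` flag meaning the CDN cache is purged on every upload; `LONG_TTL` and `short_ttl` are arbitrary example values.

```python
import re

# Versioned metadata under consistent snapshots, e.g. "7.root.json" or
# "12.snapshot.json"; once published, these names never change content.
VERSIONED_METADATA = re.compile(r"\d+\.[^/]+\.json")
LONG_TTL = 365 * 24 * 3600  # example value: one year


def cache_control(filename: str, status: int, can_invalidate: bool, short_ttl: int = 60) -> str:
    """Return a Cache-Control value for a metadata response (hypothetical policy)."""
    if status == 200 and VERSIONED_METADATA.fullmatch(filename):
        return f"public, max-age={LONG_TTL}, immutable"
    if can_invalidate:
        # timestamp.json and error responses (such as the 404 a client gets
        # when probing for the next N.root.json) may be cached by the CDN only
        # because every upload purges it; max-age=0 keeps client caches fresh.
        return f"max-age=0, s-maxage={short_ttl}"
    # No invalidation: never let a stale timestamp.json or stale 404 be served.
    return "no-store"
```

The 404 case is why negative caching matters even for versioned file names: a client looks for the next root version until it gets a 404, so a cached 404 for N.root.json hides a newly published root until that cache entry expires.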

lukpueh commented 2 months ago

the issue has practically always been with N.root.json and timestamp.json

FYI:

There is a TUF spec issue about the race between root and timestamp: https://github.com/theupdateframework/specification/issues/223

And there is a request to create deployment recommendations for these findings: https://github.com/theupdateframework/specification/issues/91