theupdateframework / python-tuf

Python reference implementation of The Update Framework (TUF)
https://theupdateframework.com/
Apache License 2.0

ngclient: support `StorageBackendInterface`? #2676

Open woodruffw opened 2 months ago

woodruffw commented 2 months ago

Description of issue or feature request:

Right now, tuf.ngclient is heavily tied to local system I/O: it assumes a metadata directory on disk that can be read/written. For example:

https://github.com/theupdateframework/python-tuf/blob/4d2ff8d37d30e94dbc0fe2cfa42bd46d2bb72414/tuf/ngclient/updater.py#L293-L312

This is problematic in distributed worker setups like Warehouse (PyPI), where each worker runs in its own container or VM and thus can't easily share on-disk TUF repos. This raises both reliability and security concerns.

This problem was noted a few years back, before tuf.ngclient was created: https://github.com/theupdateframework/python-tuf/issues/1009. The solution then was to add a filesystem abstraction to the tuf.metadata APIs, which was done via https://github.com/secure-systems-lab/securesystemslib/pull/232 and https://github.com/theupdateframework/python-tuf/issues/1009. However, this abstraction wasn't added to the ngclient APIs, only to the low-level metadata ones.

Current behavior:

tuf.ngclient currently assumes that it can perform persistent local I/O for its repository.

Expected behavior:

tuf.ngclient should support an I/O abstraction (such as the pre-existing StorageBackendInterface, if suitable) for persistent repo operations, enabling use in distributed deployments.
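As a rough illustration of the kind of abstraction being requested, here is a minimal sketch. Everything in it is hypothetical: `MetadataStore`, `DirectoryStore`, and `InMemoryStore` are names invented for this example and are not part of python-tuf or securesystemslib, and the sketch does not reproduce the actual `StorageBackendInterface` signatures. It only assumes that the client needs two persistent operations per role: load the stored metadata bytes and persist new ones.

```python
import contextlib
import os
import tempfile
from typing import Dict, Protocol
from urllib import parse


class MetadataStore(Protocol):
    """Hypothetical storage interface (not a python-tuf API)."""

    def load(self, rolename: str) -> bytes: ...

    def persist(self, rolename: str, data: bytes) -> None: ...


class DirectoryStore:
    """Keeps one file per role in a local directory."""

    def __init__(self, metadata_dir: str) -> None:
        self._dir = metadata_dir

    def _path(self, rolename: str) -> str:
        # Percent-encode the role name so it is safe to use as a file name.
        return os.path.join(self._dir, parse.quote(rolename, "") + ".json")

    def load(self, rolename: str) -> bytes:
        with open(self._path(rolename), "rb") as f:
            return f.read()

    def persist(self, rolename: str, data: bytes) -> None:
        # Write to a temporary file in the same directory, then rename it
        # into place so readers never observe a partially written file.
        fd, tmp_path = tempfile.mkstemp(dir=self._dir)
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(data)
            os.replace(tmp_path, self._path(rolename))
        except OSError:
            with contextlib.suppress(FileNotFoundError):
                os.unlink(tmp_path)
            raise


class InMemoryStore:
    """Stand-in for a non-filesystem backend (e.g. a shared cache or DB)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def load(self, rolename: str) -> bytes:
        try:
            return self._blobs[rolename]
        except KeyError:
            # Mirror the error a missing file would produce on disk.
            raise FileNotFoundError(rolename) from None

    def persist(self, rolename: str, data: bytes) -> None:
        self._blobs[rolename] = data


def roundtrip(store: MetadataStore, rolename: str, data: bytes) -> bytes:
    """Code written against the interface works with either backend."""
    store.persist(rolename, data)
    return store.load(rolename)
```

With something along these lines, a distributed deployment could supply its own backend while local users keep the directory-based default.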

jku commented 1 week ago

I think the expected behaviour sounds reasonable.

There is a related question to consider -- in a scenario where you have "distributed workers", maybe what you really want is a bunch of "read-only" workers that operate without ever connecting to the repository (at least for metadata), and one writing tuf client that actually does the updates at regular intervals.

Previously we tried to make an offline mode that would be user-friendly (usable by CLI apps), and that turned out to be complicated compared to the potential advantages. The "offline mode" described above (where it's ok to just immediately fail if the local metadata is not up-to-date and someone promises to keep it updated) would be simple to add.

"dumb read-only mode" or IO abstraction (or both) sound like things that could be added as optional features to ngclient.
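To make the "fail immediately if the local metadata is not up-to-date" idea concrete, here is a minimal sketch of the freshness check a read-only worker might perform. The names `StaleMetadataError` and `load_if_fresh` are hypothetical and not part of python-tuf. The sketch assumes the usual TUF metadata layout with a `signed.expires` timestamp such as `2030-01-01T00:00:00Z`, and it deliberately omits signature and version verification, which a real client must still perform.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional


class StaleMetadataError(Exception):
    """Raised when locally stored metadata has already expired."""


def load_if_fresh(raw: bytes, now: Optional[datetime] = None) -> Dict[str, Any]:
    """Return the parsed 'signed' portion, or fail without any network access.

    Assumption: some other process (the single "writing" client) keeps the
    stored metadata updated, so an expired file is treated as a hard error
    instead of a trigger to contact the repository.
    """
    if now is None:
        now = datetime.now(timezone.utc)

    signed = json.loads(raw)["signed"]
    expires = datetime.strptime(
        signed["expires"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)

    if expires <= now:
        raise StaleMetadataError(f"metadata expired at {signed['expires']}")
    return signed
```

A worker would call this on each role's stored bytes and treat `StaleMetadataError` as "the updater process has fallen behind", not as a reason to fetch anything itself.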

woodruffw commented 1 week ago

> There is a related question to consider -- in a scenario where you have "distributed workers", maybe what you really want is a bunch of "read-only" workers that operate without ever connecting to the repository (at least for metadata), and one writing tuf client that actually does the updates at regular intervals

Thanks for extrapolating this! This is indeed the underlying scenario, and probably is a more accurate encapsulation of what I actually need 🙂