Closed jku closed 4 years ago
TUF updater abstraction in pip code
This is code in src/pip/_internal/network/tuf.py. The code badly needs better naming ('Updater' and 'tuf' names are used very confusingly) -- ideas are welcome.
Not perfect, but UpdaterHandler? I am using the same pattern when I manage the server processes in the tests in TUF: server_handler as a variable name.
Or maybe TufUpdater?
For dummies (I tried the mock deployment in a new virtualenv):
pip install securesystemslib[colors,crypto,pynacl] tuf
at the very beginning, or at least tuf and crypto (I didn't really try that),
but it worked like a charm (I think)
p.s. I know about it and I was still scared by the red text:
ERROR: Could not download URL: 'http://localhost:8000/tuf/3.root.json'
Traceback (most recent call last):
...
tuf.exceptions.NoWorkingMirrorError: No working mirror was found:
'localhost:8000': HTTPError('404 Client Error: File not found for url: http://localhost:8000/tuf/3.root.json')
Foreword
This branch is very much a work in progress (a full 10% of the lines are "TODO"): please don't review details. I'm just hoping to validate (or even just communicate) the high-level ideas and maybe get some new insights at that level.
My current work is a little ahead of this branch but I think this is more useful for the purposes of discussion and this branch actually works (for pip install at least)...
I don't expect you to do this but if you do want to test:
Normal flow of the tuf-related code in "pip install sampleproject"
SessionCommandMixin).
LinkCollector._get_html_page(): this looks up an updater object based on the index url (currently quite unsafely), downloads the index file with tuf and returns the contents.
get_http_url() is called. This looks up an updater object based on the "comes_from" field (which is the url of the index file this distribution url was found in), and downloads the target this url refers to.
Open questions on the flow
updaters are looked up with index_urls. If one is not found, that means TUF is not used for this download: instead the current download functionality (without TUF) is used. This feels fragile considering I don't have full knowledge of where the index urls come from... but I don't see other solutions.
Where to do the initialization is undecided: I think one of the CommandMixins is correct, possibly even a new one
There are loads of possibilities for when to "intercept" the index and distribution download code: the current places are the easiest but the decision should probably be based on what is least likely to break in future (so TUF support does not get accidentally turned off)
With the previous point in mind, I'm thinking I'll add a hard-coded warning/error for pypi.org: if we end up downloading things from pypi without TUF, that sounds like an error. I'm not sure the same can be done for files.pythonhosted.org.
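To make the fallback and the pypi.org guard concrete, here is a minimal sketch of the decision, not the code in this branch. Everything named here is hypothetical: choose_download_path, TufRequiredError, TUF_REQUIRED_HOSTS, and the assumption that updaters are kept in a plain dict keyed by index url.

```python
from typing import Mapping
from urllib.parse import urlsplit

# Assumption: hosts for which a non-TUF download should be treated as an error.
TUF_REQUIRED_HOSTS = frozenset({"pypi.org"})


class TufRequiredError(Exception):
    """Hypothetical error: a TUF-protected index was about to be used without TUF."""


def choose_download_path(index_url: str, updaters: Mapping[str, object]) -> str:
    """Return "tuf" if an updater exists for index_url, else "plain".

    Raises TufRequiredError instead of silently falling back when the
    index host is one that is expected to always be covered by TUF.
    """
    if index_url in updaters:
        return "tuf"

    host = urlsplit(index_url).hostname or ""
    if host in TUF_REQUIRED_HOSTS:
        raise TufRequiredError(f"refusing to download from {host} without TUF")

    # No updater and no requirement: use the existing (non-TUF) download code.
    return "plain"
```

Raising rather than warning is only one option; the point is that the "no updater found" branch is the place where TUF could get turned off by accident, so that is where the check would sit.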
Data storage
Cache is in ~/.cache/pip/. It's used as the tuf download location, so it contains everything ever downloaded with tuf.
TUF metadata is in ~/.local/share/pip/.
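For illustration, the two locations above match the XDG base directory defaults on Linux. The sketch below assumes that convention; tuf_storage_dirs and _xdg_dir are hypothetical helpers, and pip's real directory logic (including other platforms) may differ.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Tuple


def _xdg_dir(env_var: str, default: str, environ: Optional[Mapping[str, str]]) -> Path:
    """Resolve an XDG base directory and append the "pip" subdirectory."""
    env = os.environ if environ is None else environ
    base = env.get(env_var) or str(Path.home() / default)
    return Path(base) / "pip"


def tuf_storage_dirs(environ: Optional[Mapping[str, str]] = None) -> Tuple[Path, Path]:
    """Return (cache_dir, metadata_dir) under the assumed XDG layout.

    cache_dir is where downloaded targets land (~/.cache/pip by default);
    metadata_dir is where TUF metadata is kept (~/.local/share/pip by default).
    """
    cache_dir = _xdg_dir("XDG_CACHE_HOME", ".cache", environ)
    metadata_dir = _xdg_dir("XDG_DATA_HOME", ".local/share", environ)
    return cache_dir, metadata_dir
```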
Open questions on data storage
TUF updater abstraction in pip code
This is code in src/pip/_internal/network/tuf.py. The code badly needs better naming ('Updater' and 'tuf' names are used very confusingly) -- ideas are welcome.
But the basic design is simple:
So a user will first lookup the correct updater using the index_url of the repository, then call the download functions on that updater.
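The lookup-then-download pattern described above could look roughly like this. It is a sketch under assumptions, not the contents of tuf.py: IndexUpdater, UpdaterRegistry, download_index and download_distribution are hypothetical names, and the fetch callable stands in for the real TUF-verified download.

```python
from typing import Callable, Dict, Optional


def _normalize(index_url: str) -> str:
    """Treat index urls that differ only by a trailing slash as the same index."""
    return index_url.rstrip("/") + "/"


class IndexUpdater:
    """Hypothetical wrapper around the TUF client for a single index."""

    def __init__(self, index_url: str, fetch: Callable[[str], bytes]) -> None:
        self.index_url = _normalize(index_url)
        self._fetch = fetch  # stand-in for a TUF-verified download

    def download_index(self, project: str) -> bytes:
        """Download the index file for one project."""
        return self._fetch(f"{self.index_url}{project}/")

    def download_distribution(self, url: str) -> bytes:
        """Download the target that a distribution url refers to."""
        return self._fetch(url)


class UpdaterRegistry:
    """Maps index urls to updaters; a missing entry means TUF is not used."""

    def __init__(self) -> None:
        self._updaters: Dict[str, IndexUpdater] = {}

    def add(self, updater: IndexUpdater) -> None:
        self._updaters[updater.index_url] = updater

    def get(self, index_url: str) -> Optional[IndexUpdater]:
        return self._updaters.get(_normalize(index_url))
```

A caller would do registry.get(index_url) first and only call the download methods if the result is not None.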
Open questions: