theupdateframework / python-tuf

Python reference implementation of The Update Framework (TUF)
https://theupdateframework.com/
Apache License 2.0

Prototype support for content addressable systems such as IPFS #2325

Open adityasaky opened 1 year ago

adityasaky commented 1 year ago

NOTE: This ticket is for a potential GSoC 2023 task.

TUF’s specification was written with artifacts stored in traditional file systems in mind. As such, it specifies explicitly how artifacts must be hashed in order to guarantee their integrity. Since TUF was first created, however, content addressable systems for storage and data transmission have become more prominent. Some examples of these systems are Git, the InterPlanetary File System (IPFS), and OSTree. All of these can present a file-like interface for artifacts they store, and have built-in mechanisms for ensuring the integrity of artifacts. When TUF is used with these systems, it is redundant for it to also ensure artifact integrity. Instead, TUF can delegate these guarantees to the underlying content addressable system, and focus on higher level security properties the specification provides. As part of this GSoC project, the participant will add support to an existing TUF implementation to delegate artifact integrity verification to the underlying content addressable system, specifically IPFS.
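To illustrate why a second integrity check is redundant on such systems, here is a minimal sketch using Git's blob addressing as the example (IPFS CIDs follow the same principle with a different encoding). The helper names `git_blob_id` and `matches_address` are made up for this illustration and are not part of python-tuf or the draft TAP.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes a short header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def matches_address(content: bytes, expected_id: str) -> bool:
    """Check fetched bytes against the content address they were requested by."""
    return git_blob_id(content) == expected_id


address = git_blob_id(b"hello world\n")
print(matches_address(b"hello world\n", address))  # True
print(matches_address(b"tampered\n", address))     # False
```

Because the address is derived from the content, a client that asks for an address and checks the result already has the integrity guarantee that TUF's own target hashes would otherwise provide.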

Also see: https://github.com/theupdateframework/taps/pull/156

Primary Goal

Allow delegating just the black-box targets to the content-addressing system. This is what our current draft TAP, https://github.com/theupdateframework/taps/pull/156, specifies. This is less invasive since, as stated above, targets are already black-box data to the rest of TUF. The draft TAP is pretty agnostic to which mechanism is used; Git, IPFS, and OSTree above are just examples. Given the black-box nature of targets, we think this is the correct choice. The GSoC mentee is welcome to support just one of those systems, or several, in their prototype implementation.

Stretch Goal

TBD/WIP

GSoC Mentors

If accepted, this task will be mentored by myself (@adityasaky), John Ericson (@Ericson2314), and Marina Moore (@mnm678). This ticket was authored by all of us.

pandyasio commented 1 year ago

Hi, I am interested in working on this project and applying for GSoC 2023. How can I contact you?

adityasaky commented 1 year ago

This task has been assigned to @shubham4443. @mnm678 would it be possible to assign it to him formally on the issue?

mnm678 commented 1 year ago

@shubham4443 if you add a comment here I can assign you (GitHub limits assignees to folks who have commented or have permission in the repo)

shubham4443 commented 1 year ago

@mnm678 Adding a comment.

jku commented 1 year ago

Just thinking out loud here: The seeming difficulty in properly integrating IPFS (and the fact that the use cases in the TAP seem so different from each other from an implementation perspective) leads me to wonder whether it makes sense for python-tuf to handle the download at all. The whole point of TAP-19 seems to be that the TUF library no longer manages integrity, only the correct delegation... so why would we go through the trouble of abstracting the concept of "download a thing" for all of {http,ipfs,git,ostree}?

What if the application that uses python-tuf just worked like this instead:

import requests
import tuf.ngclient

updater = tuf.ngclient.Updater(...)

# get_targetinfo() returns None if no trusted role lists the target path
if not updater.get_targetinfo(targetpath):
    raise RuntimeError("oops, target not found")

# tuf has now confirmed the targetpath is signed by the correctly delegated role: we can download
response = requests.get(gateway_url + parse_cid(targetpath), timeout=5)

I can see a couple of possible issues:

jku commented 1 year ago

What if the application that uses python-tuf just worked like this instead:

import requests
import tuf.ngclient

updater = tuf.ngclient.Updater(...)

# get_targetinfo() returns None if no trusted role lists the target path
if not updater.get_targetinfo(targetpath):
    raise RuntimeError("oops, target not found")

# tuf has now confirmed the targetpath is signed by the correctly delegated role: we can download
response = requests.get(gateway_url + parse_cid(targetpath), timeout=5)

or as another option: A small python-tuf-ipfs library implements a downloader client library with a nice IPFS specific API that just uses python-tuf like above
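A rough sketch of what such a wrapper could look like, purely as an assumption for discussion: `parse_cid`, `download_ipfs_target`, and the `ipfs/<cid>` target path layout are hypothetical and come from neither python-tuf nor the TAP. The only python-tuf call used is `Updater.get_targetinfo()`. The HTTP fetch is passed in as a callable so the sketch does not depend on any particular gateway or HTTP client.

```python
from typing import Callable


def parse_cid(targetpath: str) -> str:
    """Hypothetical helper: assume the CID is the last segment of the target path."""
    cid = targetpath.rsplit("/", 1)[-1]
    if not cid:
        raise ValueError(f"no CID in target path {targetpath!r}")
    return cid


def download_ipfs_target(
    updater,
    targetpath: str,
    gateway_url: str,
    fetch: Callable[[str], bytes],
) -> bytes:
    """Fetch a target only after TUF confirms a trusted role lists it."""
    # get_targetinfo() returns None when no trusted role lists the target path.
    if updater.get_targetinfo(targetpath) is None:
        raise LookupError(f"target {targetpath!r} not found in TUF metadata")
    # TUF has vouched for the path; the download itself stays outside python-tuf.
    return fetch(gateway_url.rstrip("/") + "/" + parse_cid(targetpath))
```

An application could pass something like `lambda url: requests.get(url, timeout=5).content` as `fetch`, with `updater` being a normal `tuf.ngclient.Updater` instance.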

Ericson2314 commented 1 year ago

The stretch goal up in the original post is content-addressing the metadata. I finally found some time this morning, and clarified and wrote down my thoughts in https://github.com/Ericson2314/tuf-content-addressing-notes. I would be more than happy to transfer that repo to this org / otherwise make it a collaboration!

@adityasaky in https://github.com/theupdateframework/python-tuf/pull/2415#issuecomment-1619257835 you wrote:

@Ericson2314 I'm not sure if this is practical, though it depends on "root" in your message. Do you mean we remove the snapshot role and have the timestamp role identify the IPFS root node that contains the current set of all TUF metadata?

Yes, I was very unclear, and my thoughts have baked since then. The tl;dr of the notes above is:

shubham4443 commented 1 year ago

The prototype can now be found here: https://github.com/theupdateframework/tap19-ipfs-poc