sigstore / sget-rs

sget is a keyless safe script retrieval and execution tool
Apache License 2.0

Design: Generic purl retrieval #50

Open dlorenc opened 2 years ago

dlorenc commented 2 years ago

Description

OCI registries are great - but right now they're not completely universal no matter how much I wish :) I think over time we could try to support safely retrieving artifacts from any artifact manager.

The most widely used way to declare package locations is with PURL, and these are used in SBOMs and other tooling. They're also found in slsa/in-toto provenance.

PURLs aim to contain enough information to locate and retrieve a package, so I think it should be sufficient for sget to... locate and retrieve a package!

I'm not aware of other efforts to universally fetch anything in PURL, but it would definitely be useful.

lukehinds commented 2 years ago

I need to read up on purl. We had discussed using other storage systems after talking to someone doing curl-to-bash installs. They were a little hesitant at the prospect of using OCI, as they would prefer their users to pull directly from git (raw githubusercontent) so that there is no chance of a race condition between the source code being updated and it being pushed to the registry.

dlorenc commented 2 years ago

PURL isn't perfect, but it's pretty good and the community is active and willing to address feedback. They support git URLs and generic HTTP fetching, which should work for the generic support: https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst

In general the PURL types try to support fully pinned references where the ecosystem permits. This is where sget can shine, by encouraging the safe patterns of fetching content from purls as much as possible!

imjasonh commented 2 years ago

+1 from me, and seemingly overlaps with some of the thinking in https://github.com/sigstore/sget/issues/44#issuecomment-995658840 -- tl;dr: requiring the script to be in a registry will be a barrier to adoption for maintainers who already have the script available from raw.githubusercontent.com today.

shibumi commented 2 years ago

I took a deeper look at the package url specification over the last few days. The generic type is currently missing a parameter for signatures, hence I proposed the addition of such a parameter: https://github.com/package-url/purl-spec/pull/145

Furthermore, I am currently working on the package url Go implementation: https://github.com/package-url/packageurl-go/pull/28

I am aware that this is the Rust implementation for sget, but I am confident this might be useful for further sget rust development.

One "problem" I have with package url is that it does not feel natural. Just have a look at this example. The latest version of sget behaves like this:

$ sget us.gcr.io/dlorenc-vmtest2/readme@sha256:4aa3054270f7a70b4528f2064ee90961788e1e1518703592ae4463de3b889dec

With package url support the above line would become this:

$ sget pkg:oci/readme@sha256:4aa3054270f7a70b4528f2064ee90961788e1e1518703592ae4463de3b889dec?repository_url=us.gcr.io/dlorenc-vmtest2/readme&tag=latest
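For illustration, the mapping between the two command lines above can be sketched in a few lines of Python. This is only a sketch: `oci_ref_to_purl` is a hypothetical helper, not part of sget, and it mirrors the unencoded form used in this thread (as far as I can tell, the purl spec percent-encodes the `:` in the digest).

```python
from typing import Optional


def oci_ref_to_purl(ref: str, tag: Optional[str] = None) -> str:
    """Turn a digest-pinned OCI reference into a pkg:oci purl (hypothetical helper)."""
    repository, sep, digest = ref.partition("@")
    if not sep or not digest.startswith("sha256:"):
        raise ValueError("expected a digest-pinned reference like repo@sha256:...")
    # The purl name is the last path segment of the repository.
    name = repository.rsplit("/", 1)[-1]
    purl = f"pkg:oci/{name}@{digest}?repository_url={repository}"
    if tag:
        purl += f"&tag={tag}"
    return purl


ref = (
    "us.gcr.io/dlorenc-vmtest2/readme@sha256:"
    "4aa3054270f7a70b4528f2064ee90961788e1e1518703592ae4463de3b889dec"
)
print(oci_ref_to_purl(ref, tag="latest"))
```

Run against the reference above with `tag="latest"`, this prints the same `pkg:oci/...` string as in the example.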

The generic package url would be even more complex:

$ sget pkg:generic/openssl@1.1.10g?download_url=https://openssl.org/source/openssl-1.1.0g.tar.gz&signature_url=https://openssl.org/source/openssl-1.1.0g.tar.gz.asc&checksum=sha256:de4d501267da
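To make the pieces of such a string easier to see, here is a rough Python sketch that splits a purl into type, namespace, name, version and qualifiers. It is a simplification under my own assumptions, not the parsing algorithm from the spec and not what any of the linked implementations do: `parse_purl_sketch` and `ParsedPurl` are hypothetical names, and subpaths and most normalization rules are ignored.

```python
from typing import Dict, NamedTuple, Optional
from urllib.parse import unquote


class ParsedPurl(NamedTuple):
    purl_type: str
    namespace: Optional[str]
    name: str
    version: Optional[str]
    qualifiers: Dict[str, str]


def parse_purl_sketch(purl: str) -> ParsedPurl:
    """Simplified purl splitter (hypothetical; not the full spec algorithm)."""
    scheme, sep, rest = purl.partition(":")
    if scheme != "pkg" or not sep:
        raise ValueError("a purl must start with 'pkg:'")
    rest = rest.partition("#")[0]          # drop an optional subpath
    path, _, query = rest.partition("?")   # qualifiers follow the first '?'
    version = None
    if "@" in path:
        path, _, version = path.rpartition("@")
        version = unquote(version)
    purl_type, _, remainder = path.strip("/").partition("/")
    namespace, _, name = remainder.rpartition("/")
    if not purl_type or not name:
        raise ValueError("a purl needs at least a type and a name")
    qualifiers = {}
    for pair in query.split("&"):
        key, eq, value = pair.partition("=")
        if eq and value:
            qualifiers[key.lower()] = unquote(value)
    return ParsedPurl(
        purl_type.lower(), unquote(namespace) or None, unquote(name), version, qualifiers
    )


parsed = parse_purl_sketch(
    "pkg:generic/openssl@1.1.10g"
    "?download_url=https://openssl.org/source/openssl-1.1.0g.tar.gz"
    "&checksum=sha256:de4d501267da"
)
print(parsed.name, parsed.version, sorted(parsed.qualifiers))
```

Everything sget would need to act on (where to download, what to verify against) ends up in the qualifiers, which is part of why the string gets so long.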

This may be just my subjective feeling, but I am afraid that the package url might make using sget tremendously more difficult. I also see issues with blob distribution at the moment. Some websites sign their checksum files instead of the blob artifacts and others sign their blob artifacts... sget should support both and I am not sure if the generic pkg url could support both in the current state.
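To make the two distribution patterns concrete, here is a hedged Python sketch of the checksum side only. `verify_checksum` checks a blob against an `algorithm:hex` value like the `checksum` qualifier above, and `digest_from_checksum_file` looks up a file's digest in a `sha256sum`-style listing (the case where the checksum file is the thing that gets signed). Both names are hypothetical, and verifying the signature itself is deliberately left out.

```python
import hashlib
import hmac
from typing import Optional


def verify_checksum(data: bytes, checksum: str) -> bool:
    """Check data against an 'algorithm:hexdigest' string (hypothetical helper)."""
    algorithm, sep, expected = checksum.partition(":")
    if not sep or not expected:
        raise ValueError("expected a checksum like 'sha256:<hex digest>'")
    actual = hashlib.new(algorithm, data).hexdigest()
    # Constant-time comparison; a truncated digest will simply not match.
    return hmac.compare_digest(actual, expected.lower())


def digest_from_checksum_file(listing: str, filename: str) -> Optional[str]:
    """Find a file's digest in a 'sha256sum'-style listing: '<hex>  <name>'."""
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].lstrip("*") == filename:
            return parts[0]
    return None


blob = b"example artifact contents"
listing = hashlib.sha256(blob).hexdigest() + "  artifact.tar.gz\n"

# Pattern 1: the checksum is known directly (for example from a purl qualifier).
print(verify_checksum(blob, "sha256:" + hashlib.sha256(blob).hexdigest()))

# Pattern 2: the digest comes from a checksum file, which would be the signed object.
digest = digest_from_checksum_file(listing, "artifact.tar.gz")
print(digest is not None and verify_checksum(blob, "sha256:" + digest))
```

In the first pattern the signature would cover the blob itself; in the second it would cover the listing, and the blob is only trusted transitively through its digest.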

Don't get me wrong, I am not against package url. I just have the feeling that we need one of three things: very good documentation, something that generates package urls, or software developers providing a copy-pastable sget string on their website. On the other hand, we want to create a new standard for downloading artifacts securely, so this might be the moment where we can make such decisions :)

Cross-Linking my PR for the Go implementation: https://github.com/sigstore/cosign/pull/1190

lukehinds commented 2 years ago

yep, those URLs are overly complex and (as per the Twitter thread) wide open to typo / bidi trojan attacks. I don't have anything substantial to offer around a better implementation, but I'm going to spend some time thinking this over.

shibumi commented 2 years ago

sget should support both and I am not sure if the generic pkg url could support both in the current state.

Shall it? or do we want to "enforce" a standard? If so, we might want to discuss this as well...

shibumi commented 2 years ago

@lukehinds what are bidi attacks exactly? Can you share some link about it? I think we cannot do that much about typos in URLs, right?

lukehinds commented 2 years ago

bidi trojan attacks

sure, https://threatpost.com/trojan-source-invisible-bugs-source-code/175891/

the paper: https://trojansource.codes/trojan-source.pdf

shibumi commented 2 years ago

the paper: https://trojansource.codes/trojan-source.pdf

Ah okay. Bidi attacks are encoding attacks, for example in source code. I read about that earlier, I just didn't know the term for it.
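As a narrow illustration of what such a check could look like, the Python sketch below flags the Unicode bidirectional control characters in a string such as a purl typed or pasted on the command line. It is a partial mitigation at best and does nothing about typos or lookalike characters; `find_bidi_controls` is a hypothetical name and nothing here is existing sget behaviour.

```python
from typing import List, Tuple

# Explicit bidirectional formatting characters (embeddings, overrides, isolates).
BIDI_CONTROLS = {
    0x202A: "LRE", 0x202B: "RLE", 0x202C: "PDF", 0x202D: "LRO", 0x202E: "RLO",
    0x2066: "LRI", 0x2067: "RLI", 0x2068: "FSI", 0x2069: "PDI",
}


def find_bidi_controls(text: str) -> List[Tuple[int, str]]:
    """Return (index, abbreviation) for every bidi control character in text."""
    return [
        (index, BIDI_CONTROLS[ord(char)])
        for index, char in enumerate(text)
        if ord(char) in BIDI_CONTROLS
    ]


suspicious = "pkg:generic/openssl@1.1.10g?download_url=https://\u202eexample.org/x"
hits = find_bidi_controls(suspicious)
if hits:
    print("refusing purl, bidi control characters found:", hits)
```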

I don't have an answer for a proper protection against such an attack.

lukehinds commented 2 years ago

I don't have an answer for a proper protection against such an attack.

No one does right now, so we should not let it hold us back from trying to solve the issue at hand.

luhring commented 2 years ago

One "problem" I have with package url is that it does not feel natural

I have a thought along the same lines. I think supporting purl might not be a bad idea, but I think more thought is needed on what kind of user we'd intend to benefit from having this support in sget.

I think the folks that would benefit from purl support are the people already using purls in their workflows. If neither a software's users nor its maintainers are familiar with purls, I'd want to understand why they'd need to learn about purls in order to use sget.

For the folks that are already familiar with purls, there are some neat applications. For example, if an SBOM had a purl for each listed software package, the SBOM could be used as an input to sget, to reinstall all software into a fresh environment / image.
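A rough Python sketch of that SBOM idea, assuming a CycloneDX JSON document in which each component may carry a `purl` field: collect the purls and hand each one to whatever fetch-and-verify step exists. `purls_from_cyclonedx` is a hypothetical helper, the sample SBOM is made up for the example, and sget has no such input mode as far as this thread goes.

```python
import json
from typing import List


def purls_from_cyclonedx(sbom_json: str) -> List[str]:
    """Collect every component purl from a CycloneDX JSON SBOM (hypothetical helper)."""
    document = json.loads(sbom_json)
    purls: List[str] = []

    def walk(components) -> None:
        for component in components or []:
            purl = component.get("purl")
            if purl:
                purls.append(purl)
            # Components may nest further components.
            walk(component.get("components"))

    walk(document.get("components"))
    return purls


sample_sbom = json.dumps({
    "bomFormat": "CycloneDX",
    "components": [
        {"name": "openssl", "purl": "pkg:generic/openssl@1.1.10g"},
        {"name": "no-purl-here"},
        {"name": "bundle", "components": [
            {"name": "readme", "purl": "pkg:oci/readme@sha256:4aa3"},
        ]},
    ],
})
for purl in purls_from_cyclonedx(sample_sbom):
    print("would fetch and verify:", purl)
```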

IMHO, software publishers are already publishing their software to a place, and for them to adopt sget, they want to use the minimum amount of thought and information needed to point to that place. For example, a lot of software gets published to GitHub Releases. A lot gets published to S3 and similar object storage backends. Installation scripts also appear in the projects' git repos. What's the easiest message for software maintainers to tell their users about how to access this software in a verified manner?

Also, to oversimplify, there are two kinds of packages that purls describe: executable software and software libraries. Do we expect language ecosystems to use sget in place of their ecosystem-specific tooling (e.g. npm) to obtain libraries? My assumption has been "no", that we were focusing on ways for users to install runnable software. Is that correct?

shibumi commented 2 years ago

Do we expect language ecosystems to use sget in place of their ecosystem-specific tooling (e.g. npm) to obtain libraries? My assumption has been "no", that we were focusing on ways for users to install runnable software. Is that correct?

This is my understanding, too. Everything else would lead to reimplementing existing package managers with sget.

dlorenc commented 2 years ago

I agree with all of this! Purls are way too clumsy to work with by hand so we'll need a layer on top. Supporting them directly in the SBOM case would be a real killer feature though!