Plan for sget! - Githubissues

dlorenc commented 2 years ago

Cc @imjasonh

dlorenc commented 2 years ago

Let's get a rough plan of the end goal and what it's going to do, then figure out how to get from here to there!

imjasonh commented 2 years ago

Somewhat stream-of-thought notes based on the community call yesterday:

Terminology note:

sget-go refers to the Go code in this sigstore/cosign, in cmd/sget -- we think we want to phase this out.
sget-rs refers to the Rust code currently in sigstore/sget -- this currently focuses on OCI fetching + policy; we think we want to remove OCI functionality from this at least.
sget-new refers to some yet-to-be-built tool that focuses on fetching URLs, verifying signatures+policy if present, checking Rekor, etc. -- focused on replacing curl | sh use cases. I think we all agree this is what we want to build.

First, come up with a plan that answers the questions:

what language should sget-new be written in? (Rust vs Go, or something else)
- why is that language the right choice, practically speaking?
what does sget-new need to do to be considered ready for release into the wild?
- not "fully functional", just MVP
- it needs to do something more than just fetch, we have curl for that today
who wants to work on this, and where? sigstore/sget exists, modulo language question above.

Finally, with that plan in place (not yet done, just agreed upon):

document sget-go's deprecation
- delete sget-go from the repo (or rename it)
- remove sget-go from any downstream packages that distribute it (Alpine, maybe others?)
IF sget-new is not in Rust, or is easier to build from the ground up in Rust:
- document sget-rs's deprecation, remove the repo or remove the code from the repo
- otherwise, sget-rs adds URL fetching, removes OCI fetching, sget-rs == sget-new, eventually becomes MVP and released

Anything missing? Anything horribly incorrect?

imjasonh commented 2 years ago

Oh! Another point:

If cosign wants to support fetching a blob from OCI, verifying signatures (+policy?) and printing it to stdout, that seems entirely within its newly staked out scope, different enough from sget not to be too controversial. 🤞

So maybe sget-go just migrates its code into cosign instead of being deleted entirely. The standalone sget-go tool could still be removed, or just become a verification-only client for fetching OCI blobs. Its name should probably change though... at that point, anything cosign wants to do to organize itself is up to cosign.

lukehinds commented 2 years ago

Just a note: If sget-go is deprecated/deleted and OCI features are removed from sget-rs (or sget-$lang), you won't have any way of retrieving blobs / signing materials from an OCI registry. This might be OK, but keep in mind the cosign policy code is present as well to work with the policy validation code that is in sget-rs https://github.com/sigstore/sget/blob/main/src/policy.rs

dlorenc commented 2 years ago

I think this is missing the "what should sget-new do" part still.

A skeleton command line surface or mock showing how a user would interact with it, and what checks it should perform. If we start with that, then we can figure out the best language and repo for the tool to live in.

imjasonh commented 2 years ago

Just a note: If sget-go is deprecated/deleted and OCI features are removed from sget-rs (or sget-$lang), you won't have any way of retrieving blobs / signing materials from an OCI registry. This might be OK, but keep in mind the cosign policy code is present as well to work with the policy validation code that is in sget-rs https://github.com/sigstore/sget/blob/main/src/policy.rs

Fetching-verifying-policying blobs in OCI could move to cosign, https://github.com/sigstore/cosign/issues/1363#issuecomment-1022324509

I think this is missing the "what should sget-new do" part still.

"Replace curl for piping into sh" seems like the consensus, at least to me:

- curl https://my.site/install.sh | sh
+ sget https://my.site/install.sh | sh

What it does behind the scenes to make that a safer alternative to curl, and how drop-in it could be in all cases, still TBD -- ideas welcome! -- but that's the goalpost I'm proposing for now.

Or: if we think we want sget-new to include OCI-fetching and npm-fetching and anything-fetching, then it seems like a better path forward in that case is to (1) write that in Go, sharing the wealth of existing code, and (2) make it a fetch-only variant of cosign, which could also gain these powers.

In that future, it's not sget <url>, it's probably cosign get-url <url> or something less catchy -- but that's what alias is for 🙃.

lukehinds commented 2 years ago

What would be the storage medium here if OCI is out of the picture?

I know you like the idea of http @imjasonh , but it's a sticky one to get working with signing materials (how to map artifact to signature , websites get hacked a lot). I honestly love the git method (I even prefer it over OCI), but unfortunately the very meagre rate limits github / gitlab have for API access means it won't work on any project that see's a decent amount of hits, so that's dead in the water now :(

dlorenc commented 2 years ago

This is feeling like we need another meeting :) What do we think about trying to set up an hour next week to get in sync around the long term vision? I think we're close but not on the exact same page.

luhring commented 2 years ago

ideas welcome!

Here's one idea for what a flow might look like for the MVP.

The user copies a command from a project README and runs it.

$ sget https://my.site/install.sh | sh

Here's what sget does

1. Retrieve the HTTP resource

sget acts as an HTTP client to download the resource (and follows a normal TLS validation process when using HTTPS, just like curl does).

2. Verify retrieved bytes

Before sending any bytes to stdout, we verify the bytes we just retrieved. To do this, we:

Calculate the digest of the bytes we downloaded.
Search Rekor by digest for matching signature records.
The bytes are considered valid if one or more signatures matched in Rekor can be verified given who the user trusts (more on this in a bit).

3. Output verified bytes

Send the entire payload to stdout, such that the user can pipe those bytes to a command like sh as needed.

Who does the user trust?

IMHO, this is the hardest and most important piece of the user experience. I don't fully understand the TUF-like system that was proposed earlier, and I don't think we should expose something like TUF to users under normal circumstances (if ever), but I do like the idea of using OIDC IDs (e.g. dan.luhring@anchore.com) as a way for the user to specify trusted identities.

I'm wondering if we need some kind of local state on the user's machine that specifies the identities that the user trusts. I'm not sure what/who these identities should be, and how they should be represented. Maybe we can prompt the user for trusting new identities as we encounter them (e.g. exit non-zero, with an error message like "You don't trust the signer; if you want to start trusting them, run command X."). If we're going for mass adoption, I think we should avoid anything too esoteric or difficult to use.

Benefits of this approach

There's very little work for project maintainers to do: sign your artifact (e.g. shell script), such that the signature is sent to Rekor. No other HTTP resources besides the artifact (e.g. shell script) need to exist on the remote server.

And there's even less for the user to do: just run sget. After the MVP, we can add more configurability to how verification works.

imjasonh commented 2 years ago

@luhring this sounds great, thanks for writing that up! I think there's still some questions to sort out, but I think that's roughly along the lines of something we'd need for this to work for arbitrary URLs, especially when the maintainers might not have done anything to enable better assurances for sget users specifically.

I'm wondering if we need some kind of local state on the user's machine that specifies the identities that the user trusts.

This seems unavoidable, basically. Do we need some separation of globally trusted identities and per-site trusted identities? I might trust anything Luke has signed, but might only trust something Dan signs in the context of Dan's site. Or is trust a globally binary state? I'm not sure.

In the absence of trusted identities, can we show a warning like "NNN users have marked this as trustworthy, here are a handful of them: a@foo.com, b@bar.com, etc." -- this would tell us that the file has already been widely fetched and marked as okay, by users willing to attach their username to it. It's still spammable by bad actors, but sampling those can help prevent that, if all the sampled identities seem sketchy or share a spammy domain.

Both of these require some flow for prompting users to note their trust of the artifact, which might also be challenging. Once you sget | sh, how can we prompt the user to tell us that they should put their username into Rekor to vouch for the artifact? sget could block, show the fetched contents, and prompt for an approval before piping to stdout, but that's a bit of a speedbump if you just want to get to the | sh part. And it won't work for all in a headless CI mode.

There's very little work for project maintainers to do: sign your artifact (e.g. shell script), such that the signature is sent to Rekor. No other HTTP resources besides the artifact (e.g. shell script) need to exist on the remote server.

I think nailing this will be crucial to adoption, especially early on while sget has not established its value to consumers or maintainers. Once folks are getting value from sget with no effort on the part of the maintainers, it's easier to convince them that it's worth adding their signatures and policies to make it even more valuable.

luhring commented 2 years ago

I've been thinking a lot about Jason's comment above. This trust thing is a fascinating problem!

Do we need some separation of globally trusted identities and per-site trusted identities? I might trust anything Luke has signed, but might only trust something Dan signs in the context of Dan's site.

I definitely see value in this distinction (it reminds me vaguely of SSH config: "here are my trust settings for any host, here are my settings for host X, etc."). I think something like this would be good to include. I'm wondering if it should be in the MVP or not.

In the absence of trusted identities, can we show a warning like "NNN users have marked this as trustworthy, here are a handful of them: a@foo.com, b@bar.com, etc." -- this would tell us that the file has already been widely fetched and marked as okay, by users willing to attach their username to it.

I love this! ❤️ It adds a social component to the world of OSS software installation. It reminds me a bit of GitHub stars. "Oh, I see Dan Lorenc trusts this thing? Okay yeah, I'm willing to trust it, too."

Tangent: It might be neat to have a web UI to showcase this, maybe sourcing data from Rekor, to make things more grokkable for the average user. I'm envisioning a way to browse signed things (e.g. scripts) (indexed by digest and/or content URL). And for each thing, you can see how many identities have signed it, and for each identity, some context about who they are (thanks to OIDC). Maybe you can also see each person's list of things they trust.

If I'm getting too web-of-trust-y, call me out! 😃

I think it would be helpful if, when sget-retrievable things are signed, the URL gets included in the signature payload (maybe as an annotation?).

This could help the user verify that the resource author intended for a given resource to be what's provided by a specific URL.

For example, without this in place, the following would be possible:

sget https://my.site/v1/install.sh -> bytes for v1's script -> signed by the correct identity -> valid (this is expected ✅ )
sget https://my.site/v2/install.sh -> bytes for v1's script -> signed by the correct identity -> valid (oh no! 😱 )

(In other words, this could help prevent "rollback" and "indefinite freeze" attacks described on the TUF site.)

If the signature payload included the intended URL, sget could further verify that the provided URL matches the URL sget was given by the user.

Also, users may want sget to have the option to explicitly skip this additional validation. Perhaps the user knows that they're fetching content from a mirror/proxy/etc., and they wouldn't expect the URL to be known at time of signing, so they're willing to accept losing this validation.

Another thought, for later on:

I've been thinking about the "requirements" concept in Apple's code signing system. This might relate to the claims/annotations feature in Cosign (although I'm still learning about this feature). This is a way for the software publisher to specify additional constraints for the verifying client to check, in addition to the raw signature being verified successfully. These "requirements" can be things like "the root cert should be Apple's root CA", or "the signing cert should be X", etc.

Down the road, I could see these making sense as optional flags to sget, especially since sget commands are probably just copy-and-pasted 99% of the time. For example, a project's README.md might show this command:

$ sget https://my.site/install.sh --required-signer="jason@hall.com,dan@lorenc.com,luke@hinds.com" | sh

This would serve to narrow the scope of what signatures are considered valid. If sget determines the signature is otherwise valid, but none of the signers listed in the command are the identity attached to the signature, validation would fail.

This wouldn't protect against attacks on the README content. But it might be a nice, simple mechanism for "trusting the right thing for this particular installation". The other advantage is that it could provide a secure approach to a stateless user environment, provided the sget client does trust whatever root CA was used for the signatures (e.g. Fulcio).

dlorenc commented 2 years ago

First, come up with a plan that answers the questions:

what language should sget-new be written in? (Rust vs Go, or something else)

why is that language the right choice, practically speaking?

what does sget-new need to do to be considered ready for release into the wild?

not "fully functional", just MVP

it needs to do something more than just fetch, we have curl for that today

who wants to work on this, and where? sigstore/sget exists, modulo language question above.

These sound like a good set of questions to answer tomorrow in the call. I'll suggest we do it in a bit different of an order:

Who is going to use sget-new, and what are they going to use it for?
What are the commands/flags/surface the tool should have to support that?
What are the different ways it should support fetching contents? Just one system/protocol, or should we support as many as possible?
How will trust/key management be handled? NOTE: This one is really hard so instead I'd suggest that we just reuse cosign's approach: fixed keys, keyless signatures with an identity, or full, custom TUF pinned back up to the sigstore TUF root. We can hand-wave this for now and figure it out later, without designing a new way to define and establish trust.

Then:

Who wants to work on this, and what is the best language/repo for it to live in?

lukehinds commented 2 years ago

I really recommend that before delving into questions such as how will the UX play out, what language, flags etc, we first define and outline what is the current and projected scope of sget-go. Otherwise you risk misunderstandings coming up later on.

dlorenc commented 2 years ago

I really recommend that before delving into questions such as how will the UX play out, what language, flags etc, we first define and outline what is the current and projected scope of sget-go. Otherwise you risk misunderstandings coming up later on.

I think that's a bit circular - sget-go is on pause until we resolve what's happening with the other sget(s). I can explain what I originally wanted to do here with it, but we're trying to make sure there's only one sget and all of these plans can change to make sure we get one thing everyone is happy with.

imjasonh commented 2 years ago

So we met, and I think we all in general agreed to the following course of action:

deprecate, archive and rename github.com/sigstore/sget to github.com/sigstore/sget-archived (name TBD)
create a new github.com/sigstore/sget repo and copy cosign's cmd/sget code into the new repo
for now, keep cosign's cmd/sget as a wrapper around the imported sigstore/sget codebase, and eventually remove it after any package managers who use that code have pointed to the "real" sget repo.
continue building sget in its own repo, in Go, focusing on URL endpoints
- this sget tool will also have functionality to fetch and verify blobs in OCI registries, inherited from sget-go
- this tool might add features to fetch specific types of things, possibly by pURL identifiers, this will be explored.

(I'm probably missing things, please add anything else!)

Folks in the call seemed pretty aligned that a more secure curl alternative was useful, and that we have some ideas we want to explore and build out, and that this is an agreeable path toward that future.

@lukehinds we wanted to get your feedback on the plan before we started clicking buttons. WDYT?

dlorenc commented 2 years ago

LGTM!

dlorenc commented 2 years ago

for now, keep cosign's cmd/sget as a wrapper around the imported sigstore/sget codebase, and eventually remove it after any package managers who use that code have pointed to the "real" sget repo.

This one actually might be too hard to do without circular imports, happy to punt on that one if it complicates stuff too much.

create a new github.com/sigstore/sget repo and copy cosign's cmd/sget code into the new repo

I think there's a git filter-tree thing that works for this so we don't lose history/authorship too.

imjasonh commented 2 years ago

This one actually might be too hard to do without circular imports, happy to punt on that one if it complicates stuff too much.

That sgtm too. 👍 If we don't keep it as a wrapper we just have to coordinate with packagers who expect to find it there.

I think there's a git filter-tree thing that works for this so we don't lose history/authorship too.

Sounds like you're volunteering 😉

dlorenc commented 2 years ago

Sounds like you're volunteering 😉

Why not! Let's give it a try.

imjasonh commented 2 years ago

From @lukehinds in Slack:

@Dan Luhring sounds ok to me. I don’t quite get how URL endpoints, will be used, but figure you plan to work that out later?

Sounds like we can start clicking buttons 👍

Thanks everyone!

haydentherapper commented 11 months ago

Closing as outdated, as sget has been archived.

sigstore / cosign

Plan for sget! #1363

Here's what sget does

1. Retrieve the HTTP resource

2. Verify retrieved bytes

3. Output verified bytes

Who does the user trust?

Benefits of this approach