Create minimal integration test and/or example working with crates.io

lkatalin commented 1 year ago

This issue will track what needs to be added for a minimal example of the proxy working with crates.io to check a crate and allow or deny it based on a sigstore parameter (e.g. a license type, a valid signature) specified by a (dummy) policy. I would like to solicit input on the plans for / implementations of these parts (maybe I missed something that is already working, or maybe my conception of some of the goals is completely off!).

As a subset of the larger proxy architecture, it would cover these component interactions:

[Image: component interactions (IMG_20221128_172343042~2)]

Implementation is needed (or possibly already present) for:

A dummy policy to specify when to allow a crate to be (downloaded, interacted with, etc.) - by "dummy" I mean as simple as possible and potentially hardcoded temporarily. I assume the "policy" that takes on this role is the kind defined in opa-client and that this differs from the "policy" included in seedwing.toml - is this right? I am trying to figure out what this would look like. Are the license files (here, here) from opa-client the best example? How are these created? Also, maybe the examples I am pointing to already work just fine with some flow that I haven't discovered yet, in which case no need to make a separate dummy policy.
Interaction with sigstore. I see there's currently a submodule for this, which is blank at the moment. However, it looks like sigstore (rekor) interaction is already happening via repositories/crates/api/v1/mod.rs - what differentiates the sigstore submodule from what's going on in the api? What is the purpose of getting this Rekor entry as part of the download()?
A request from the "build pipeline" to kick off this whole flow - how is this meant to take place via something like cargo that doesn't know about the proxy? I had imagined the proxy would be a wrapper for something like cargo, but since the proxy is a server with various endpoints, it looks like there is a different plan.
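
To make the first item concrete, a dummy policy could be as small as a hardcoded license allowlist. The following is only a sketch under stated assumptions: `DummyLicensePolicy`, `Decision`, and the allowlist contents are hypothetical names invented for illustration, not the existing opa-client or seedwing code, and the license is treated as a single SPDX identifier rather than a full SPDX expression.

```rust
/// Outcome of evaluating a crate against a policy (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Deny(String),
}

/// Hypothetical hardcoded policy: a fixed allowlist of SPDX license identifiers.
pub struct DummyLicensePolicy {
    allowed: &'static [&'static str],
}

impl DummyLicensePolicy {
    /// The allowlist contents are an assumption chosen for illustration.
    pub fn new() -> Self {
        Self {
            allowed: &["MIT", "Apache-2.0", "BSD-3-Clause"],
        }
    }

    /// Allow the crate only if its declared license is on the allowlist.
    /// A crate with no declared license is denied.
    pub fn evaluate(&self, license: Option<&str>) -> Decision {
        match license {
            Some(l) if self.allowed.iter().any(|a| *a == l) => Decision::Allow,
            Some(l) => Decision::Deny(format!("license {l} is not allowed")),
            None => Decision::Deny("no license declared".to_string()),
        }
    }
}
```

With this sketch, `DummyLicensePolicy::new().evaluate(Some("MIT"))` yields `Decision::Allow`, while an unlisted or missing license yields a `Decision::Deny` carrying a reason.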

Thoughts on the above are greatly appreciated. @bobmcwhirter

danbev commented 1 year ago

I am trying to figure out what this would look like. Are the license files (here, here) from opa-client the best example? How are these created?

The wasm module was created by doctrine, and there is a test target if the policies need to be modified. We can copy/move the Makefile into this repository if needed.

bobmcwhirter commented 1 year ago

So yah, the existing code can be ignored as appropriate. The current sigstore integration was me just verifying that I was able to sign a crate outside of this workflow, and then SHA it and find the sig(s) upon fetch.

I do think the sigstore sig-fetching should probably be on the policy-engine side of things.

With regards to integration with cargo, I think there's two routes:

1) As a transparent-ish HTTP proxy

[http]
debug = false               # HTTP debugging
proxy = "host:port"         # HTTP proxy in libcurl format

2) By setting the default (or other) repos to the proxy URL

[registry]
default = "…"        # name of the default registry
token = "…"          # authentication token for crates.io

both via https://doc.rust-lang.org/cargo/reference/config.html and probably set via a .cargo/config.toml either under the CI's $HOME or within the project dir itself.

The method I was aiming for would mean not wrapping or PRing changes to upstream cargo.

Rather, it's more "content shaping" and providing a network-based filter in front of cargo's normal HTTP operations.

How we deal with git crates? No idea!

lkatalin commented 1 year ago

Thanks for the replies, @bobmcwhirter @danbev. As is usually the case, your answers have germinated more questions.

The current sigstore integration was me just verifying that I was able to sign a crate outside of this workflow, and then SHA it and find the sig(s) upon fetch.

Sounds like a lot of the functionality we need is already there, then, just needing to be moved to the policy engine and then the results used to allow or deny, perhaps under evaluate()?

With regards to integration with cargo, I think there's two routes ...

Thanks for outlining the two possible routes for cargo integration. I need to study the cargo config a bit to familiarize myself with it, and then form more questions. Is the in-toto work following one of the two models described? Or does it modify cargo (I see some cargo r commands)? Or is this work completely separate from the seedwing proxy and so it doesn't matter?

How we deal with git crates? No idea!

I see git dependencies mentioned here, is this speaking to that problem or am I conflating two things?

I also have some higher-level questions about the roles of the different repos in play and OPA, and around the policy format itself.

I haven't yet worked with OPA. Are we offloading evaluation of whether a particular crate matches a specified policy for the seedwing-proxy to OPA, or is this meant to be done in the seedwing code (ex. with policyengine.evaluate())? What is the role of OPA? Is there some notion of a non-OPA policy for the seedwing proxy? Why are the policies wasm modules? I think this is the biggest chunk where I'm missing something, so I'm going to be reviewing OPA docs, but any context is helpful.
How does the source-distributed in-toto work relate to the seedwing-proxy? How does cargo-verify.rs relate to a policy evaluation? Are these doing any overlapping or complementary things?
How does the seedwing-policy repo relate to the opa-client (which also deals with policies)? As @danbev mentioned, doctrine actually creates the policies, which are then ... sent to the code in one of these other repos? Something else?

bobmcwhirter commented 1 year ago

wrt git dependencies, I'm wondering how we can intercept and apply policy. I'm unfamiliar with how cargo actually fetches dependencies from a git URL. Is that something we can proxy? Dunno!

seedwing-policy is my nascent attempt to do something a bit better than OPA. I find Rego and OPA a bit.. tedious and overgeneralized. It may go nowhere, but if it works, I'd like -proxy to be able to at least alternatively use -policy.

The doctrine repo was intended, kinda sorta, to hold authored policies. The idea being that folks could meld together some centrally-authored policies, along with their local organization exceptions or adaptations.

No reason more than a single person needs to write a policy defining "OSI-approved licenses" etc. A policy library.

bobmcwhirter commented 1 year ago

I also wonder if the [patch] section of cargo config can be used, at least for git dependencies.

But are we creating too much busywork for users to integrate? If we can aim for minimally-invasive to a build, that'd be best, I suspect.

lkatalin commented 1 year ago

Okay, thanks - so seedwing-policy is a potential replacement for opa-client. doctrine is a policy library. With policies, are we trying to focus on licenses only atm, or is it equally viable to have a policy stating something like "crate signature must be present in rekor"? Is there a minimal policy in doctrine saying something like this already? Admittedly I have trouble understanding Rego atm as I have not seen it before, but it seems mostly license-focused from what I have looked through so far.

lkatalin commented 1 year ago

If we can aim for minimally-invasive to a build, that'd be best, I suspect.

:+1:

bobmcwhirter commented 1 year ago

Yah -policy would be a replacement for OPA. And then we'd need a seedwing-policy-client.

And yes, the policy library could contain a policy that says "signatures from foo@bar.com must exist in rekor"

The input to the policy engine would be (incomplete)

crate name
crate version
sha256 of the crate
origin URL (crates.io, other)

Policy engine then scrambles that up, queries sigstore, queries $wherever-other-data-is-kept, and can decide if the requested artifact is allowed or rejected.
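
The input and decision described above could be sketched roughly as follows. This is an illustration under assumptions, not the seedwing-policy API: `PolicyInput`, `Decision`, `SignatureLookup`, `StaticLookup`, and `evaluate` are hypothetical names, and the in-memory map merely stands in for a real sigstore/rekor query.

```rust
use std::collections::HashMap;

/// The (incomplete) policy engine input listed above.
#[derive(Debug, Clone)]
pub struct PolicyInput {
    pub name: String,
    pub version: String,
    pub sha256: String,
    pub origin_url: String,
}

/// Hypothetical result type: the artifact is allowed or rejected with a reason.
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Reject(String),
}

/// Hypothetical abstraction over "query sigstore": returns the identities
/// that have signed an artifact with the given digest.
pub trait SignatureLookup {
    fn signers_for(&self, sha256: &str) -> Vec<String>;
}

/// In-memory stand-in for rekor, keyed by sha256 digest (assumption for testing).
pub struct StaticLookup(pub HashMap<String, Vec<String>>);

impl SignatureLookup for StaticLookup {
    fn signers_for(&self, sha256: &str) -> Vec<String> {
        self.0.get(sha256).cloned().unwrap_or_default()
    }
}

/// Allow the artifact only if `required_signer` has signed its digest.
pub fn evaluate(
    input: &PolicyInput,
    lookup: &dyn SignatureLookup,
    required_signer: &str,
) -> Decision {
    let signers = lookup.signers_for(&input.sha256);
    if signers.iter().any(|s| s.as_str() == required_signer) {
        Decision::Allow
    } else {
        Decision::Reject(format!(
            "{} {}: no signature from {} found",
            input.name, input.version, required_signer
        ))
    }
}
```

Putting the lookup behind a trait keeps the allow/reject logic testable without network access; a real implementation would presumably replace `StaticLookup` with something that actually talks to rekor.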

bobmcwhirter commented 1 year ago

And really doctrine was me just learning how to write policies with Rego/OPA, and deciding that license compatibility might be the easiest thing to reach to start. I have no idea how to teach OPA to query rekor. Maybe we can. Maybe -policy just knows how.

danbev commented 1 year ago

Is the in-toto work following one of the two models described? Or does it modify cargo (I see some cargo r commands)? Or is this work completely separate from the seedwing proxy and so it doesn't matter?

This is completely separate and was pursued to figure out how a source distributed Rust project could potentially be signed with in-toto and sigstore, and then how it might be verified using a command line tool or cargo extension. This was done because I was talking/thinking about how things might work but it was more like guessing. Having something that actually works helped iron out things and allowed us to fix some issues around this.

How does cargo-verify.rs relate to a policy evaluation? Are these doing any overlapping or complementary things?

They are currently unrelated, though perhaps if cargo-verify (or some other name) does become something it could utilize the policy work to verify more than just signatures/layouts.

Why are the policies wasm modules?

Sorry about that, I did have a motivation note in a different repo but never copied it over. I've added a note about this now. Another motivation was that, at the time, I was not able to find a Rust implementation of OPA, and using kubewarden/policy-evaluator made sense to save time until we know if/what will be used eventually.

danbev commented 1 year ago

Would it make sense to stick this section "Motivation for using tools/projects" in one of the repositories?

lkatalin commented 1 year ago

Would it make sense to stick this section "Motivation for using tools/projects" in one of the repositories?

Yes, I think this would be great. We should have such a write-up for all of the repos.

lkatalin commented 1 year ago

So the two routes to an MVE (minimum viable example) seem to be:

License types <-- probably easier?

[ ] Identify an existing minimal policy from doctrine
[ ] Have PolicyEngine evaluate() based on this policy (?)
[ ] Integrate with cargo based on one of the two routes above using cargo config

Rekor signatures <-- probably harder but there is code we can use in `download()`

[ ] Figure out how to get an OPA policy to specify this, OR write an entirely new kind of policy
[ ] Have PolicyEngine evaluate() based on this policy (?)
[ ] Integrate with cargo based on one of the two routes above using cargo config

Does this sound right-ish?

Update: we are planning on using the new seedwing-policy to create these examples around rekor signatures, and using [source] in the cargo config to send traffic to the proxy.
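
As a rough illustration of the [source] approach, cargo's source replacement takes a `[source.crates-io]` table with `replace-with` naming another source, which is then defined with a `registry` URL. The helper below renders such a `.cargo/config.toml` fragment. It is a sketch only: the function name, the source name, and the URL are hypothetical, and whether the proxy serves a registry index at a given URL is an assumption.

```rust
/// Render a `.cargo/config.toml` fragment that replaces crates.io with
/// another registry source. `source_name` and `registry_url` are supplied
/// by the caller; nothing here is specific to the real proxy.
pub fn render_source_replacement(source_name: &str, registry_url: &str) -> String {
    format!(
        "[source.crates-io]\nreplace-with = \"{source_name}\"\n\n[source.{source_name}]\nregistry = \"{registry_url}\"\n"
    )
}
```

For example, `render_source_replacement("seedwing-proxy", "http://localhost:8181/index")` (a made-up name and address) produces a fragment that could be written to `.cargo/config.toml` in the project directory or under the CI's $HOME.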

lkatalin commented 1 year ago

Blocking issues:

[ ] #5
[ ] #6 (edit: closed as unnecessary)
[ ] #7
[ ] #9

seedwing-io / seedwing-proxy