slsa-framework / slsa

Supply-chain Levels for Software Artifacts
https://slsa.dev

Make Version Control required for SLSA Level1 #127

Closed tograla closed 3 years ago

tograla commented 3 years ago

I vote for making a version control system a requirement for SLSA Level 1.

In the Source requirements section, none of the items is needed for Level 1, whereas we could pick and choose from:

Of course I'm not proposing all these should go to L1, but instead could be spread between L1-L4 in a meaningful way.

dlorenc commented 3 years ago

+1 on requiring source control for L1.

trishankatdatadog commented 3 years ago

+1

joshuagl commented 3 years ago

Agreed. While I recognise that there are projects which don't use revision control, they really should and it seems like a reasonable requirement for L1.

bobcatfish commented 3 years ago

A couple quick thoughts:

TomHennen commented 3 years ago

I'd like to understand the reason folks would like to require source control at L1. @bobcatfish has provided some thoughts, does anyone else care to expand? [edit] I'd be especially interested in knowing what specific source control properties people are hoping for? As @tograla mentioned there are a number of properties that could be spread out among the levels.

I think we may want to consider products like AWS Lambda and Cloud Run, which both let you deploy directly from source on disk. These products could fairly easily meet SLSA L1 and have authenticated provenance even if they didn't fetch the source directly from source control themselves (the provenance could include an AWS bucket path & hash). Is the suggestion that they not qualify for SLSA L1, or that they could qualify and it's just important that the user assert that the source they deployed came from source control [1] (even if there's no way to verify that)? If we think this deployment method shouldn't qualify at L1, what is the benefit vs. the reduced visibility into how the artifacts were built?

Personally I'm rather wary of adding more requirements at L1 because it can make adoption harder and thus make tracking what improvements need to be made harder. One of the use cases I have for SLSA is to be able to look at all the artifacts that are in use, understand their supply chain security situation, and to use that information to figure out where investments need to be made. The easier it is to adopt SLSA L1, the more people will adopt it (hopefully), which via the provenance, can tell anyone who cares where the build came from (modulo the security properties of an un-authenticated provenance) so they can then see what extra work needs to be done to increase the SLSA level. So SLSA L1 is less about making any security statements, but rather about being the starting point for improving the security of your supply chain.

Contrast having L1 provenance (which lists builder, some materials, some build parameters, etc...) with having no provenance for an artifact. It seems like it would be harder to track down where improvements need to be made.

1. Their CI/CD system could fetch the source from source control, then execute the deployment command. The builder won't know this happened.
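The "bucket path & hash" idea above can be sketched roughly as follows. This is an illustrative Python mock-up, not the official SLSA provenance schema; the builder ID, bucket path, and field names are all invented.

```python
# Illustrative sketch (NOT the official SLSA provenance schema):
# what a builder handed a source tarball -- rather than a repo
# checkout -- could still record at L1. All names are invented.
import hashlib

def provenance_for_upload(bucket_path: str, tarball: bytes) -> dict:
    """Record the uploaded blob as a material; the builder never saw
    a source repository, so no VCS URI can appear here."""
    return {
        "builder": {"id": "https://builder.example.com"},  # hypothetical
        "materials": [
            {"uri": bucket_path,
             "digest": {"sha256": hashlib.sha256(tarball).hexdigest()}},
        ],
    }

prov = provenance_for_upload("s3://example-bucket/app-src.tar.gz",
                             b"source tarball bytes")
```

The point is that such provenance can still pin exactly what was built, even though it says nothing about where that source originally lived.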

TomHennen commented 3 years ago

Oh, I should also add that we do have some people that are working on meeting SLSA L1 and L2 (with the goal of getting to L3/L4 in the long term). In general adding requirements to lower levels can cause trouble for implementors who are trying to plan work. So I think it's also worth weighing the benefit of adding things now, with the cost to implementors, especially if it winds up burning good-will with the SLSA project. Maybe that's not a good way for me to think about it?

dlorenc commented 3 years ago

I don't think it's onerous at all to require people to use source control for their code. I think you're getting at something different though, which is how the build system accesses the source code (directly from the SCM repo, vs. as a static export).

That's a separate concern IMO, and should live in the provenance or build system requirements. I don't think this requirement should have any implication on any build systems. If it's meant to be interpreted that way, then it's very unclear.

MarkLodato commented 3 years ago

There are two underlying issues here:

  1. #111, regarding @bobcatfish's third point.

  2. #129, regarding @dlorenc's point.

@TomHennen and I were using "source" to mean the build configuration, which in many cases does not live in source control. Azure DevOps Pipelines and Google Cloud Build are two examples where the common case is to configure via GUI rather than config-as-code (#115). Therefore, adding a requirement that the build configuration is version controlled would either (a) prevent these systems from reaching SLSA 1, which is undesirable; (b) force us to redefine version control to include these GUI-based systems, which may be OK; or (c) force these systems to reimplement themselves on top of version control, which indeed is onerous.

If we're talking about the "top-level source code", then the issue is the ability to automatically verify the requirement. An unstated SLSA principle is that all of these requirements are automatically verifiable; they're not just guidelines. Thus, if we add a requirement, we need a technical means to verify it.

So I'm cautious about adding the requirement to SLSA 1 until we figure out how we'll handle these cases. What do you all think?

TomHennen commented 3 years ago

I'm sorry this is so long...

Clarifications

As @MarkLodato pointed out that it might be worth clarifying what is actually meant by 'source'. (more discussion in #129)

There are at least [1] "build configuration", [2] "primary source code input", and [3] "all other dependencies". I'm not sure all systems have a way to differentiate between 2 and 3.

Currently the SLSA provenance only requires 1 at L1 ("The provenance identifies the source containing the top-level build script, via an immutable reference.") and #115 proposes removing that requirement at lower levels and instead just documenting the build command. There are currently no requirements to record 2 & 3 until level 4 ("The provenance includes all transitive dependencies listed in Dependencies Complete").
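As a rough illustration of the "immutable reference" wording quoted above (the schema and digest values here are made up, not the real in-toto format): a branch name can move, while a commit digest cannot.

```python
# Made-up references illustrating mutable vs. immutable source pins.
mutable_ref = {"uri": "git+https://example.com/org/repo", "ref": "main"}  # branch can move
immutable_ref = {
    "uri": "git+https://example.com/org/repo",
    "digest": {"sha1": "9f2c1e4b8a0d3f6c7e5a1b2c3d4e5f6a7b8c9d0e"},  # pinned commit
}

def is_immutable(ref: dict) -> bool:
    # Treat a reference as immutable only if it carries a cryptographic digest.
    return "digest" in ref

assert is_immutable(immutable_ref)
assert not is_immutable(mutable_ref)
```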

I suspect most folks in this thread are talking 2 "primary source code input" here, is that right? If nothing else it seems this is something that could be clarified in the source requirements section...

Options

1. 'Weak' source control requirement at L1

For this option we make source control required (maybe with fewer required features than L2+), but we explicitly do not require the builder to know about or attest to the source control system used. Projects would simply state "we use source control" and be able to meet this requirement. The provenance would not always be able to contain a reference to the source control system used.

This option would make it clear that we (SLSA) think source control is a good thing. It would not, however, allow users to trace a binary back to its source code (which I believe is something @bobcatfish was suggesting would be helpful in this comment).

2. 'Strong' source control requirement at L1

As in 1 we make source control required at L1, but we explicitly do require the builder to attest to the source control system used.

This option would make it clear that we think source control is a good thing and allow all SLSA artifacts at L1 to be traced back to source code with some degree of confidence. It would, however, push popular workflows (e.g. Lambda and Cloud Run 'from source' flows) out of SLSA and into SLSA L0 (which we'd meant to mean 'things with no provenance', though I can't seem to find this documented). Debian may even have this problem due to the fact that things are (IIUC) built from source packages rather than directly from a source repo.

3. No source control requirement at L1.

This is the current state. Users can get to L1 even if they don't use source control or if the builder doesn't fetch the source from the source control system. The builder may still identify the source used by including the path & hash of the artifact in the provenance under materials.

This option doesn't send as strong a message about source control, nor does it allow someone to trace all L1 artifacts back to the source repo used. It does allow popular workflows to be L1 and still provides some traceability in the provenance.

Open questions

Other stuff

I think there's a separate question of governance (am I using this word right?) regarding how to determine when it's OK to add requirements to previously defined levels. There can certainly be advantages to adding reqs at lower levels, but it has costs as well. Teams that are working on building features, documentation, etc... around SLSA would need to adapt their plans. One of the advantages of 'levels' is that it gives people a shortcut to talk about the requirements, if these requirements are changed often (which isn't necessarily what we're doing here!) then the labels become much less valuable since their meaning can change. Note that we do say "Reminder: SLSA is in alpha. The definitions below are not yet finalized and subject to change, particularly SLSA 3-4." so there is still wiggle room to change things now.

Perhaps this should be discussed in another issue or at the bi-weekly SLSA meeting (this Wednesday!)?

dlorenc commented 3 years ago

An unstated SLSA principle is that all of these requirements are automatically verifiable; they're not just guidelines.

I think this is worth stating somewhere up front, because many of these other guidelines are not automatically verifiable either. Retention history, superuser access, etc. all come to mind as much harder to verify.

TomHennen commented 3 years ago

I think this is worth stating somewhere up front, because many of these other guidelines are not automatically verifiable either. Retention history, superuser access, etc. all come to mind as much harder to verify.

Yes, agreed. What would be the best way to do that? I could send a PR documenting how that could work as well as documenting open questions that need to be resolved...

MarkLodato commented 3 years ago

I filed #130 about the automatic verification principle. Let's follow up there.

tograla commented 3 years ago

Thanks everyone for all your thoughts! This discussion has been quite informative and I do agree more clarification about the term 'source' is needed. My intention behind proposing SCM for L1 was to drive good software development practices (and to enable the traceability of changes that SCM systems provide) and pave the way for an automated build process.

Thinking about supply-chain security in a broader perspective with the Lambda/CloudRun services example in mind, I ask myself a question about the integrity of code in such a design. What guarantee do end-users have that the code uploaded to Lambda is actually the same code that sits in the corresponding repository declared as 'the source', if the code was first downloaded locally?

Definitely one of the desired properties of an end-to-end pipeline is to reduce (or even eliminate) stages in the process where malicious or accidental tampering could occur, hence the process should be automated (chain-of-custody analogy). With that said, I wouldn't mind keeping pipeline designs that lack this capability at rather low SLSA levels (L0, or L1 at best).

xiaowen commented 3 years ago

RE whether it's onerous to require people to use source control: it depends on what your current process is.

I've heard of a lot of Google Cloud Functions (GCF) users that write some "glue" code in the GCF web UI and deploy by pressing the "deploy" button. Will that meet SLSA 1? What exactly is the definition of "source control" and does that GCF flow meet that? The GCF web UI doesn't support having a "change description/justification", but it does save each version of the source code. If we must force users to completely change how they work today to meet SLSA 1, then it could be onerous for those users.

This all comes down to what we want to achieve with each SLSA level. I could see a good story here by saying that GCF can help beginner users enter SLSA by having this basic flow meet SLSA 1, and they get some benefits right away like basic provenance info. Then as users get more advanced, they can switch to a different flow, use a compliant source control system, and get SLSA 2+.

06kellyjac commented 3 years ago

There's no reason someone couldn't get started following SLSA 1 or SLSA 2 and in their scenario accept they're not using VCS in order to get started on their supply chain security journey. And it's not too difficult to either develop in VCS then copy to the GCF UI or work on it in the GCF UI and copy into VCS.

Either way, as mentioned above SLSA is planned to be automatically verifiable and I can't see that being easy with just storing your code in the GCF UI. At the very least you'd have to put in extra work to pull the source out of GCF using the API yourself; at that point it'd be easier to just use modern IaC practices.

If we must force users to completely change how they work today to meet SLSA 1, then it could be onerous for those users.

Supply chain security is a difficult problem; I don't think level 1 should be so easy that everyone gets it for free. As mentioned, you can still have SLSA goals if you're going to skip a criterion here and there for the time being; the guidance is still solid.

I've heard of a lot of Google Cloud Functions (GCF) users that write some "glue" code in the GCF web UI and deploy by pressing the "deploy" button.

You could also argue that using the UI doesn't quite meet Build - Scripted build, and I'm not sure how you'd get provenance info for Provenance - Available, which are the two currently required criteria. At this point I'd say having X users that are able to put code in the UI for cloud functions doesn't constitute much of a supply chain.

You need code -> ... -> production, but it looks more like code -> production or even code -> GCF black box -> production here. There are interesting questions as to how GCF itself could provide provenance and other aspects of SLSA to give confidence that the code you submit makes it to production (the "GCF black box" in the flow mentioned previously).


I'm not too familiar with GCF and the little work I've done with it has used terraform so LMK if I missed the mark anywhere here. Maybe if you have any parallels with AWS Lambda that'd help as I have more experience with that ecosystem.

TomHennen commented 3 years ago

Either way, as mentioned above SLSA is planned to be automatically verifiable and I can't see that being easy with just storing your code in the GCF UI

When I think of the 'automatically verifiable' goal I'm thinking of how we can verify the requirements at each level. Clarifying that is the goal of #130.

If we mean verifying the requirements of the levels, then including a source requirement at Level 1 would mean that we'd need to figure out some way to verify the source code the builder built from was stored in a VCS. Under that interpretation how could a process verify that a given artifact was built from source stored in a VCS if the builder didn't fetch the code from VCS itself? There's also the question of what is meant by 'source' in this suggestion. Do we mean the 'build entrypoint' was stored in VCS? That could be pretty easily determined (but wouldn't allow some builders like Tekton to onboard today since they don't support config as code, see #115 for more).

If we don't mean build entrypoint but rather 'the primary source', lots of builds have binary dependencies that aren't stored in VCS but would eventually be covered by transitive SLSA (which is a problem that's been deferred for now). When inspecting provenance how could a verifier know that some blob downloaded from an HTTP endpoint in materials isn't source code and thus didn't need to be fetched via git/hg/...?

#129 hopes to get clarity on what we mean by 'source' (the bottom line is we probably need to be more specific).
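The verification gap described above can be made concrete with a small sketch (the URIs and scheme list are invented for illustration): a checker can recognize materials that are identifiably VCS-hosted, but a negative result proves nothing, since any opaque blob might itself be an export of a repo.

```python
# Sketch of why "was the source in a VCS?" isn't mechanically decidable
# from provenance materials alone. Scheme prefixes are illustrative.
VCS_SCHEMES = ("git+", "hg+", "svn+")

def identifiably_from_vcs(materials: list) -> bool:
    """True if some material URI is recognizably VCS-hosted.
    False is inconclusive: the blob may still be a VCS export."""
    return any(m["uri"].startswith(VCS_SCHEMES) for m in materials)

assert identifiably_from_vcs([{"uri": "git+https://example.com/org/repo@abc123"}])
# This tarball might well be an export of a git repo -- no way to tell:
assert not identifiably_from_vcs([{"uri": "https://example.com/blobs/src.tar.gz"}])
```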

You could also argue that using the UI doesnt quite meet Build - Scripted build

This would be an interesting discussion to have, would you care to start a new issue to talk about it?

I'm not sure how you'd get provenance info for Provenance - Available

The Provenance Available requirement is meant to indicate that whatever provenance is generated is accessible to the consumer of the artifact.

There are interesting questions as to how GCF itself could provide provenance

My thought is that GCF (or whatever it's using under the hood) would be classified as a builder and need to meet the Build Requirements.

MarkLodato commented 3 years ago

I'll be on vacation for the next four weeks and wanted to jot down some thoughts before leaving.

Before making this decision, we really need to resolve #111 and clarify the benefit of requiring version control at this level, more than just "it's a good idea." For example, the reason might be that identifying the source revision allows one to join with static analysis on the source or to perform age-based checks (e.g. no sources older than 1 month). We just need to spell this out and compare this to the increased cost of adoption.

Assuming we agree that it is desirable, one option is to require version control for "primary source" but not "build configs" at SLSA 1 & 2 (#129). This might be a decent middle ground:

The question then becomes how to verify the source requirement. Here's one idea:

(*) I'm using "attestation" here liberally, since it wouldn't be signed at L1.

Another option that would work at L1 but not L2 would be to have the client just send the VCS metadata to the builder, who blindly records it in the provenance. For example, in the GCB case, the client could include the git commit in the tarball, and GCB outputs that instead of the hash of the tarball.
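That last option could look something like this toy sketch (the file name, field names, and flow are invented, and nothing here is verified by the builder, which is exactly why it only fits L1):

```python
# Toy version of "client sends VCS metadata, builder blindly records it".
import io
import json
import tarfile

def client_make_tarball(commit: str) -> bytes:
    """Client embeds the claimed git commit in a metadata file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        meta = json.dumps({"git_commit": commit}).encode()
        info = tarfile.TarInfo(name=".vcs-metadata.json")
        info.size = len(meta)
        tar.addfile(info, io.BytesIO(meta))
    return buf.getvalue()

def builder_record(tarball: bytes) -> dict:
    """Builder copies the client's claim into provenance without checking it."""
    with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:gz") as tar:
        meta = json.load(tar.extractfile(".vcs-metadata.json"))
    return {"materials": [{"uri": "git+<client-claimed>",
                           "digest": {"sha1": meta["git_commit"]}}]}

prov = builder_record(client_make_tarball("abc123"))
```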

trishankatdatadog commented 3 years ago

I guess the question is: how many software projects don't have source control, but use CI/CD?

TomHennen commented 3 years ago

I think the question is "how many projects have CI/CD that isn't aware of source control being used". The AWS Lambda and Cloud Run cases mentioned in this comment are examples where the builder doesn't necessarily know if source control is used, so it can't be included in the provenance.

trishankatdatadog commented 3 years ago

Right, not to mention that any CI/CD system could be used to pull source from anywhere outside of revision control.

mlieberman85 commented 3 years ago

I might be misunderstanding and maybe it's just semantics, but does the builder itself need to be aware of where the code came from? The scenario described in the linked issue above is a fairly common one, generalized as something like:

  1. CI fetches code from source code control (or wherever) and puts it on storage A
  2. CI triggers the builder to compile, package, and build code from storage A, putting artifact(s) on storage B
  3. CI publishes artifact(s) from storage B to the artifact repo.

So in this case I think the provenance is still traceable because you still know where you pulled the source from, with some caveats. If it's a push model where CI is supposed to just pick up whatever is pushed to some storage bucket or something, I would say provenance starts there, and I think it also highlights that's probably an anti-pattern.
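One way to picture the generalized flow above is an orchestrator that records a digest at every hop, so the final artifact can be walked back to the fetch step even though the builder itself only ever saw storage A. This is a toy sketch; the step names, URIs, and "build" stand-in are invented.

```python
# Toy chain-of-custody for the fetch -> build -> publish flow above.
import hashlib

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def ci_pipeline(source: bytes) -> dict:
    chain = [{"step": "fetch",
              "from": "git+https://example.com/repo@abc123",  # invented URI
              "to": "storage-a", "digest": sha256(source)}]
    artifact = b"compiled:" + source  # stand-in for the real build step
    chain.append({"step": "build", "from": "storage-a",
                  "to": "storage-b", "digest": sha256(artifact)})
    chain.append({"step": "publish", "from": "storage-b",
                  "to": "artifact-repo", "digest": sha256(artifact)})
    return {"artifact": artifact, "chain": chain}

result = ci_pipeline(b"print('hi')")
# A verifier can match digests hop-by-hop from the artifact repo back
# to the fetch step -- the traceability described above.
```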

TomHennen commented 3 years ago

I think it depends on if the CI is happening in one integrated location and has visibility into past steps.

In the AWS Lambda/Cloud Run scenario the builder can be completely disconnected from the steps that fetch the source. The entrypoint is literally "here's a source tarball, please build/run it". This isn't necessarily bad. A very nice property they have is that the build takes place on a centrally managed service and not on some developer's laptop. What's unfortunate is that they can't say what source repo that code came from.

I'm also not sure if that flow would match the Debian workflow, where binary packages are built from source packages, which at some point in the past were pulled from a source repo.

At higher levels (especially once we have a resolution to this issue [I have something in mind, just haven't had time to send a PR]) we'll actually be able to join a build provenance (which lists the commit hash) with a source attestation, which will let us say what source requirements the source met. This + chained verification (something else that needs to get worked out) would let us solve both of these use cases if necessary (provenance could be provided for the tarball/Debian package that traces it back to the source repo).

Given that there are a number of ~common identified use cases where the builder doesn't necessarily fetch the source itself, and these other issues are unresolved (and require a lot more work, so probably a higher SLSA level), my gut tells me that a source requirement at L1 would leave too many people unable to start adopting SLSA without making major changes to their infrastructure. That seems like it would be demotivating. So why not leave L1 as is, not require source control, and just make it clear that L1 is a starting point. Another, fairly easy, option is to explicitly add the recommended ○ for source control at L1.

What do folks think?

(Tagging the committee to make sure we have a breadth of opinions: @brunodom @david-a-wheeler @joshuagl @marklodato @mlieberman85 @trishankatdatadog @zakgreant)

trishankatdatadog commented 3 years ago

Now that I think about it, where source code came from is not nearly as important as who signed the source code. With the Datadog Agent integrations, our developers sign attestations about the hashes of source code, so where source code came from is totally irrelevant, as long as what developers signed matches what went into the builder, which I think is what Tom means by chained verification.

If this is too strong a requirement for L1, which I can very well believe is the case, then requiring the builder to simply record the Merkle tree root of the source code or the hash of the source tarball or some such should be a good enough start. In any case, a source control management system is not required, and can be reserved for L2.
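Recording "the Merkle tree root of the source code" could be as simple as the following sketch: a deterministic digest over (path, content) pairs. This is a simplified illustration, not a standard construction; real systems would canonicalize paths, file modes, and so on.

```python
# Simplified "Merkle-ish" root over a source tree; illustration only.
import hashlib

def source_tree_root(files: dict) -> str:
    """Hash each (path, content) leaf, then hash the sorted leaf digests."""
    leaves = sorted(
        hashlib.sha256(path.encode() + b"\0" + data).hexdigest()
        for path, data in files.items()
    )
    return hashlib.sha256("".join(leaves).encode()).hexdigest()

tree = {"main.py": b"print('hi')", "lib/util.py": b"def f(): pass"}
root = source_tree_root(tree)
# Changing any file changes the root, so provenance can pin the exact
# source that went into the build -- with no VCS involved at all.
assert root != source_tree_root({**tree, "main.py": b"print('bye')"})
```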

Does this help to clarify the matter?

inferno-chromium commented 3 years ago

As per @trishankatdatadog - "In any case, a source control management system is not required, and can be reserved for L2." - it looks like we are sticking with keeping things as-is (or adding a hash/Merkle tree root of the source code?) and keeping the SCS requirement at L2.

Does anyone else have thoughts on changing this? It would be good to have consensus before v0.1 is cut this week.

tograla commented 3 years ago

Indeed, it seems so. Even though we seem to have gravitated to where we were in the first place, the discussion and take-aways have prompted other issues and useful wording clarification.

joshuagl commented 3 years ago

Fascinating discussion. Keeping things as-is, with perhaps @TomHennen's suggestion to "explicitly add the recommended ○ for source control at L1", seems like the appropriate path forward.