slsa-framework / azure-devops-demo

SLSA Azure DevOps Pipelines Extension
https://marketplace.visualstudio.com/items?itemName=gattjoe.SLSAProvenanceGenerator
Apache License 2.0

Fix recipe and materials #5

Open MarkLodato opened 3 years ago

MarkLodato commented 3 years ago

The recipe field is supposed to indicate how to build the project overall, not just what generated the provenance.

For config-as-code, I would expect the following:

Example

"recipe": {
  "type": "https://dev.azure.com/Attestations/YamlRecipe@v1",
  "definedInMaterial": 0,
  "entryPoint": "azure-pipelines.yml"
},
"materials": [
  {
    "uri": "git+https://github.com/HariSekhon/DevOps-Bash-tools",
    "digest": {
      "sha1": "5b250c0a12ae03da737d31d7a85a637db8509f96"
    }
  }
]

For non-config-as-code, we'll need to figure out how to represent the uri and digest of the configuration. Note that we have the exact same problem for Google Cloud Build. @msuozzo @loosebazooka FYI.

gattjoe commented 3 years ago

Thanks for the clarification, will get this done tonight.

gattjoe commented 3 years ago

Hmm, gonna have to think on this a bit. You are right, they have the concept of "classic" pipelines in ADO (GUI based) and that means no azure-pipelines.yml...

gattjoe commented 3 years ago

Hi @MarkLodato,

This is going to be very difficult in Azure DevOps (ADO) since there are two types of Azure Pipelines, the "classic" GUI based, and "yaml" config-as-code based. Unfortunately, the agent executing the pipeline doesn't know which type of pipeline is being used to build the project. As a result, we may not be able to reliably determine recipe.type, materials, and recipe.entryPoint. What's worse, when using "yaml" pipelines, ADO stores the "azure-pipelines.yml" file in the source code repo; however, there is nothing that says a given build job has to USE that file. You can have an "azure-pipelines.yml" file in your repo but still use a "classic" pipeline to build your project.

So enough complaining, possible solutions include:

recipe.type - We could use the value that I used in buildInvocationId (take a look at the example). The buildInvocationId is made up of three values:

recipe.definedInMaterial - will always be 0 for ADO since we can't read the build steps

recipe.entryPoint - is currently set to Build.DefinitionName, which is the name of the build pipeline that initiated the build. This is probably the best ADO can currently do (as far as I know).

As for materials, I gave the URI for the repo (which doesn't have to be in ADO) and the commit hash that is being built.
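Populating materials this way can be sketched from the agent's environment alone. The sketch below (my own, not the extension's actual code) assumes the standard environment-variable forms of the predefined pipeline variables Build.Repository.Uri and Build.SourceVersion:

```python
def materials_from_env(env):
    """Build a SLSA materials list from Azure DevOps predefined variables.

    BUILD_REPOSITORY_URI and BUILD_SOURCEVERSION are the environment-variable
    forms of Build.Repository.Uri and Build.SourceVersion, which the agent
    exposes regardless of whether the pipeline is "classic" or YAML.
    """
    return [
        {
            "uri": env["BUILD_REPOSITORY_URI"],
            # Build.SourceVersion is the commit SHA being built.
            "digest": {"sha1": env["BUILD_SOURCEVERSION"]},
        }
    ]
```

In a real task you would pass `os.environ`; the point is that the repo URI and commit digest are available to the agent even when the pipeline type is not.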

EDIT:

Looks like the python-api CAN return the build steps with a little magic. See this for the details.

So in the above examples, the build definition is: https://dev.azure.com/gattjoe/dc3e0717-50f4-460e-86b6-a32b352d19d4/_apis/build/Definitions/14?revision=4, where:

For reference, the ADO API is documented here
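For what it's worth, the build-definition URL above follows the Definitions - Get REST endpoint, so constructing it programmatically is straightforward. A minimal sketch (the api-version value is an assumption; pick whichever version your org supports):

```python
def definition_url(organization, project, definition_id, revision):
    """Construct the ADO REST URL for one revision of a build definition.

    Mirrors the shape of the example above:
    https://dev.azure.com/{org}/{project}/_apis/build/Definitions/{id}?revision={rev}
    """
    return (
        f"https://dev.azure.com/{organization}/{project}"
        f"/_apis/build/Definitions/{definition_id}"
        f"?revision={revision}&api-version=6.0"
    )
```

You could then GET that URL with an authenticated client (e.g. `requests` plus a PAT) and read the step list out of the JSON response, as described above.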

I've even been able to extract my build steps out of ADO. The problem is, since CI tools are infinitely configurable, it would be impossible to pull out all of the actual build steps and figure out the relevant details. I've attached an example of the output, which is an unformatted dump of my four build steps for the project that I used the extension on. The steps are basically:

  1. Use Python 3.9
  2. Install build tools
  3. Build
  4. Run provenance generator
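For readers unfamiliar with config-as-code pipelines, those four steps would look roughly like the azure-pipelines.yml sketch below. UsePythonVersion@0 is a standard ADO task; the install/build commands and the final provenance step are purely illustrative, not the extension's actual configuration:

```yaml
trigger:
  - main

pool:
  vmImage: ubuntu-latest

steps:
  # 1. Use Python 3.9
  - task: UsePythonVersion@0
    inputs:
      versionSpec: '3.9'

  # 2. Install build tools
  - script: python -m pip install --upgrade pip build
    displayName: Install build tools

  # 3. Build
  - script: python -m build
    displayName: Build

  # 4. Run provenance generator (hypothetical invocation;
  #    the real task name and inputs may differ)
  - script: python generate_provenance.py
    displayName: Generate provenance
```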

Let me know your thoughts.

MarkLodato commented 2 years ago

Hi @gattjoe, sorry for the long delay. I was on leave.

We have the same issue with Google Cloud Build (GCB), where we can't tell whether the build steps come from YAML or from the GUI. (@msuozzo @TomHennen FYI.) For now, I think it's OK to say that we can't detect it, then add it later if we figure out a way (say by the API). I'll take a look. Thanks for the research!

I'll do a little more research then send you a pull request for the fields!

MarkLodato commented 2 years ago

There is a similar issue with self-hosted runners, which applies to both Azure Pipelines and GitHub Actions. Each individual job within a workflow can use its own runner, so there is no workflow-wide property that says "all steps were run on hosted runners."