in-toto / attestation

in-toto Attestation Framework

A Source Attestation for recording metadata about source code #124

Open trishankatdatadog opened 1 year ago

trishankatdatadog commented 1 year ago

Presently, there is no proposed Attestation for recording metadata about source code that developers may have written. For example, the Datadog Agent Integrations use in-toto links to record what source code was checked in by whom.

An example Source Attestation may look like:

{
  "signatures": [
    {
      "keyid": "5e2e6791ad1dca571dd782006dc8f36eeba9870c",
      "other_headers": "04000108001d1621045e2e6791ad1dca571dd782006dc8f36eeba9870c050263c5145b",
      "signature": "0d5df55aff285784ca2295942c8735543b6f78c3d64ac5bfcd4297143b5ca917f648af1e0358fa0fce83c908963849b34122d697bc92c5b66beb0d1420190be35c1c0c760184f2aad0d92aaae1bbad4ce7d937e143dfae24f511cd7d28db55dcdd0140816fe33779c57b560a626ad99c81cd565a70cd122e5052b824e289ff3e85450f7e023c1c7d3a8b01ffdf29ede411245cf5534de5623b999a685076b5103ee42a0a53a4f01a55b1f695436a106e6a62d6a3a2fbab28b53cad53e7b0cd363dc82789f600ce425c9b6238e9338134ad5d39c6a6155a972be426f5e46753cfe87b725876e46d729ff03b83f5ddf66d95aafe81b169bbe5ad6bdd554de741d01dc94ae41206705c9eb86ee30dbe0a9ec810d38c37954dfedcec327f85995e671e1875eca4e884f25fd588886ea855688b9d840dc5e540e660faac82d8086eb7906cd26f9deffee07866a178ecaf806e9dbd96c8e9cc6becef5ed1d1f91588ed635bfba45bfab8bd100bc536a37298ef0481ec3792331105f21c36121d1cad44cc399eafdac6a4cfc599889cd11538cf55aaa0dc311f7fad6c0e413a3583f4a72c130522cf95bf21ab23a83620492607d838b5f72f87ba7de0acd6787e7c56db8b4d4dcab06334a04bfdd98b19dc6fb473b7a71e3c5f72102dd50c8e0c3f47ee878c52e0aebe5e79aa1ee4d5bc9a8c7db868cbddff03266450d72cb68c63d9d7"
    }
  ],
  "signed": {
    "byproducts": {},
    "command": [],
    "environment": {},
    "materials": {},
    "name": "tag",
    "products": {
      "sonarqube/datadog_checks/__init__.py": {
        "sha256": "6371d7972a9ae8c17a1bd016bf00f3381221aa69a20d185d1894d72d090ea129"
      },
      "sonarqube/datadog_checks/sonarqube/__about__.py": {
        "sha256": "3da0ece8605e3ebd76be8d0cd269ed137365b2a8a76ede3d36796699bc26779e"
      },
      "sonarqube/datadog_checks/sonarqube/__init__.py": {
        "sha256": "2f7a791652d7ebb1f374fa29227d15985f072938b5741c5c495e41ede05c6466"
      },
      "sonarqube/datadog_checks/sonarqube/check.py": {
        "sha256": "d01e3c878701a70cbf980bf9561fb20003a962a341e4bcdf73294778f6e95e88"
      },
      "sonarqube/datadog_checks/sonarqube/config_models/__init__.py": {
        "sha256": "c9cf5c66894430e7edbb00d00613b58ccfd38360f2fe490a23c17cf71ed294dc"
      },
      "sonarqube/datadog_checks/sonarqube/config_models/defaults.py": {
        "sha256": "b0e4f6e0271c6540f95d766d87f68261c2fc6509d0126258b3af0d50001f12dc"
      },
      "sonarqube/datadog_checks/sonarqube/config_models/instance.py": {
        "sha256": "31087fb01eed552a9e638d1ea2c3b0193f522887ddc87aa5c9720685714fbb7d"
      },
      "sonarqube/datadog_checks/sonarqube/config_models/shared.py": {
        "sha256": "f8584c4ea94f08df06ce33887a8ea19e069b1f061d85c6660dbd794b53a239f6"
      },
      "sonarqube/datadog_checks/sonarqube/config_models/validators.py": {
        "sha256": "0424fe17778b76e1b589b9564d0d543d1b71dba1edd6e5d71a7c528dddf68e0b"
      },
      "sonarqube/datadog_checks/sonarqube/constants.py": {
        "sha256": "2943383cf4009921fc03645ab6b7762af6e63a15eb38c665b820c266c09cad01"
      },
      "sonarqube/datadog_checks/sonarqube/data/conf.yaml.example": {
        "sha256": "bf3a1adb6934b53300b998b1b0e58223f2733c476009626b0ed99488c50de12d"
      },
      "sonarqube/datadog_checks/sonarqube/data/metrics.yaml": {
        "sha256": "73ca0ace01a7b6a577ec372cdbcccceff4ad23946bac8511c0386c31125af3bc"
      }
    }
  }
}

Developers may wish to record direct metadata about source code files (as in the example above) to allow for immediate verification of source code by clients, or indirect metadata about them (e.g., git tree hashes).
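To make the direct/indirect distinction concrete, here is a minimal Python sketch, not the Datadog tooling or the in-toto reference implementation. The helper names (`record_products`, `verify_products`, `git_blob_id`) are hypothetical. It builds a `products`-style map of per-file SHA-256 digests like the one in the example above, re-checks a directory against such a map, and shows how git derives a blob ID in a SHA-1 repository (repositories using the SHA-256 object format would differ).

```python
import hashlib
from pathlib import Path


def sha256_file(path):
    """Hex SHA-256 of a file's raw bytes, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def record_products(root):
    """Direct metadata: map each file under root to {"sha256": <hex>}.

    Keys are POSIX-style paths relative to root, mirroring the
    "products" object in the example above.
    """
    root = Path(root)
    products = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            products[rel] = {"sha256": sha256_file(path)}
    return products


def verify_products(root, products):
    """Return the recorded paths that are missing or whose digest differs.

    Note: files present on disk but absent from `products` are not
    reported; a stricter check would also compare the key sets.
    """
    root = Path(root)
    bad = []
    for rel, digests in products.items():
        path = root / rel
        if not path.is_file() or sha256_file(path) != digests["sha256"]:
            bad.append(rel)
    return bad


def git_blob_id(data):
    """Indirect metadata: the object ID git assigns to a blob (SHA-1 repos).

    Git hashes a header "blob <size>\\0" followed by the file content, so
    this value differs from a plain SHA-1 of the file.
    """
    header = f"blob {len(data)}\0".encode()
    return hashlib.sha1(header + data).hexdigest()
```

With direct digests a client can check files without any git tooling. With git object IDs the client has to reconstruct git's object encoding (or trust git) before comparing.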

Cc @developer-guy @wlynch @adityasaky

wlynch commented 1 year ago

We've been experimenting with a source attestation type over at sigstore/gitsign that is a bit closer to the git commit format - https://github.com/sigstore/gitsign/blob/main/internal/commands/show/testdata/fulcio-cert.out.json

You can try this out with gitsign show (it also works with non-gitsign signatures). It doesn't do the per-file attestations like @trishankatdatadog's attestation above, but I don't see a reason why both attestation types can't exist. 😃

We've also been experimenting with storing attestations to git repo data inside the repo itself with gitsign attest (right now commits and trees, but we could add support for any git type). @adityasaky has some other interesting ideas in this area to combine git + tuf. 🎉

adityasaky commented 1 year ago

I believe the two are closely linked and there are use cases for both. I do want to note that a link predicate type has been proposed from the start: https://github.com/in-toto/attestation/blob/main/spec/predicates/link.md. IMO it can go both ways: we can use gitsign's commit attestation and verify that a CI system built off the right commit (i.e., using git semantics all the way), or we can use links to record the sources directly.

I have been meaning to have the link predicate get the ITE-9 treatment so I think it's a good idea to leave this thread open and discuss ideas. cc @in-toto/attestation-maintainers

trishankatdatadog commented 1 year ago

I think it's helpful to define an explicit Source Attestation, just like you want an explicit, say, Review Attestation.

marcelamelara commented 1 year ago

> I think it's helpful to define an explicit Source Attestation, just like you want an explicit, say, Review Attestation.

Looks like what you're proposing is a per-file VCS commit attestation ("source code attestation" probably means different things to different people)? I'm trying to understand what the requirements are so we can develop a meaningful predicate. What is unique to these attestations that can't be expressed/covered by a Link? What types of trust or security questions should such a predicate be able to answer?

trishankatdatadog commented 1 year ago

> What is unique to these attestations that can't be expressed/covered by a Link? What types of trust or security questions should such a predicate be able to answer?

IDK yet, but these are some of the questions I'm trying to find answers for, so thanks for raising them 🙂

trishankatdatadog commented 1 year ago

A use case would be to record extra metadata, such as the original codebase this one was forked from and exactly what changes have been made since.

Cc @MarkLodato

colek42 commented 1 year ago

We have implemented a git attestor in Witness. ref: https://github.com/testifysec/go-witness/blob/main/attestation/git/git.go

TomHennen commented 1 year ago

@colek42 do you have any docs that describe how that attestation works?

colek42 commented 1 year ago

We do not yet, but I threw together a spec based on that struct and some WIP we have:

{
  "_type": "https://in-toto.io/Statement/v0.1",
  "subject": [
    {
      "name": "commithash:<COMMIT_HASH>",
      "digest": { "<ALGORITHM>": "<HEX_VALUE>", ... }
    },
    ...
  ],

  "predicateType": "https://witness.dev/attestations/git/v0.1",
  "predicate": {
    "commithash": "<COMMIT_HASH>",
    "author": "<AUTHOR_NAME>",
    "authoremail": "<AUTHOR_EMAIL>",
    "committername": "<COMMITTER_NAME>",
    "committeremail": "<COMMITTER_EMAIL>",
    "commitdate": "<COMMIT_DATE>",
    "commitmessage": "<COMMIT_MESSAGE>",
    "status": {
      "<FILE_PATH>": {
        "worktree": "<WORKTREE_STATUS>",
        "staging": "<STAGING_STATUS>"
      },
      ...
    },
    "commitdigest": {
      "<DIGEST_ALGORITHM>": "<DIGEST_HEX_VALUE>"
    },
    "signature": "<PGP_SIGNATURE>",
    "parenthashes": [
      "<PARENT_COMMIT_HASH_1>",
      "<PARENT_COMMIT_HASH_2>",
      ...
    ],
    "treehash": "<TREE_HASH>",
    "refs": [
      "<REFERENCE_1>",
      "<REFERENCE_2>",
      ...
    ],
    "tags": [
      {
        "name": "<TAG_NAME>",
        "taggername": "<TAGGER_NAME>",
        "taggeremail": "<TAGGER_EMAIL>",
        "when": "<TAG_CREATION_TIME>",
        "pgpsignature": "<PGP_SIGNATURE>",
        "message": "<TAG_MESSAGE>"
      },
      ...
    ],
    "changedFiles": {
      "<FILE_PATH_1>": {
        "digest": { "<ALGORITHM>": "<HEX_VALUE>", ... }
      },
      "<FILE_PATH_2>": {
        "digest": { "<ALGORITHM>": "<HEX_VALUE>", ... }
      },
      ...
    }
  }
}
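As an illustration of how the template above could be filled in, here is a minimal Python sketch. It is not the Witness implementation: the function name `make_git_statement` is hypothetical, only a subset of the predicate fields is populated, and using the commit hash itself as a `sha1` subject digest is an assumption rather than something the thread confirms.

```python
import json
import re

STATEMENT_TYPE = "https://in-toto.io/Statement/v0.1"
GIT_PREDICATE_TYPE = "https://witness.dev/attestations/git/v0.1"

# Assumption: 40-hex-character SHA-1 commit and tree hashes.
_HEX40 = re.compile(r"^[0-9a-f]{40}$")


def make_git_statement(commit_hash, author, author_email, tree_hash,
                       parent_hashes=(), changed_files=None):
    """Build an unsigned Statement dict shaped like the template above.

    changed_files maps file path -> hex SHA-256 digest of that file.
    """
    for value in (commit_hash, tree_hash, *parent_hashes):
        if not _HEX40.match(value):
            raise ValueError(f"not a 40-character hex hash: {value!r}")
    return {
        "_type": STATEMENT_TYPE,
        "subject": [
            {
                "name": f"commithash:{commit_hash}",
                # Assumption: the commit hash doubles as the subject digest.
                "digest": {"sha1": commit_hash},
            }
        ],
        "predicateType": GIT_PREDICATE_TYPE,
        "predicate": {
            "commithash": commit_hash,
            "author": author,
            "authoremail": author_email,
            "parenthashes": list(parent_hashes),
            "treehash": tree_hash,
            "changedFiles": {
                path: {"digest": {"sha256": digest}}
                for path, digest in (changed_files or {}).items()
            },
        },
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    statement = make_git_statement(
        commit_hash="a" * 40,
        author="Example Author",
        author_email="author@example.com",
        tree_hash="b" * 40,
        parent_hashes=["c" * 40],
        changed_files={"README.md": "d" * 64},
    )
    print(json.dumps(statement, indent=2))
```

The result is only the Statement layer; signing it (for example, wrapping it in an envelope) is out of scope for this sketch.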