rust-secure-code / cargo-auditable

Make production Rust binaries auditable
Apache License 2.0

Investigate pre-existing formats for storing dependency info #31

Open Shnatsel opened 2 years ago

Shnatsel commented 2 years ago

Apparently there are already a number of formats designed to encode package info: https://gitbom.dev/glossary/sbom/

We need to check if any of them are suitable for our use case. Notably, we redact some fields such as git repo URLs, and also include information about enabled features, so it might not be 100% compatible.

Also, the degree of adoption of these formats needs to be understood; perhaps we should provide conversion utilities, even if we don't end up using the format internally.

Shnatsel commented 2 years ago

Specifically, we need to understand:

  1. Does anyone actually use those SBOM formats?
  2. Are any of those formats a good fit for storing our data - perhaps we won't have to invent a custom format after all?

tofay commented 2 years ago

Dumping my notes on formats and SPDX here.


Suggested requirements for data format.

  1. Needs to be able to convey Rust crate runtime and build dependencies.
  2. Needs to be extensible, so that we can add extra information in the future, e.g. statically linked C libraries, or build tool versions such as rustc.
  3. Needs to be easily interoperable with other tools: parsable in Rust and other languages (in particular Go, as used by the syft/trivy SCA tools), and easy for tools to correlate with vulnerability databases (e.g. RustSec).

The Trivy creator raised this in "Embed CPE names into binaries · Issue #76 · ossf/wg-vulnerability-disclosures" (github.com). The discussion there loosely points to SBOM formats being a more appropriate data format than package identification formats (SWID/PURL). In particular, SBOM formats can express the nature of a relationship (e.g. build vs. runtime dependency).

It was suggested on Zulip that SPDX is the likeliest SBOM format to reach wider adoption, given its backing by OpenSSF and industry.

There's currently no standardized way to embed SPDX SBOMs into binaries; see "Embedding SPDX into binaries · Issue #739 · spdx/spdx-spec" (github.com).

Some concerns over embedding SPDX SBOMs are:

An example representing a binary as an SPDX File looks like this:


{
  "spdxVersion": "SPDX-2.2",
  "dataLicense": "CC0-1.0",
  "SPDXID": "SPDXRef-DOCUMENT",
  "name": "baz.spdx.json",
  "documentNamespace": "https://foo.bar/",
  "creationInfo": {
    "created": "2022-08-01T18:44:38Z",
    "creators": [
      "Tool: cargo-spdx 0.1.0"
    ]
  },
  "packages": [
    {
      "copyrightText": "NOASSERTION",
      "downloadLocation": "NOASSERTION",
      "externalRefs": [
        {
          "referenceCategory": "PACKAGE_MANAGER",
          "referenceLocator": "pkg:cargo/bar@0.1.0",
          "referenceType": "purl"
        }
      ],
      "licenseConcluded": "NOASSERTION",
      "licenseDeclared": "NOASSERTION",
      "name": "bar",
      "SPDXID": "SPDXRef-bar-0.1.0",
      "versionInfo": "0.1.0"
    },
    {
      "copyrightText": "NOASSERTION",
      "downloadLocation": "NOASSERTION",
      "externalRefs": [
        {
          "referenceCategory": "PACKAGE_MANAGER",
          "referenceLocator": "pkg:cargo/baz@0.1.0",
          "referenceType": "purl"
        }
      ],
      "licenseConcluded": "NOASSERTION",
      "licenseDeclared": "NOASSERTION",
      "name": "baz",
      "SPDXID": "SPDXRef-baz-0.1.0",
      "versionInfo": "0.1.0"
    },
    {
      "copyrightText": "NOASSERTION",
      "downloadLocation": "NOASSERTION",
      "externalRefs": [
        {
          "referenceCategory": "PACKAGE_MANAGER",
          "referenceLocator": "pkg:cargo/foo@0.1.0",
          "referenceType": "purl"
        }
      ],
      "licenseConcluded": "NOASSERTION",
      "licenseDeclared": "NOASSERTION",
      "name": "foo",
      "SPDXID": "SPDXRef-foo-0.1.0",
      "versionInfo": "0.1.0"
    }
  ],
  "files": [
    {
      "checksums": [
        {
          "algorithm": "SHA1",
          "checksumValue": "da39a3ee5e6b4b0d3255bfef95601890afd80709"
        }
      ],
      "copyrightText": "NOASSERTION",
      "fileName": "baz",
      "fileTypes": [
        "BINARY"
      ],
      "licenseConcluded": "NOASSERTION",
      "SPDXID": "SPDXRef-File-baz"
    }
  ],
  "relationships": [
    {
      "relatedSpdxElement": "SPDXRef-baz-0.1.0",
      "relationshipType": "GENERATED_FROM",
      "spdxElementId": "SPDXRef-File-baz"
    },
    {
      "relatedSpdxElement": "SPDXRef-bar-0.1.0",
      "relationshipType": "DEPENDS_ON",
      "spdxElementId": "SPDXRef-File-baz"
    },
    {
      "relatedSpdxElement": "SPDXRef-baz-0.1.0",
      "relationshipType": "DEPENDS_ON",
      "spdxElementId": "SPDXRef-File-baz"
    },
    {
      "relatedSpdxElement": "SPDXRef-foo-0.1.0",
      "relationshipType": "DEPENDS_ON",
      "spdxElementId": "SPDXRef-File-baz"
    }
  ]
}
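The package and relationship entries above are regular enough to generate mechanically. A minimal Python sketch, assuming a plain list of (name, version) pairs as input; `spdx_package` and `spdx_for_binary` are hypothetical helpers written for illustration, not how cargo-spdx works, and only the `packages` and `relationships` parts of the document are produced:

```python
import json

def spdx_package(name, version):
    # One SPDX package entry, with a purl as the only external reference.
    return {
        "SPDXID": f"SPDXRef-{name}-{version}",
        "name": name,
        "versionInfo": version,
        "copyrightText": "NOASSERTION",
        "downloadLocation": "NOASSERTION",
        "licenseConcluded": "NOASSERTION",
        "licenseDeclared": "NOASSERTION",
        "externalRefs": [{
            "referenceCategory": "PACKAGE_MANAGER",
            "referenceLocator": f"pkg:cargo/{name}@{version}",
            "referenceType": "purl",
        }],
    }

def spdx_for_binary(binary_name, root, crates):
    # Hypothetical helper: the binary is an SPDX File that is
    # GENERATED_FROM the root crate and DEPENDS_ON every listed crate.
    file_id = f"SPDXRef-File-{binary_name}"
    root_name, root_version = root
    relationships = [{
        "spdxElementId": file_id,
        "relationshipType": "GENERATED_FROM",
        "relatedSpdxElement": f"SPDXRef-{root_name}-{root_version}",
    }]
    for name, version in crates:
        relationships.append({
            "spdxElementId": file_id,
            "relationshipType": "DEPENDS_ON",
            "relatedSpdxElement": f"SPDXRef-{name}-{version}",
        })
    return {
        "packages": [spdx_package(n, v) for n, v in crates],
        "relationships": relationships,
    }

crates = [("bar", "0.1.0"), ("baz", "0.1.0"), ("foo", "0.1.0")]
doc = spdx_for_binary("baz", ("baz", "0.1.0"), crates)
print(json.dumps(doc["relationships"], indent=2))
```

Note how much of each package entry is `NOASSERTION` filler: the only security-relevant data per crate is the name, the version, and the purl.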

Rust support for SPDX SBOM format:

More questions to consider regarding use of SPDX in cargo-auditable:

Does it actually make it easier to use the embedded data?

Is it worth using a different format at all without a resolution to "Embed CPE names into binaries · Issue #76 · ossf/wg-vulnerability-disclosures" (github.com)?

tofay commented 2 years ago

Re "does anyone actually use these formats": both trivy and grype (the vulnerability scanning tool that works with/uses syft) are capable of reading SBOMs in multiple formats, e.g. SPDX and CycloneDX.

If there was a standardized section name for embedding SBOMs then cargo-auditable could use that and these tools could be updated to detect that. And without section name standardization, cargo-auditable could use SPDX, and go-rustaudit could extract the SBOM and expose the JSON for these tools to parse with their existing parsers.
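Either way, the consuming tool ends up pulling a blob out of a linker section and handing it to a parser. For reference, cargo-auditable's current format is zlib-compressed JSON stored in the `.dep-v0` section. A minimal Python sketch of the decode step, assuming the raw section bytes have already been read out of the binary; `decode_audit_section` is a hypothetical name, the sample payload is a stand-in rather than real extracted data, and locating the section (ELF/PE/Mach-O parsing) is not shown:

```python
import json
import zlib

def decode_audit_section(section_bytes: bytes) -> dict:
    # Assumption: the section holds one zlib stream containing UTF-8 JSON.
    return json.loads(zlib.decompress(section_bytes).decode("utf-8"))

# Stand-in for bytes read from a binary's section; not real extracted data.
payload = {"packages": [{"name": "foo", "version": "0.1.0"}]}
sample_section = zlib.compress(json.dumps(payload).encode("utf-8"))

info = decode_audit_section(sample_section)
print(info["packages"][0]["name"])
```

If the embedded payload were an SPDX or CycloneDX document instead, the same decode step would apply and the resulting JSON could go straight to an existing SBOM parser.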

orangecms commented 2 years ago

Hi, I just heard from you on the Rustacean Station podcast - really cool stuff here! :-)

I've been thinking, talking and exchanging about this whole topic here for a while now, so let me add some references:

When I asked who else would be interested in the topic, I was invited to the CycloneDX Slack, where people discuss the entire SBoM topic very broadly. Maybe that's also for you. :-)

Finally, I am quite involved in the oreboot firmware project, where I'm seeking to introduce SBoM as well, likely based on CycloneDX, for which there is also a Rust implementation.

That shall be it for now; feel free to poke back at me should you have any further questions etc. :partying_face:

Shnatsel commented 2 years ago

Thanks for the links! Having SBOMs in firmware would certainly be cool!

So far I've found everything not specifically designed for inclusion into binaries unsuitable, for two reasons:

  1. Inclusion of dates messes up reproducible builds
  2. The formats are very verbose and/or require including lots of information that is not relevant for the purposes of a security audit, increasing the binary size considerably.
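The first point is easy to demonstrate without any SBOM tooling: if the embedded document carries a creation timestamp, two otherwise identical builds embed different bytes. A minimal Python sketch; the `serialize` and `digest` helpers and the document shape are made up for illustration:

```python
import hashlib
import json

def serialize(doc: dict) -> bytes:
    # Sorted keys and fixed separators: identical input gives identical bytes.
    return json.dumps(doc, sort_keys=True, separators=(",", ":")).encode("utf-8")

def digest(doc: dict) -> str:
    return hashlib.sha256(serialize(doc)).hexdigest()

packages = [{"name": "foo", "version": "0.1.0"}]

# Same dependency set, built a few seconds apart, with a timestamp embedded.
build_a = {"created": "2022-08-01T18:44:38Z", "packages": packages}
build_b = {"created": "2022-08-01T18:45:02Z", "packages": packages}

# Same dependency set with no timestamp at all.
stable_a = {"packages": packages}
stable_b = {"packages": packages}

timestamps_differ = digest(build_a) != digest(build_b)
stable_match = digest(stable_a) == digest(stable_b)
print(timestamps_differ, stable_match)
```

Since the embedded data ends up inside the binary, any such difference propagates to the hash of the binary itself.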

I'm looking to talk to some people who have worked on the SBOM embedded in Go binaries by default. They also rolled their own JSON-based format, and perhaps we could collaborate on something more generic, or at least something that could be shared between the two.

FWIW, Syft can already convert from the cargo-auditable data format to CycloneDX.

jayvdb commented 1 year ago

https://github.com/google/osv-scanner supports "SPDX and CycloneDX SBOMs using Package URLs" - https://google.github.io/osv-scanner/usage/#specify-sbom

As an alternative or precursor to storing the dependency info in those SBOM formats, perhaps rust-audit-info could extract the existing format and do a "rough" conversion to these SBOM formats. That way integration with these other tools can be explored, to determine what (if any) extra fields need to be stored in the Rust binaries to get reasonable compatibility with them.
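A rough conversion of this kind can be quite small. A Python sketch, assuming the cargo-auditable JSON layout (a `packages` array whose entries have `name`, `version`, `source`, and index-based `dependencies`) and a minimal CycloneDX document; `to_cyclonedx` is a hypothetical helper, the `specVersion` value and the `name@version` bom-ref scheme are assumptions, and this is not how Syft or rust-audit-info do it:

```python
import json

def to_cyclonedx(audit: dict) -> dict:
    packages = audit["packages"]
    # Assumption: "name@version" is unique enough to serve as a bom-ref.
    refs = [f"{p['name']}@{p['version']}" for p in packages]
    components, dependencies = [], []
    for pkg, ref in zip(packages, refs):
        component = {
            "type": "library",
            "bom-ref": ref,
            "name": pkg["name"],
            "version": pkg["version"],
        }
        # Only crates.io packages get a purl in this sketch; git and
        # local sources are left without one.
        if pkg.get("source") == "crates.io":
            component["purl"] = f"pkg:cargo/{pkg['name']}@{pkg['version']}"
        components.append(component)
        # Dependencies are indices into the same packages array.
        dependencies.append({
            "ref": ref,
            "dependsOn": [refs[i] for i in pkg.get("dependencies", [])],
        })
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.4",  # assumed version
        "version": 1,
        "components": components,
        "dependencies": dependencies,
    }

# Made-up input in the assumed cargo-auditable layout.
audit = {"packages": [
    {"name": "bar", "version": "0.1.0", "source": "crates.io"},
    {"name": "baz", "version": "0.1.0", "source": "local",
     "dependencies": [0], "root": True},
]}
bom = to_cyclonedx(audit)
print(json.dumps(bom, indent=2))
```

Fields with no obvious home in this mapping, such as the build/runtime `kind` of a dependency, are exactly the kind of gap such an experiment would surface.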

Shnatsel commented 1 year ago

Syft can already perform such a conversion today.

Shnatsel commented 4 months ago

I've prototyped recording CycloneDX in the binaries directly; you can find the code in this branch: https://github.com/rust-secure-code/cargo-auditable/tree/record-cyclonedx

This was made possible by newer CycloneDX versions that no longer require a date and serial number to be present, which enables them to be made reproducible.

Recording CycloneDX results in 2x the overhead compared to the custom format. But the overhead is still consistently below 1/1000th of the size of the binary across a wide range of projects, so this is probably acceptable.

I've also built a pure-Rust converter from the custom format to CycloneDX, so that anyone who needs the conversion would not need to pull in the entirety of Syft: https://github.com/rust-secure-code/cargo-auditable/tree/master/auditable2cdx