Open marshall007 opened 2 months ago
Currently Zarf has prioritized an agnostic format for SBOMs to capture the maximum amount of data that Syft (the tool Zarf uses under the hood) can give Zarf. The Syft JSON files can be downconverted to other formats and conversion is covered in the latter half of this docs section: https://docs.zarf.dev/ref/sboms/#extracting-a-packages-sbom
For the tooling/version used are you looking to see Zarf or Syft?
As of v0.41.0, the Syft JSON has .descriptor.name and .descriptor.version, which evaluate to Zarf and the Zarf version respectively. Additionally, the .schema field holds the schema version of the Syft JSON.
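For anyone who wants to check these fields on their own SBOMs, here is a minimal Python sketch. It assumes the Syft JSON layout described above, with the schema version stored at .schema.version (that nesting is my assumption). The helper name `sbom_provenance` and the inline sample values are hypothetical and for illustration only.

```python
import json


def sbom_provenance(doc: dict) -> dict:
    """Collect the generator and schema fields from a parsed Syft JSON document."""
    descriptor = doc.get("descriptor", {})
    schema = doc.get("schema", {})
    return {
        "tool": descriptor.get("name"),
        "tool_version": descriptor.get("version"),
        # Assumption: the schema version is nested under .schema.version.
        "schema_version": schema.get("version"),
    }


# Made-up sample document; real SBOMs would be loaded with json.load(open(path)).
sample = json.loads(
    '{"descriptor": {"name": "zarf", "version": ""},'
    ' "schema": {"version": "0.0.0"}}'
)

info = sbom_provenance(sample)
# Treat empty strings and absent keys the same way: the field was not populated.
missing = [key for key, value in info.items() if not value]
print(info)
print("unpopulated fields:", missing)
```

Running this over every JSON file extracted from a package's SBOM layer would show quickly whether any of the three fields is left empty.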
Thanks guys, the Syft JSON makes sense. I'm sold.
For the tooling/version used are you looking to see Zarf or Syft?
I think I'd expect to see Syft, but maybe this is not so important after all. I'm still looking into it, but maybe all that matters is the schema version. I need to see if different syft versions produce different results (outside of schema version changes).
As of v0.41.0, the Syft json has .descriptor.name and .descriptor.version, which evaluate to Zarf and Zarf version respectively.
I see these fields, but Zarf is failing to populate .descriptor.version in the SBOMs I've looked at so far.
Another thing I discovered today is that Zarf is not preserving the original manifest digests in the generated SBOM. Here is the diff between the .source section of an SBOM in the Zarf package vs what I get from scanning with syft directly:
This is good to know, thanks for doing this analysis. The media types changing makes sense. I'm not sure why syft writes the media types of layers in that format, even when the manifest in the registry is already using the newer vnd.oci.image.layer.v1.
Losing the architecture, manifest, and manifest digests is a bit concerning though. The differences likely have to do with the fact that we're using the equivalent of syft scan oci-dir under the hood. It looks like it's missing a lot of information that should be in the image manifest in Zarf; however, it does have the layers, which it must get from the manifest.
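To make this kind of comparison repeatable, a rough Python sketch like the one below can list which .source fields exist in only one SBOM and which differ. The helper names `flatten` and `diff_source` are hypothetical. The sample documents are made up, and the key names in them (architecture, manifestDigest, layers, mediaType) are assumptions based on the fields discussed in this thread, not a verified schema.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts and lists into {dotted.path: leaf_value} pairs."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return {prefix: value}
    flat = {}
    for key, child in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(child, path))
    return flat


def diff_source(a: dict, b: dict) -> dict:
    """Compare the .source sections of two parsed Syft JSON documents."""
    fa = flatten(a.get("source", {}))
    fb = flatten(b.get("source", {}))
    return {
        "only_in_a": sorted(fa.keys() - fb.keys()),
        "only_in_b": sorted(fb.keys() - fa.keys()),
        "changed": sorted(k for k in fa.keys() & fb.keys() if fa[k] != fb[k]),
    }


# Made-up stand-ins for "SBOM from the Zarf package" and "SBOM from a direct scan".
from_package = {
    "source": {"type": "image", "metadata": {"layers": [{"mediaType": "type-a"}]}}
}
from_direct_scan = {
    "source": {
        "type": "image",
        "metadata": {
            "architecture": "example-arch",
            "manifestDigest": "sha256:example",
            "layers": [{"mediaType": "type-b"}],
        },
    }
}

result = diff_source(from_package, from_direct_scan)
for section, paths in result.items():
    print(section, paths)
```

Flattening to dotted paths keeps the report readable when a whole nested object (such as the manifest) is absent on one side.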
@AustinAbro321 something we're wrestling with on the sec-hub implementation is what to do when multiple Zarf packages include the same container images, but with slightly different SBOMs. We're noticing that syft does not really guarantee deterministic output even for the same syft-json schema version.
A good example is comparing these two SBOMs between 10.6.0-uds.0-upstream and 10.6.0-uds.1-upstream (which, by definition, include identical images):
# download artifacts
crane blob ghcr.io/defenseunicorns/packages/uds/sonarqube:10.6.0-uds.0-upstream@sha256:3c3c927030c26b05efa1c0504ed79722f747d46d4b671bd99c870ad5f7d72e42 | tar -Ox docker.io_library_sonarqube_10.6.0-community.json > sonarqube-uds.0.json
crane blob ghcr.io/defenseunicorns/packages/uds/sonarqube:10.6.0-uds.1-upstream@sha256:d0e3ff0e4e26779571a30e0633cede9e6692179e817c573da320fc09d6a0fcea | tar -Ox docker.io_library_sonarqube_10.6.0-community.json > sonarqube-uds.1.json
# structural diff
jd -color -mset -setkeys "artifacts.purl" sonarqube-uds.{0..1}.json
I thought that this was the result of bumping the syft dependency, but it turns out both packages were built using the same Zarf version (v0.36.1).
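If jd is not available, a similar purl-keyed comparison can be approximated in a few lines of Python. This is only a sketch of the idea behind the structural diff above, not a reimplementation of jd. The helper names and the sample artifacts are hypothetical, and it assumes each entry in .artifacts carries a purl field.

```python
def artifacts_by_purl(doc: dict) -> dict:
    """Index artifacts by package URL.

    Entries without a purl are skipped. If several artifacts share a purl,
    the last one wins, which is a simplification.
    """
    return {a["purl"]: a for a in doc.get("artifacts", []) if a.get("purl")}


def compare_artifacts(old: dict, new: dict) -> dict:
    """Report purls that were removed, added, or whose artifact entry changed."""
    a, b = artifacts_by_purl(old), artifacts_by_purl(new)
    return {
        "removed": sorted(a.keys() - b.keys()),
        "added": sorted(b.keys() - a.keys()),
        "changed": sorted(p for p in a.keys() & b.keys() if a[p] != b[p]),
    }


# Made-up documents standing in for sonarqube-uds.0.json and sonarqube-uds.1.json.
old_doc = {
    "artifacts": [
        {"purl": "pkg:generic/example-a@1.0", "locations": [{"path": "/a"}]},
        {"purl": "pkg:generic/example-b@2.0", "locations": []},
    ]
}
new_doc = {
    "artifacts": [
        {"purl": "pkg:generic/example-a@1.0", "locations": [{"path": "/a"}, {"path": "/b"}]},
        {"purl": "pkg:generic/example-c@3.0", "locations": []},
    ]
}

report = compare_artifacts(old_doc, new_doc)
for section, purls in report.items():
    print(section, purls)
```

With real files, the two documents would be loaded via json.load from the paths written by the crane commands above.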
tl;dr: I think this is all good evidence suggesting we should store SBOMs as attestations and not bundle them with the Zarf package. We should be periodically rescanning packages so we can provide richer SBOMs (more metadata, up-to-date syft-json schema version, new/improved catalogers, etc).
Describe what should be investigated or refactored
Currently the sboms.tar layer contains both JSON documents and generated HTML for an "SBOM viewer" page for each of the images in the Zarf package. The current approach has several downsides:
For comparison, compressed tarballs that contain only the JSON documents are <10x the size:
Proposed solution
adopt standard SPDX JSON format