guacsec / guac

GUAC aggregates software security metadata into a high fidelity graph database.
https://guac.sh
Apache License 2.0

[ingestion/data-quality issue] CycloneDX specification as of 1.4 does not require the component field in metadata #976

Open pxp928 opened 1 year ago

pxp928 commented 1 year ago

Describe the bug

Discovered by kurt-r2c in original PR https://github.com/guacsec/guac/pull/896:

The CycloneDX specification as of 1.4 does not require the component field in metadata, nor does it require the BOM ref field: https://cyclonedx.org/docs/1.4/json/#metadata_component_bom-ref.

GUAC was encountering an NPE (nil pointer dereference) when trying to read these uninitialized fields from the CycloneDX struct.

Some SBOM generators that only operate on package lockfiles (such as https://pypi.org/project/cyclonedx-bom/) do not generate the component metadata field, nor are they required to by the spec.

To Reproduce

Ingest an SBOM produced by a generator that only operates on package lockfiles (such as https://pypi.org/project/cyclonedx-bom/) and therefore does not emit the component metadata field.

Expected behavior

For now, output an error message stating that this is not supported, since there would be no top-level package node and ingestion would result in a bunch of singleton packages. Further discussion is needed to decide how to handle this use case.

GUAC version v0.1

lumjjb commented 1 year ago

@kurt-r2c, checking back here: what other use cases that came out of the discussion of this issue do we need to handle?

joestein commented 1 year ago

What specifically needs to be added to the "component field in metadata" to make a CycloneDX file work? Do you have an example that works, so folks can add it to their CycloneDX output and have it work with GUAC? (Even if the spec doesn't require it, it's no biggie to overlay the field onto the output of existing generators.)

pxp928 commented 1 year ago

Hey @joestein, GUAC requires at least the following in the component field in the metadata: https://github.com/guacsec/guac/blob/e55fa2498e9b1851b8e0fdb5b95874774005824b/internal/testing/testdata/exampledata/alpine-cyclonedx.json#L15-L20

Specifically, it needs the name, version, type, and bom-ref. If the purl is available, that is best, as GUAC then does not need to infer it. Per the CycloneDX spec, version and bom-ref are not required fields, but they make for a more accurate SBOM.

Without this information, GUAC cannot make proper relationships between the top-level component and its dependencies (resulting in a bunch of singletons).