spdx / cdx2spdx

Utility that converts SBOM documents from CycloneDX to SPDX
Apache License 2.0

Hierarchical sbom component information #42

Open flemminglau opened 6 months ago

flemminglau commented 6 months ago

This is really not an issue specific to this tool but in case the tool was to implement a way of doing this it would be a great contribution to the versatility of both CDX and SPDX files.

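We have the challenge that SBOMs do not have a good/agreed way of defining:

  • What is an internal component in a product
  • What is a direct dependency to an OSS component
  • What is a transitive dependency to an OSS component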

I am aware that the distinction is an "overlay" to the concept of an SBOM, but nevertheless it is a very relevant distinction.

It is very possible that some existing fields can be used for this but it currently seems not.

Currently a convention has been set up in at least one of our systems to use the SPDX .comment field to indicate which product component (which part of the overall product scope) this SPDX defines. However, my argument against this is that it leaves you with only a single component per SBOM. This is then solved by allowing a product to be defined by a zipped set of SPDX files, something which I believe is non-standard?

Does anyone have a good idea how this can be solved?

I have tried defining the SBOMs as hierarchical merges of the component SBOMs. However our systems tend to assume that the top level in the dependency tree represents the direct dependencies and any lower layers are transitive. Adding a hierarchy of product components messes up this assumption and everything OSS becomes transitive.

We need some flag/convention that explicitly identifies modules as being one or the other.

flemminglau commented 6 months ago

Actually this request is linked with https://github.com/spdx/spdx-spec/issues/875

I am wording it differently, but we are basically targeting the same general need: being able to express in the SPDX which modules in the dependency tree are "internal components of the application", which are "direct dependencies on external components" and which are "transitive dependencies on external components".

Or as it is worded in the 875 issue "which direct / transient dependencies are in the dependency hierarchy of a specific product component".

flemminglau commented 6 months ago

Revisiting this, maybe the functionality I am looking for is covered (or can and should be) by the CycloneDX .components.type field, i.e. only components with type="library" are considered when classifying dependencies as "direct" or "transitive". Anything else should be considered a "component".
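A minimal sketch of that convention, assuming a CycloneDX JSON document with the standard "components" array and its "type" and "name" fields. The function name and the sample component names are hypothetical, and no existing tool is claimed to work this way:

```python
# Sketch only: split CycloneDX components into OSS dependency candidates
# (type == "library") and product components (everything else).
# split_by_type and the sample names below are illustrative assumptions.

def split_by_type(bom):
    libraries, product_parts = [], []
    for comp in bom.get("components", []):
        target = libraries if comp.get("type") == "library" else product_parts
        target.append(comp["name"])
    return libraries, product_parts


bom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "components": [
        {"type": "application", "name": "billing-service", "bom-ref": "billing"},
        {"type": "library", "name": "log4j-core", "bom-ref": "log4j"},
        {"type": "container", "name": "billing-image", "bom-ref": "image"},
    ],
}

libraries, product_parts = split_by_type(bom)
print("dependency candidates:", libraries)
print("product components:", product_parts)
```

Only the entries in the first list would then be classified as direct or transitive; the second list would be treated as the product's own structure.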

goneall commented 6 months ago

I'll attempt to answer some of the questions, but I may not be completely understanding the scenario and context:

This is really not an issue specific to this tool but in case the tool was to implement a way of doing this it would be a great contribution to the versatility of both CDX and SPDX files.

We have the challenge that the SBOMs do not have a good/agreed way of defining

  • What is an internal component in a product

By this do you mean what is shipped with the product? If so, then the SPDX convention is to have a "CONTAINS" relationship from the product to the internal component.

  • What is a direct dependency to an OSS component

There isn't a field for this, but it can be derived: any component which has a dependency-type relationship (e.g. DEPENDENCY_OF, STATIC_LINK, DYNAMIC_LINK) would be a direct dependency.

  • What is a transitive dependency to an OSS component

This can be derived by traversing the relationships of any direct dependencies.
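The three answers above can be sketched roughly as follows, assuming an SPDX 2.3 JSON document whose relationships use the "spdxElementId", "relationshipType" and "relatedSpdxElement" fields. The function name, the choice of root element and the sample IDs are illustrative assumptions, and a package that is both a direct and a transitive dependency is reported as direct:

```python
from collections import defaultdict

# "A <type> B" means A depends on B for these relationship types.
FORWARD = {"DEPENDS_ON", "STATIC_LINK", "DYNAMIC_LINK"}
# "A DEPENDENCY_OF B" means B depends on A, so the edge is reversed.
REVERSE = {"DEPENDENCY_OF"}


def classify(doc, root_id):
    """Return (internal, direct, transitive) sets of SPDX IDs for root_id."""
    contains = defaultdict(set)
    depends = defaultdict(set)
    for rel in doc.get("relationships", []):
        a = rel["spdxElementId"]
        kind = rel["relationshipType"]
        b = rel["relatedSpdxElement"]
        if kind == "CONTAINS":
            contains[a].add(b)
        elif kind in FORWARD:
            depends[a].add(b)
        elif kind in REVERSE:
            depends[b].add(a)

    def reachable(starts, edges):
        # Depth-first traversal; returns every node reachable from starts.
        seen, stack = set(), list(starts)
        while stack:
            node = stack.pop()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    # Internal components: the product plus everything it CONTAINS.
    internal = {root_id} | reachable([root_id], contains)
    # Direct dependencies: whatever an internal component depends on.
    direct = set()
    for comp in internal:
        direct |= depends.get(comp, set())
    direct -= internal
    # Transitive dependencies: reached only through the direct ones.
    transitive = reachable(direct, depends) - direct - internal
    return internal, direct, transitive


doc = {
    "relationships": [
        {"spdxElementId": "SPDXRef-Product", "relationshipType": "CONTAINS",
         "relatedSpdxElement": "SPDXRef-Frontend"},
        {"spdxElementId": "SPDXRef-Product", "relationshipType": "CONTAINS",
         "relatedSpdxElement": "SPDXRef-Backend"},
        {"spdxElementId": "SPDXRef-Backend", "relationshipType": "DEPENDS_ON",
         "relatedSpdxElement": "SPDXRef-LibA"},
        {"spdxElementId": "SPDXRef-LibB", "relationshipType": "DEPENDENCY_OF",
         "relatedSpdxElement": "SPDXRef-LibA"},
        {"spdxElementId": "SPDXRef-Frontend", "relationshipType": "DYNAMIC_LINK",
         "relatedSpdxElement": "SPDXRef-LibC"},
    ]
}

internal, direct, transitive = classify(doc, "SPDXRef-Product")
print(sorted(internal), sorted(direct), sorted(transitive))
```

This only works when the SBOM actually carries CONTAINS relationships for the product's own components; a document in which every edge is DEPENDENCY_OF gives the traversal nothing to distinguish internal components from dependencies.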

I am aware that the distinction is an "overlay" to the concept of an SBOM, but nevertheless it is a very relevant distinction.

It is very possible that some existing fields can be used for this but it currently seems not.

Currently a convention has been set up in at least one of our systems to use the SPDX .comment field to indicate which product component (which part of the overall product scope) this SPDX defines. However, my argument against this is that it leaves you with only a single component per SBOM. This is then solved by allowing a product to be defined by a zipped set of SPDX files, something which I believe is non-standard?

The SBOM should contain a package for every dependency and a Relationship to connect the dependency to the package the SBOM is about.

Does anyone have a good idea how this can be solved?

I have tried defining the SBOMs as hierarchical merges of the component SBOMs. However our systems tend to assume that the top level in the dependency tree represents the direct dependencies and any lower layers are transitive. Adding a hierarchy of product components messes up this assumption and everything OSS becomes transitive.

I'm not quite following the issue - if the relationship fields are used, you should be able to keep the hierarchy information precise and accurate.

We need some flag/convention that explicitly identifies modules as being one or the other.

flemminglau commented 6 months ago

I see your points but remember that I am limited by the tools available. I cannot simply (manually) inject or modify information into the SBOM files. That would be unworkable when working with dozens of products holding hundreds of product components and thousands of OSS modules.

The tools available (like cyclonedx-cli and cdxasm) can merge SBOMs hierarchically. But they do that simply by making the top components of the original SBOMs top-level members of the dependency tree (which is all converted to DEPENDENCY_OF in SPDX).

I am now following the path of possibly assuming that the distinction between these 3 types can be deduced by inspecting the "primaryPackagePurpose" field, assuming that only "LIBRARY" and possibly "FRAMEWORK" are candidates for being direct and transitive OSS dependencies. Any other objects in the SBOM ("CONTAINER", "APPLICATION", ...) would not qualify as dependencies on OSS modules.

This convention would require no changes to any tools, only that we can trust the value of primaryPackagePurpose, which we to some extent can control.
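A rough sketch of how that convention could be evaluated on a merged document, assuming SPDX 2.3 JSON with "packages" (carrying "SPDXID" and "primaryPackagePurpose") and relationships that are all DEPENDENCY_OF, as described above. The function name and sample IDs are hypothetical, and treating packages without a primaryPackagePurpose as product components is an assumption of this sketch:

```python
# Purposes that, under the proposed convention, mark a package as OSS.
OSS_PURPOSES = {"LIBRARY", "FRAMEWORK"}


def classify_by_purpose(doc):
    """Return (components, direct, transitive) sets of SPDX IDs."""
    purpose = {
        pkg["SPDXID"]: pkg.get("primaryPackagePurpose")
        for pkg in doc.get("packages", [])
    }
    oss = {pid for pid, kind in purpose.items() if kind in OSS_PURPOSES}
    # Assumption: anything that is not LIBRARY/FRAMEWORK (including packages
    # with no purpose set) counts as a product component.
    components = set(purpose) - oss

    direct = set()
    for rel in doc.get("relationships", []):
        if rel["relationshipType"] != "DEPENDENCY_OF":
            continue
        dependency = rel["spdxElementId"]
        dependent = rel["relatedSpdxElement"]
        # An OSS package is direct when a product component depends on it.
        if dependency in oss and dependent in components:
            direct.add(dependency)
    # Every remaining OSS package is only reached through other OSS packages.
    transitive = oss - direct
    return components, direct, transitive


doc = {
    "packages": [
        {"SPDXID": "SPDXRef-Product", "primaryPackagePurpose": "APPLICATION"},
        {"SPDXID": "SPDXRef-ModuleA", "primaryPackagePurpose": "CONTAINER"},
        {"SPDXID": "SPDXRef-LibA", "primaryPackagePurpose": "LIBRARY"},
        {"SPDXID": "SPDXRef-LibB", "primaryPackagePurpose": "LIBRARY"},
    ],
    "relationships": [
        {"spdxElementId": "SPDXRef-ModuleA", "relationshipType": "DEPENDENCY_OF",
         "relatedSpdxElement": "SPDXRef-Product"},
        {"spdxElementId": "SPDXRef-LibA", "relationshipType": "DEPENDENCY_OF",
         "relatedSpdxElement": "SPDXRef-ModuleA"},
        {"spdxElementId": "SPDXRef-LibB", "relationshipType": "DEPENDENCY_OF",
         "relatedSpdxElement": "SPDXRef-LibA"},
    ],
}

components, direct, transitive = classify_by_purpose(doc)
print(sorted(components), sorted(direct), sorted(transitive))
```

With this rule the depth of a package in the merged tree no longer matters: LibA sits two levels below the product yet is still reported as direct, because the package that depends on it is a product component.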

Any comments on this approach are most welcome.

goneall commented 6 months ago

I'm having difficulty understanding your complete scenario and tool constraints, so I may not be much help beyond my previous comments. Perhaps this is something we could discuss in real time in one of the SPDX implementers calls. I'll be 30 minutes late for next week's call and I'll miss the following call in 3 weeks, but I'll be back in the calls after that.

cc: @rnjudge - if you have any additional thoughts on the above issue.