opensbom-generator / spdx-sbom-generator

Support CI generation of SBOMs via golang tooling.
394 stars 109 forks source link

Calculate sha of artifact contents, not name #210

Open loosebazooka opened 3 years ago

loosebazooka commented 3 years ago

Summary

https://github.com/spdx/spdx-sbom-generator/blob/main/pkg/modules/javamaven/decoder.go#L173

the artifact hash should not be on the name of the module. For example the artifact com.google.guava:guava:30.1.1-jre should have a sha1 of https://repo1.maven.org/maven2/com/google/guava/guava/30.1.1-jre/guava-30.1.1-jre-javadoc.jar.sha1

imjasonh commented 3 years ago

The same is happening for npm: https://github.com/spdx/spdx-sbom-generator/blob/b72c5287eb8ad5b2dfab6f02bed3305e81527707/pkg/modules/npm/handler.go#L194-L198

And yarn: https://github.com/spdx/spdx-sbom-generator/blob/b72c5287eb8ad5b2dfab6f02bed3305e81527707/pkg/modules/yarn/handler.go#L234-L238

And nuget: https://github.com/spdx/spdx-sbom-generator/blob/90ec05b20557e3cda6fd12cf214ad02b83c02f87/pkg/modules/nuget/handler.go#L162-L165

imjasonh commented 3 years ago

It seems the cargo module also has this bug, via the readCheckSum helper, that just SHA-1s the string contents it's given. In the cargo module this is just called with the dep ID, instead of any sort of actual module contents:

https://github.com/spdx/spdx-sbom-generator/blob/b72c5287eb8ad5b2dfab6f02bed3305e81527707/pkg/modules/cargo/modules.go#L103

The composer module also has a copy of the readCheckSum helper, which gets called with the packageUrl and stuffed into the module checksum:

https://github.com/spdx/spdx-sbom-generator/blob/b72c5287eb8ad5b2dfab6f02bed3305e81527707/pkg/modules/composer/modules.go#L58


This seems like a pretty egregious bug to me, since people might see the module checksum value and come to believe it represents actual content digests, instead of just the digest of a string they already have. When verifying the sbom, the actual module content digest won't match the module name digest reported in the sbom.

Even worse, if someone manages to push new module contents with the same name, the checksum reported in the sbom will remain the same and consumers of the sbom will be none the wiser.

This issue seems to be endemic to this codebase. I'd suggest taking a closer look at how these checksums are collected and reported, and even better having some verification or testing framework in place to ensure that the checksum reported matches the module contents. If sboms can't be reliably reported and verified, they're not going to be of much practical use to anybody.