tern-tools / tern

Tern is a software composition analysis tool and Python library that generates a Software Bill of Materials for container images and Dockerfiles. The SBOM that Tern generates will give you a layer-by-layer view of what's inside your container in a variety of formats including human-readable, JSON, HTML, SPDX and more.
BSD 2-Clause "Simplified" License

Invalid file information in SPDX documents #1240

Open armintaenzertng opened 1 year ago

armintaenzertng commented 1 year ago

Note: This uses the new version of the SPDX generation introduced in #1233. The old version sports the same errors and a few more that have been already fixed in the new version.

Describe the bug: SPDX outputs with file information have a number of validation issues.

To Reproduce: I ran tern report -i golang:1.12-alpine -f spdxjson -sv 2.3 -o output.json to produce the output, then ran pyspdxtools -i output.json on it (validation takes a while because the SPDX document is large). I'm not sure whether -x scancode is also required, as I recall that the above command used to produce no file information. In case there are problems, I attached my output.json as output.txt (GitHub does not seem to support JSON attachments).
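The two reproduction steps can be scripted roughly as follows. This is a minimal sketch: the flags are copied from the commands quoted above, while the helper names `build_commands` and `run_all` are hypothetical and exist only for this illustration. It assumes both `tern` and `pyspdxtools` are on the PATH.

```python
import shutil
import subprocess


def build_commands(image="golang:1.12-alpine", output="output.json", spdx_version="2.3"):
    """Return the two command lines from the report as argument lists."""
    generate = ["tern", "report", "-i", image, "-f", "spdxjson",
                "-sv", spdx_version, "-o", output]
    validate = ["pyspdxtools", "-i", output]
    return [generate, validate]


def run_all(commands):
    """Run each command in order; stop early if an executable is missing."""
    for cmd in commands:
        if shutil.which(cmd[0]) is None:
            print(f"skipping: {cmd[0]} not found on PATH")
            return False
        # check=False: the exit code of the validator on an invalid
        # document is not assumed here, so it is only reported.
        result = subprocess.run(cmd, check=False)
        print(f"{' '.join(cmd)} -> exit code {result.returncode}")
    return True
```

Calling `run_all(build_commands())` would generate the report and then validate it; whether `-x scancode` must be added to get file information is left open, as in the report.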

Error in terminal: the validation issues are:

Unrecognized license reference: LicenseRef-21495e9. license_expression must only use IDs from the license list or extracted licensing info, but is: LicenseRef-21495e9
Unrecognized license reference: LicenseRef-1c734cf. license_expression must only use IDs from the license list or extracted licensing info, but is: LicenseRef-1c734cf
Unrecognized license reference: LicenseRef-1b79b75. license_expression must only use IDs from the license list or extracted licensing info, but is: LicenseRef-1b79b75
Unrecognized license reference: LicenseRef-fa9fd02. license_expression must only use IDs from the license list or extracted licensing info, but is: LicenseRef-fa9fd02
Unrecognized license reference: LicenseRef-39c3ee0. license_expression must only use IDs from the license list or extracted licensing info, but is: LicenseRef-39c3ee0
Unrecognized license reference: LicenseRef-21495e9. license_expression must only use IDs from the license list or extracted licensing info, but is: LicenseRef-21495e9
Unrecognized license reference: LicenseRef-4ccf56f. license_expression must only use IDs from the license list or extracted licensing info, but is: LicenseRef-4ccf56f
Unrecognized license reference: LicenseRef-45c771b. license_expression must only use IDs from the license list or extracted licensing info, but is: LicenseRef-45c771b
Unrecognized license reference: LicenseRef-ca2312b. license_expression must only use IDs from the license list or extracted licensing info, but is: LicenseRef-ca2312b
spdx_id must only contain letters, numbers, "." and "-" and must begin with "SPDXRef-", but is: SPDXRef-v2"-None
spdx_id must only contain letters, numbers, "." and "-" and must begin with "SPDXRef-", but is: SPDXRef-v2"-None
did not find the referenced spdx_id "SPDXRef-None-None" in the SPDX document
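The first two kinds of errors above can be checked independently of any validator with a few lines of Python over the parsed SPDX JSON. This is a sketch, not how pyspdxtools implements its checks: the ID pattern is taken from the error message, the field names are those of the SPDX 2.x JSON format, and the function names and the sample document are made up for illustration.

```python
import re

# Allowed characters as stated in the validator message:
# letters, numbers, "." and "-", prefixed with "SPDXRef-".
SPDX_ID_RE = re.compile(r"SPDXRef-[A-Za-z0-9.-]+")
LICENSE_REF_RE = re.compile(r"LicenseRef-[A-Za-z0-9.-]+")


def invalid_spdx_ids(doc):
    """Return every element SPDXID that does not match the allowed pattern."""
    bad = []
    for key in ("packages", "files", "snippets"):
        for element in doc.get(key, []):
            spdx_id = element.get("SPDXID", "")
            if not SPDX_ID_RE.fullmatch(spdx_id):
                bad.append(spdx_id)
    return bad


def undeclared_license_refs(doc):
    """Return LicenseRef-* IDs that are used but missing from the extracted licensing info."""
    declared = {info["licenseId"] for info in doc.get("hasExtractedLicensingInfos", [])}
    used = set()
    for key in ("packages", "files"):
        for element in doc.get(key, []):
            # Single license expressions
            for field in ("licenseConcluded", "licenseDeclared"):
                used.update(LICENSE_REF_RE.findall(element.get(field, "")))
            # Lists of license expressions
            for field in ("licenseInfoInFiles", "licenseInfoFromFiles"):
                for expr in element.get(field, []):
                    used.update(LICENSE_REF_RE.findall(expr))
    return sorted(used - declared)


# Hypothetical miniature document, not taken from the attached output
doc = {
    "hasExtractedLicensingInfos": [{"licenseId": "LicenseRef-known", "extractedText": "..."}],
    "packages": [{"SPDXID": "SPDXRef-musl-1.1.24-r0", "licenseDeclared": "MIT"}],
    "files": [
        {"SPDXID": 'SPDXRef-v2"-None', "licenseConcluded": "NOASSERTION",
         "licenseInfoInFiles": ["LicenseRef-21495e9", "LicenseRef-known"]},
    ],
}
print(invalid_spdx_ids(doc))
print(undeclared_license_refs(doc))
```

On a real report, `doc` would be loaded with `json.load` from output.json.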

Expected behavior Tern's generated SPDX documents with file information should be valid.


rnjudge commented 1 year ago

Hmm, I don't see these errors in the current/old version of Tern's output when I run tern report -i golang:1.12-alpine -f spdxjson -o output.json:

(ternenv) [rose@fedora tern]$ tern report -i golang:1.12-alpine -f spdxjson -o output-golang.json
(ternenv) [rose@fedora ternenv]$ java -jar tools-java-1.1.7-jar-with-dependencies.jar Verify tern/output-golang.json 
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
This SPDX Document is valid.

rnjudge commented 1 year ago

I'll take a look at the output you pasted and the output I have, but this seems to be introduced with the latest changes.

armintaenzertng commented 1 year ago

The java-tools don't seem to pick up on all invalidities; please also check with pyspdxtools -i output.json.

Also, do you get the large (around 6MB) SPDX output?

rnjudge commented 1 year ago

@armintaenzertng Yes, I do see errors with pyspdxtools, although I'm not convinced all of them are valid or make sense. I tend to trust the java tools more because they are actively maintained by @goneall, and I'm not sure whether the python tools are. But if you see a valid error that the java tools don't pick up, you should file a bug with them.

As an example, I see this error with python tools:

package must contain no elements if `files_analyzed` is False, but found [Relationship(spdx_element_id='SPDXRef-golang-1.12-alpine', relationship_type=<RelationshipType.CONTAINS: 6>, related_spdx_element_id='SPDXRef-53745f29fd'

SPDXRef-53745f29fd is a layer package in the document, not a file.

It is true that a package may contain no files if files_analyzed is false but it may still contain other packages. This error is the majority of what I'm seeing. I don't see the Unrecognized license errors you are seeing.
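The distinction drawn here (no files, but other packages are fine) could be expressed as a check like the one below. It is a sketch of one reading of the rule, not the logic used by pyspdxtools or the java tools; the function name and the sample file ID are hypothetical, and the field names follow the SPDX 2.x JSON format.

```python
def files_contained_by_unanalyzed_packages(doc):
    """List (package, file) pairs where a package with filesAnalyzed=false CONTAINS a file.

    CONTAINS relationships that point at other packages are deliberately ignored.
    """
    file_ids = {f["SPDXID"] for f in doc.get("files", [])}
    # filesAnalyzed defaults to true when omitted, so only an explicit False counts
    unanalyzed = {p["SPDXID"] for p in doc.get("packages", [])
                  if p.get("filesAnalyzed") is False}
    return [
        (rel["spdxElementId"], rel["relatedSpdxElement"])
        for rel in doc.get("relationships", [])
        if rel.get("relationshipType") == "CONTAINS"
        and rel["spdxElementId"] in unanalyzed
        and rel["relatedSpdxElement"] in file_ids
    ]


# Hypothetical miniature document: an image package containing a layer package,
# plus one made-up file to show what would be flagged
doc = {
    "packages": [
        {"SPDXID": "SPDXRef-golang-1.12-alpine", "filesAnalyzed": False},
        {"SPDXID": "SPDXRef-53745f29fd", "filesAnalyzed": False},
    ],
    "files": [{"SPDXID": "SPDXRef-example-file"}],
    "relationships": [
        {"spdxElementId": "SPDXRef-golang-1.12-alpine", "relationshipType": "CONTAINS",
         "relatedSpdxElement": "SPDXRef-53745f29fd"},   # package -> package: allowed
        {"spdxElementId": "SPDXRef-53745f29fd", "relationshipType": "CONTAINS",
         "relatedSpdxElement": "SPDXRef-example-file"},  # package -> file: flagged
    ],
}
print(files_contained_by_unanalyzed_packages(doc))
```

Under this reading, the image-to-layer relationship from the error message would not be reported, because SPDXRef-53745f29fd is a package.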

Full error output:

$ pyspdxtools -i /home/rose/ternenv/tern/output-golang.json
```
ERROR:root:The document is invalid. The following issues have been found:
package must contain no elements if files_analyzed is False, but found [Relationship(spdx_element_id='SPDXRef-golang-1.12-alpine', relationship_type=, related_spdx_element_id='SPDXRef-53745f29fd', comment=None), Relationship(spdx_element_id='SPDXRef-golang-1.12-alpine', relationship_type=, related_spdx_element_id='SPDXRef-9c60a09a3e', comment=None), Relationship(spdx_element_id='SPDXRef-golang-1.12-alpine', relationship_type=, related_spdx_element_id='SPDXRef-1e34416158', comment=None), Relationship(spdx_element_id='SPDXRef-golang-1.12-alpine', relationship_type=, related_spdx_element_id='SPDXRef-ccad0a45fa', comment=None), Relationship(spdx_element_id='SPDXRef-golang-1.12-alpine', relationship_type=, related_spdx_element_id='SPDXRef-65e02ee814', comment=None)]
package must contain no elements if files_analyzed is False, but found [Relationship(spdx_element_id='SPDXRef-53745f29fd', relationship_type=, related_spdx_element_id='SPDXRef-musl-1.1.24-r0', comment=None), Relationship(spdx_element_id='SPDXRef-53745f29fd', relationship_type=, related_spdx_element_id='SPDXRef-busybox-1.31.1-r9', comment=None), Relationship(spdx_element_id='SPDXRef-53745f29fd', relationship_type=, related_spdx_element_id='SPDXRef-alpine-baselayout-3.2.0-r3', comment=None), Relationship(spdx_element_id='SPDXRef-53745f29fd', relationship_type=, related_spdx_element_id='SPDXRef-alpine-keys-2.1-r2', comment=None), Relationship(spdx_element_id='SPDXRef-53745f29fd', relationship_type=, related_spdx_element_id='SPDXRef-libcrypto1.1-1.1.1d-r3', comment=None), Relationship(spdx_element_id='SPDXRef-53745f29fd', relationship_type=, related_spdx_element_id='SPDXRef-libssl1.1-1.1.1d-r3', comment=None), Relationship(spdx_element_id='SPDXRef-53745f29fd', relationship_type=, related_spdx_element_id='SPDXRef-ca-certificates-cacert-20191127-r0', comment=None), Relationship(spdx_element_id='SPDXRef-53745f29fd', relationship_type=, related_spdx_element_id='SPDXRef-libtls-standalone-2.9.1-r0', comment=None), Relationship(spdx_element_id='SPDXRef-53745f29fd', relationship_type=, related_spdx_element_id='SPDXRef-ssl-client-1.31.1-r9', comment=None), Relationship(spdx_element_id='SPDXRef-53745f29fd', relationship_type=, related_spdx_element_id='SPDXRef-zlib-1.2.11-r3', comment=None), Relationship(spdx_element_id='SPDXRef-53745f29fd', relationship_type=, related_spdx_element_id='SPDXRef-apk-tools-2.10.4-r3', comment=None), Relationship(spdx_element_id='SPDXRef-53745f29fd', relationship_type=, related_spdx_element_id='SPDXRef-scanelf-1.2.4-r0', comment=None), Relationship(spdx_element_id='SPDXRef-53745f29fd', relationship_type=, related_spdx_element_id='SPDXRef-musl-utils-1.1.24-r0', comment=None), Relationship(spdx_element_id='SPDXRef-53745f29fd', relationship_type=, related_spdx_element_id='SPDXRef-libc-utils-0.7.2-r0', comment=None)]
package must contain no elements if files_analyzed is False, but found [Relationship(spdx_element_id='SPDXRef-9c60a09a3e', relationship_type=, related_spdx_element_id='SPDXRef-ca-certificates-20191127-r0', comment=None)]
```

Also, the output I have is only 25K, not 6MB. 6MB sounds like it contains file information? Maybe try deleting your cache and regenerating. I get 23K for the output file when I run with the updated changes as well (no file info).

rnjudge commented 1 year ago

@armintaenzertng I will try to generate a file with golang:1.12-alpine using scancode and see if I can re-create the errors you are seeing.

rnjudge commented 1 year ago

Running with the old changes, my SBOM with scancode metadata is 3.3MB. Running with the new changes, when I generate a scancode SBOM, it is 6.0MB. So it seems like there is extra metadata in there somewhere...

I do see one of the errors you are talking about with the old changes, though, even with the java tools: Analysis exception processing SPDX file: No SPDX element found for SPDX ID SPDXRef-None-None

I'll take a look. I'm assuming it's another issue related to Scancode's recent restructuring.
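A dangling reference such as SPDXRef-None-None can be located before running either validator by comparing the IDs used in relationships against the IDs actually defined in the document. The sketch below assumes the SPDX 2.x JSON field names; the function name and the sample data are hypothetical, and it makes no claim about where in Tern the ID is produced.

```python
def dangling_references(doc):
    """Return relationship endpoints that name no element defined in the document."""
    known = {doc.get("SPDXID", "SPDXRef-DOCUMENT")}
    for key in ("packages", "files", "snippets"):
        known.update(element["SPDXID"] for element in doc.get(key, []))

    missing = set()
    for rel in doc.get("relationships", []):
        for field in ("spdxElementId", "relatedSpdxElement"):
            ref = rel[field]
            # NONE / NOASSERTION are permitted values; "DocumentRef-x:SPDXRef-y"
            # points into another document and cannot be resolved locally.
            if ref in ("NONE", "NOASSERTION") or ":" in ref:
                continue
            if ref not in known:
                missing.add(ref)
    return sorted(missing)


# Hypothetical miniature document
doc = {
    "SPDXID": "SPDXRef-DOCUMENT",
    "packages": [{"SPDXID": "SPDXRef-53745f29fd"}],
    "relationships": [
        {"spdxElementId": "SPDXRef-DOCUMENT", "relationshipType": "DESCRIBES",
         "relatedSpdxElement": "SPDXRef-53745f29fd"},
        {"spdxElementId": "SPDXRef-53745f29fd", "relationshipType": "CONTAINS",
         "relatedSpdxElement": "SPDXRef-None-None"},
    ],
}
print(dangling_references(doc))
```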

armintaenzertng commented 1 year ago

> It is true that a package may contain no files if files_analyzed is false but it may still contain other packages. This error is the majority of what I'm seeing.

Yes, I noticed this bug, too. It is fixed in the current 0.8.0rc3 release (the spdx-tools PR already includes that fixed release; please update your local code to get the change).

armintaenzertng commented 1 year ago

> Running with the old changes, my SBOM with scancode metadata is 3.3MB. Running with the new changes, when I generate a scancode SBOM, it is 6.0MB. So it seems like there is extra metadata in there somewhere...

This is due to the hasFiles field being deprecated, see here. All SPDXIDs from the hasFiles property are now represented as relationships, which have more lines than just the SPDXID.
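To illustrate why this roughly doubles the size, here is a sketch that converts a package's hasFiles list into the equivalent CONTAINS relationships and compares the serialized lengths. The field names follow the SPDX 2.x JSON format; the function name and the file IDs are made up for the example, and this is not the conversion code from the PR.

```python
import json


def has_files_to_relationships(package):
    """Express each SPDXID listed under hasFiles as a CONTAINS relationship."""
    return [
        {
            "spdxElementId": package["SPDXID"],
            "relationshipType": "CONTAINS",
            "relatedSpdxElement": file_id,
        }
        for file_id in package.get("hasFiles", [])
    ]


# Hypothetical package with two files
package = {
    "SPDXID": "SPDXRef-53745f29fd",
    "hasFiles": ["SPDXRef-file-a", "SPDXRef-file-b"],
}
relationships = has_files_to_relationships(package)

# Each file costs one line in hasFiles but a whole object as a relationship
old_size = len(json.dumps(package["hasFiles"], indent=2))
new_size = len(json.dumps(relationships, indent=2))
print(f"hasFiles: {old_size} characters, relationships: {new_size} characters")
```

Every file ID is repeated alongside the package ID and the relationship type, so the growth scales with the number of files in the image.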

armintaenzertng commented 1 year ago

@rnjudge: It turns out the java-tools pick up on the invalidities mentioned above, but only after the Analysis exception processing SPDX file: No SPDX element found for SPDX ID SPDXRef-None-None is fixed.