tern-tools / tern

Tern is a software composition analysis tool and Python library that generates a Software Bill of Materials for container images and Dockerfiles. The SBOM that Tern generates will give you a layer-by-layer view of what's inside your container in a variety of formats including human-readable, JSON, HTML, SPDX and more.
BSD 2-Clause "Simplified" License

Add SPDX generation using spdx-tools #1233

Open armintaenzertng opened 1 year ago

armintaenzertng commented 1 year ago

This is set up to produce the same SPDX output as the current SPDX generation module while using the spdx-tools library. The goal is to replace the current module with this new one, which will make it easy to support more SPDX formats and to migrate to SPDX 3. I tried to stay close to the structure of the original implementation.

I tested this using the following commands:

tern report -i golang:1.12-alpine -f spdxjson -o spdx_test.json
tern report -i golang:1.12-alpine -f spdxjson_new -o new_spdx_test.json

I compared the resulting JSON files using jd, treating arrays as unordered sets. The only differences were in timestamps, UUIDs, and two points in how the JSON output is generated:
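For readers without jd, a rough Python stand-in for that comparison is sketched below. It is an assumption-laden sketch, not what was actually run: which keys count as volatile (here the "created" timestamp and the UUID-bearing "documentNamespace") is a guess, and arrays are compared as multisets, so duplicate elements still count.

```python
import json
from typing import Any

# Assumption: keys whose values legitimately change between runs
# (generation timestamp and the UUID-based document namespace).
VOLATILE_KEYS = {"created", "documentNamespace"}


def normalize(value: Any) -> Any:
    """Return a canonical form in which list order no longer matters."""
    if isinstance(value, dict):
        return {k: normalize(v) for k, v in value.items() if k not in VOLATILE_KEYS}
    if isinstance(value, list):
        # Sort normalized items by their canonical JSON text so that two
        # lists holding the same elements in any order compare equal.
        items = [normalize(v) for v in value]
        return sorted(items, key=lambda v: json.dumps(v, sort_keys=True))
    return value


def same_ignoring_order(a: Any, b: Any) -> bool:
    """True if two parsed JSON documents differ only in order or volatile keys."""
    return normalize(a) == normalize(b)
```

In practice both files would be loaded with json.load and passed to same_ignoring_order; anything it still reports as different is a real content difference.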

armintaenzertng commented 1 year ago

I added support for YAML, XML and RDF-XML formats.

rnjudge commented 1 year ago

@armintaenzertng Thank you very much for all your work on this! How does one denote which version of SPDX documents they want using this PR? I am assuming this PR, by default, generates SPDX 2.3 documents. However, we can't drop support for SPDX 2.2 since we have users who want it because it is the ISO standard version. There needs to be a way to denote SPDX version to generate on the command line before we can merge this.

armintaenzertng commented 1 year ago

This PR currently replicates the existing behavior; that is, SPDX 2.2 is hardcoded into the output. I will gladly add support for multiple SPDX versions, but we should clarify how this would fit into the current workflow.

armintaenzertng commented 1 year ago

After some further consideration and having a deeper look at the code, point 2 from above might be the better alternative after all. I'll try to implement that.

armintaenzertng commented 1 year ago

I added the versioning I described above. @rnjudge, please have a look if that is OK with you. :)

rnjudge commented 1 year ago

@armintaenzertng do you want to schedule a zoom call about this? I would like to avoid mass code duplication as that was the whole point of using the SPDX tools library.

armintaenzertng commented 1 year ago

Yes, certainly! :) Do you have my email address?

armintaenzertng commented 1 year ago

I added a CLI version parameter for the output format. Formats that don't support it (i.e., everything except SPDX) will raise an error if it is set.
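A minimal argparse sketch of that rule follows. Only the -sv flag name comes from this thread; the -f/--report-format spelling, the dest name spdx_version, and the set of SPDX format names are assumptions for illustration.

```python
import argparse

# Hypothetical set of SPDX output formats; the real names used for the
# YAML, XML and RDF-XML variants are not shown in this thread.
SPDX_FORMATS = {"spdxjson", "spdxtagvalue"}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="tern report")
    parser.add_argument("-f", "--report-format", default=None)
    # '-sv' is the flag named in the thread; dest name is an assumption.
    parser.add_argument("-sv", dest="spdx_version", default=None)
    return parser


def parse_report_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Reject a version for formats that cannot honour it.
    # parser.error() prints a message and exits with status 2.
    if args.spdx_version is not None and args.report_format not in SPDX_FORMATS:
        parser.error(f"-sv is only supported for SPDX formats, not {args.report_format!r}")
    return args
```

Checking after parsing, rather than with a custom argparse type, keeps the rule in one place because it depends on two arguments at once.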

armintaenzertng commented 1 year ago

I added SPDX-2.3 functionality.

In particular, this means that if the SPDX version is 2.3, we set the primary_package_purpose of the container package to CONTAINER and omit concluded license, declared license and copyright text in SpdxPackages if possible.
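The version-dependent handling can be illustrated with plain dictionaries keyed by SPDX JSON field names. This is a hypothetical sketch, not the PR's code (which builds spdx-tools model objects); it assumes "if possible" means "when no concrete value is known", with SPDX 2.2 falling back to NOASSERTION because those three package fields are mandatory in 2.2 and optional in 2.3.

```python
NOASSERTION = "NOASSERTION"


def container_package_fields(spdx_version, license_concluded=None,
                             license_declared=None, copyright_text=None):
    """Build the version-dependent part of the container package.

    Sketch only: plain dicts stand in for spdx-tools model objects.
    """
    values = {
        "licenseConcluded": license_concluded,
        "licenseDeclared": license_declared,
        "copyrightText": copyright_text,
    }
    fields = {}
    if spdx_version == "SPDX-2.3":
        fields["primaryPackagePurpose"] = "CONTAINER"
        # Optional in 2.3: keep only the values we actually know.
        fields.update({k: v for k, v in values.items() if v is not None})
    else:
        # Mandatory in 2.2: unknown values become NOASSERTION.
        fields.update({k: NOASSERTION if v is None else v for k, v in values.items()})
    return fields
```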

armintaenzertng commented 1 year ago

@rnjudge: tern report -i photon:3.0 -f spdxjson -sv 2.3 -o output.json followed by pyspdxtools -i output.json yields no errors or invalidations, so I'll mark this "Ready for review" now. There are still some open issues regarding SPDX documents with file information, which I collected in #1240. As these already existed in the old SPDX implementation, I'd propose fixing them in a separate PR after this one to keep things clearer (this one is already quite large).

vargenau commented 1 year ago

This would fix https://github.com/tern-tools/tern/issues/1211

vargenau commented 1 year ago

@rnjudge @armintaenzertng For specifying the SPDX version of the output, I would propose the following:

tern report -f spdxtagvalue@2.3 -i golang:1.12-alpine -o alpine.spdx

and

tern report -f spdxjson@2.3 -i golang:1.12-alpine -o alpine.spdx.json

The advantages would be that we have one fewer argument and that it is similar to the Syft syntax: https://github.com/anchore/syft/

spdx-tag-value: A tag-value formatted report conforming to the SPDX 2.3 specification
spdx-tag-value@2.2: A tag-value formatted report conforming to the SPDX 2.2 specification
spdx-json: A JSON report conforming to the SPDX 2.3 JSON Schema
spdx-json@2.2: A JSON report conforming to the SPDX 2.2 JSON Schema

That is what I had prototyped in https://github.com/tern-tools/tern/pull/1228

What do you think?
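Parsing the proposed format@version syntax needs little more than a split on "@". The helper below is a hypothetical sketch, not code from #1228; it returns None when no version is given so the caller can choose the default, since the thread has not settled whether that should be 2.2 or 2.3.

```python
def split_format(spec: str):
    """Split 'spdxjson@2.3' into ('spdxjson', '2.3').

    Without '@' the version is None, leaving the default to the caller.
    """
    name, sep, version = spec.partition("@")
    if sep and not version:
        raise ValueError(f"missing version after '@' in {spec!r}")
    return name, (version if sep else None)
```

For example, split_format("spdxtagvalue@2.3") gives ("spdxtagvalue", "2.3"), while split_format("spdxjson") gives ("spdxjson", None).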

armintaenzertng commented 1 year ago

@vargenau: I believe the current implementation is a little more flexible in supporting more versions and formats in the future. Your proposed solution would result in five additional entrypoints (one per SPDX format) per version.

Still, if necessary, the spdxjson@2.2 use case can easily be implemented by adding a new entrypoint whose generator just calls get_spdx_from_image_list with the right parameters.
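The entrypoint approach can be pictured as a thin generator bound to a fixed format and version. The names here are hypothetical: get_spdx_from_image_list is stubbed because its real signature is not shown in this thread, and the generate(image_obj_list, print_inclusive=False) method shape is an assumption about tern's generator interface.

```python
def get_spdx_from_image_list(image_obj_list, spdx_format, spdx_version):
    """Hypothetical stub; the real function and its parameters may differ."""
    return f"{spdx_format} document for {len(image_obj_list)} image(s), {spdx_version}"


class SpdxGenerator:
    """Generator bound to one SPDX serialization format and one version."""

    def __init__(self, spdx_format, spdx_version):
        self.spdx_format = spdx_format
        self.spdx_version = spdx_version

    def generate(self, image_obj_list, print_inclusive=False):
        # print_inclusive is accepted only to match the assumed interface.
        return get_spdx_from_image_list(
            image_obj_list, self.spdx_format, self.spdx_version)


class SpdxJson22(SpdxGenerator):
    """What a dedicated 'spdxjson@2.2' entrypoint could point at."""

    def __init__(self):
        super().__init__("json", "SPDX-2.2")
```

Each extra version then costs one such class per SPDX format, which is the multiplication of entrypoints described above.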

vargenau commented 1 year ago

Hi @armintaenzertng, the aim of my proposal was:

vargenau commented 1 year ago

Quoting @armintaenzertng: "tern report -i photon:3.0 -f spdxjson -sv 2.3 -o output.json followed by pyspdxtools -i output.json yields no errors or invalidations"

Is it really -sv 2.3? In Rose's message above, it was parser_report.add_argument('-fv', '--format-version',

armintaenzertng commented 1 year ago

I meant -sv; this was changed according to this comment.

vargenau commented 1 year ago


Thank you, I had missed that comment.

armintaenzertng commented 1 year ago

Small update: I changed the spdx-tools dependency to the new 0.8.1 version.

armintaenzertng commented 1 year ago

Small fix: I changed the SPDX version check to consistently use the official string "SPDX-2.2".
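One way to keep comparisons against the official strings consistent is to normalize user input once, at the CLI boundary. This is a hypothetical helper; the thread does not say which spellings -sv accepts, so accepting both "2.3" and "SPDX-2.3" (case-insensitively) is an assumption.

```python
import re

SUPPORTED_SPDX_VERSIONS = ("SPDX-2.2", "SPDX-2.3")


def normalize_spdx_version(raw: str) -> str:
    """Map input such as '2.3' or 'spdx-2.3' to the official 'SPDX-2.3'."""
    match = re.fullmatch(r"(?:spdx-)?(\d+\.\d+)", raw.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised SPDX version: {raw!r}")
    version = f"SPDX-{match.group(1)}"
    if version not in SUPPORTED_SPDX_VERSIONS:
        raise ValueError(f"unsupported SPDX version: {version}")
    return version
```

Downstream code can then compare against "SPDX-2.2" or "SPDX-2.3" directly, without re-parsing user input.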