tern-tools / tern

Tern is a software composition analysis tool and Python library that generates a Software Bill of Materials for container images and Dockerfiles. The SBOM that Tern generates will give you a layer-by-layer view of what's inside your container in a variety of formats including human-readable, JSON, HTML, SPDX and more.
BSD 2-Clause "Simplified" License

Update license data to show actual license type instead of license ref #1159

Closed by ivanayov 2 years ago

ivanayov commented 2 years ago

Previously, the PackageLicenseDeclared (spdxtagvalue) and licenseDeclared (spdxjson) fields were set to a license reference such as LicenseRef-df8cb33, which is not informative. This change sets them to the actual license, e.g. MIT, when a license is declared, and keeps the LicenseRef-df8cb33 value when it is not.

Resolves #1147
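
For illustration, here is a minimal sketch of the selection described above. It is not Tern's code: `KNOWN_SPDX_IDS`, `license_ref`, and `declared_license` are hypothetical names, the tiny identifier set stands in for a real SPDX license check, and the hashing scheme behind the `LicenseRef-` suffix is an assumption. It also reads "a license is declared" as "the value is a recognized SPDX identifier".

```python
import hashlib

# Stand-in for a real SPDX license list (assumption, for illustration only).
KNOWN_SPDX_IDS = {"MIT", "Apache-2.0", "BSD-2-Clause", "GPL-2.0-only"}


def license_ref(text):
    """Build a LicenseRef-<hash> identifier (hypothetical hashing scheme)."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return "LicenseRef-" + digest[-7:]


def declared_license(pkg_license):
    """Pick the value for the declared-license field of a package."""
    if pkg_license in KNOWN_SPDX_IDS:
        # Recognized identifier: report it directly, e.g. "MIT".
        return pkg_license
    # Anything else falls back to an opaque license reference.
    return license_ref(pkg_license)


print(declared_license("MIT"))
print(declared_license("Custom vendor license"))
```

The first call prints the identifier unchanged; the second prints a `LicenseRef-` value derived from the license text.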

rnjudge commented 2 years ago

Thanks again for the PR, @ivanayov :) One other change I think we will need is in the part of SPDX document creation where the list of license refs gets generated. If we use the SPDX license expression instead of a LicenseRef, we don't need to include the extracted text for that license ref at the end of the document.

This set of license ref/extracted text gets created in get_image_packages_license_block() in tern/formats/spdx/spdxtagvalue/image_helpers.py and get_image_extracted_licenses() in tern/formats/spdx/spdxjson/image_helpers.py. Something like this would do for the change (assuming is_spdx_license_expression gets moved to spdx_common.py per the suggestion above):

--- a/tern/formats/spdx/spdxtagvalue/image_helpers.py
+++ b/tern/formats/spdx/spdxtagvalue/image_helpers.py
@@ -57,9 +57,10 @@ def get_image_packages_license_block(image_obj):
             if package.pkg_license:
                 licenses.add(package.pkg_license)
     for lic in licenses:
-        block += spdx_formats.license_id.format(
-            license_ref=spdx_common.get_license_ref(lic)) + '\n'
-        block += spdx_formats.extracted_text.format(orig_license=lic) + '\n'
+        if not spdx_common.is_spdx_license_expression(lic):
+            block += spdx_formats.license_id.format(
+                license_ref=spdx_common.get_license_ref(lic)) + '\n'
+            block += spdx_formats.extracted_text.format(orig_license=lic) + '\n'
     return block

--- a/tern/formats/spdx/spdxjson/image_helpers.py
+++ b/tern/formats/spdx/spdxjson/image_helpers.py
@@ -31,9 +31,10 @@ def get_image_extracted_licenses(image_obj):
                 unique_licenses.add(package.pkg_license)
     extracted_texts = []
     for lic in list(unique_licenses):
-        extracted_texts.append(json_formats.get_extracted_text_dict(
-            extracted_text=lic, license_ref=spdx_common.get_license_ref(
-                lic)))
+        if not spdx_common.is_spdx_license_expression(lic):
+            extracted_texts.append(json_formats.get_extracted_text_dict(
+                extracted_text=lic, license_ref=spdx_common.get_license_ref(
+                    lic)))
     return extracted_texts
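
To see the effect of the two hunks in isolation, here is a rough, self-contained sketch of the same filtering. The helper bodies are stand-ins and not Tern's implementations: `is_spdx_license_expression` is approximated by membership in a small hypothetical set, `get_license_ref` uses an assumed hashing scheme, and `extracted_licenses` is an illustrative name. The dictionary keys follow the SPDX 2.x JSON field names `licenseId` and `extractedText`.

```python
import hashlib

# Stand-in for a real SPDX expression check (assumption, illustration only).
KNOWN_SPDX_IDS = {"MIT", "Apache-2.0", "BSD-2-Clause"}


def is_spdx_license_expression(lic):
    """Approximate the check by looking up a small fixed set."""
    return lic in KNOWN_SPDX_IDS


def get_license_ref(lic):
    """Build a LicenseRef-<hash> identifier (hypothetical hashing scheme)."""
    return "LicenseRef-" + hashlib.sha256(lic.encode("utf-8")).hexdigest()[-7:]


def extracted_licenses(licenses):
    """Emit extracted-text entries only for non-SPDX license strings."""
    entries = []
    for lic in sorted(set(licenses)):
        if not is_spdx_license_expression(lic):
            entries.append({
                "licenseId": get_license_ref(lic),
                "extractedText": lic,
            })
    return entries


entries = extracted_licenses(["MIT", "Custom vendor license", "MIT"])
print(entries)
```

With this input, only the custom license string produces an entry; `MIT` is skipped because it can be reported directly as the declared license.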