oss-review-toolkit / ort

A suite of tools to automate software compliance checks.
https://oss-review-toolkit.org
Apache License 2.0

Python: Integrate original LICENSE (shipped, for example, by wheel) into CycloneDX #7467

Open andife opened 1 year ago

andife commented 1 year ago

Hi, I want to create a CycloneDX file based on Python packages. The file must contain the originally shipped license texts, e.g. for tqdm the contents of https://raw.githubusercontent.com/tqdm/tqdm/master/LICENCE. Currently, the corresponding field seems to contain only the generic license template. Is that correct?

If this functionality is not available, is it possible to include the information, i.e. the license text, in a curation file such as curations/PyPI/idna.yml, for example if I want to have a manual check in between?

tsteenbe commented 12 months ago

Hi @andife, the short answer: ORT actually has built-in functionality to capture the original license files of each package, but this functionality is not yet used in CycloneDX or SPDX SBOM generation; it is only used by the PlainTextTemplate reporter to generate NOTICE files. We simply haven't had time to implement this, and the SBOM specifications don't really accommodate the very common use case of wanting to include the original license files. Both CycloneDX and SPDX do not support, to my knowledge, the inclusion of a different license text per package for the MIT license (saying this as an SPDX contributor myself). Both specs only support including license texts for custom license IDs (LicenseRefs) that are not on the official SPDX license list, and linking to the original license text via an externalRef, e.g. pointing to https://raw.githubusercontent.com/tqdm/tqdm/master/LICENCE. We welcome contributions in this space, and I'm also happy to answer any SBOM questions you may have.
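For illustration, the externalRef approach could look roughly like the sketch below: a CycloneDX component that names the SPDX license ID and points to the original license file through an external reference of type "license". This is a minimal, hand-built JSON sketch, not how ORT's CycloneDX reporter works; the helper names (jsonString, componentWithLicenseRef) and the example package values are hypothetical.

```kotlin
// Minimal JSON string escaping (quotes, backslashes, control characters).
fun jsonString(value: String): String = buildString {
    append('"')
    for (c in value) {
        when (c) {
            '"' -> append("\\\"")
            '\\' -> append("\\\\")
            '\n' -> append("\\n")
            '\r' -> append("\\r")
            '\t' -> append("\\t")
            else -> if (c < ' ') append("\\u%04x".format(c.code)) else append(c)
        }
    }
    append('"')
}

// Hypothetical helper: a CycloneDX component that references the SPDX license ID
// and links to the original license file via an external reference of type "license".
fun componentWithLicenseRef(name: String, version: String, spdxId: String, licenseUrl: String): String =
    """
    {
      "type": "library",
      "name": ${jsonString(name)},
      "version": ${jsonString(version)},
      "licenses": [ { "license": { "id": ${jsonString(spdxId)} } } ],
      "externalReferences": [ { "type": "license", "url": ${jsonString(licenseUrl)} } ]
    }
    """.trimIndent()

fun main() {
    // Example values only; substitute real package coordinates and the real license URL.
    println(componentWithLicenseRef("example-pkg", "1.0.0", "MIT", "https://example.com/example-pkg/LICENSE"))
}
```

Note that this only links to the text; a consumer of the SBOM still has to fetch the URL to see the original license wording.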

The workaround most ORT users apply for this shortcoming of the SBOM formats is to generate a NOTICE file with the PlainTextTemplate reporter and include this file in the project deliverables (docs or archives).

How the inclusion of original license files in plain text reports works:

  1. Configure ORT to archive the specified files as license files. This is configurable via licenseFilenames in config.yml (for reference, see https://github.com/oss-review-toolkit/ort/blob/main/model/src/main/resources/reference.yml#L35).

By default, the following files are captured (note that the values are case-insensitive):

licenseFilenames = listOf(
    "copying*",
    "copyright",
    "licence*",
    "license*",
    "*.licence",
    "*.license",
    "unlicence",
    "unlicense"
),
patentFilenames = listOf(
    "patents"
),
rootLicenseFilenames = listOf(
    "readme*"
)
  2. Run the ORT scanner with a configured but empty scan cache for the packages whose original license files you want to include.
  3. Run the PlainTextTemplate reporter with NOTICE_BY_PACKAGE.ftl as shown in https://github.com/oss-review-toolkit/ort/blob/main/docs/reporters/plain-text-templates.md; you should then see text files as output containing the original license texts.
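To get a feel for which file names the default licenseFilenames patterns from step 1 capture, here is a simplified sketch of case-insensitive glob matching. It is an assumption-laden approximation, not ORT's actual matcher: it only understands '*' and '?', matches bare file names (no directories), and the helper names globToRegex and isLicenseFile are hypothetical.

```kotlin
// Default licenseFilenames patterns as listed above.
val licenseFilenames = listOf(
    "copying*", "copyright", "licence*", "license*",
    "*.licence", "*.license", "unlicence", "unlicense"
)

// Simplified glob-to-regex conversion: '*' matches any run of characters,
// '?' matches a single character, everything else is matched literally.
fun globToRegex(glob: String): Regex {
    val pattern = buildString {
        for (c in glob) {
            when (c) {
                '*' -> append(".*")
                '?' -> append('.')
                else -> append(Regex.escape(c.toString()))
            }
        }
    }
    // The patterns are case-insensitive, so "LICENCE" matches "licence*".
    return Regex(pattern, RegexOption.IGNORE_CASE)
}

// Hypothetical helper: true if the bare file name matches any of the patterns.
fun isLicenseFile(fileName: String, patterns: List<String> = licenseFilenames): Boolean =
    patterns.any { globToRegex(it).matches(fileName) }

fun main() {
    listOf("LICENCE", "License.txt", "COPYING", "mit.license", "README.md").forEach {
        println("$it -> ${isLicenseFile(it)}")
    }
}
```

With these defaults, README.md is not a match; it is covered separately by rootLicenseFilenames.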
andife commented 12 months ago

Thank you for pointing out a possible path. I'll have to take a closer look at your approach.

In my opinion, there is exactly one field in CycloneDX where the license text could go: https://cyclonedx.org/docs/1.4/json/#components_items_licenses_items_license_text_content

At https://cyclonedx.org/docs/1.4/json/#components_items_licenses_items_license_text, this field is described as: "An optional way to include the textual content of a license."
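As a rough sketch of what filling that field could look like: in CycloneDX 1.4 the license "text" is an attachment object with contentType, an optional encoding (only "base64" is allowed), and content. The hypothetical helper below builds one entry of a component's "licenses" array from an original license text. It assumes the license ID is a plain SPDX identifier that needs no JSON escaping, and it is not ORT's reporter code.

```kotlin
import java.util.Base64

// Hypothetical helper: one entry of a CycloneDX component's "licenses" array whose
// "text" attachment carries the original license file content, base64-encoded.
// Assumes licenseId contains only SPDX identifier characters (letters, digits, '.', '-').
fun licenseEntryWithText(licenseId: String, licenseText: String): String {
    val encoded = Base64.getEncoder().encodeToString(licenseText.toByteArray(Charsets.UTF_8))
    return """
        {
          "license": {
            "id": "$licenseId",
            "text": {
              "contentType": "text/plain",
              "encoding": "base64",
              "content": "$encoded"
            }
          }
        }
    """.trimIndent()
}

fun main() {
    // Example input; in practice this would be the package's archived license file.
    val text = "MIT License\n\nCopyright (c) <year> <copyright holders>\n"
    println(licenseEntryWithText("MIT", text))
}
```

Base64 is a design choice here: it avoids escaping quotes and line breaks of the license text inside JSON. Plain-text content without the encoding field would also be valid, but would then need proper JSON escaping.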

sschuberth commented 12 months ago

> Both CycloneDX and SPDX do not support to my knowledge the inclusion of per package different license text for MIT license

CycloneDX actually does support that, and as @andife correctly assumed, we're already making use of that feature for the generic license texts, as can be seen in our test data, e.g.:

https://github.com/oss-review-toolkit/ort/blob/3fac5823246b45819cd899f8af258938e4413158/plugins/reporters/cyclonedx/src/funTest/assets/cyclonedx-reporter-expected-result.json#L36-L40

https://github.com/oss-review-toolkit/ort/blob/3fac5823246b45819cd899f8af258938e4413158/plugins/reporters/cyclonedx/src/funTest/assets/cyclonedx-reporter-expected-result.json#L96-L100

So the license text for MIT is already added multiple times, once per component / package. It would definitely make sense to use, by default, the actual license text from the component's / package's source code if it's available in ORT's archive.
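The proposed default could be sketched as a simple fallback: prefer the text from the package's archived license file and fall back to the generic SPDX license text otherwise. Everything here is hypothetical: it assumes the archived texts have already been mapped to license IDs (which in practice requires scanner findings) and that some lookup for generic texts exists.

```kotlin
// Hypothetical selection logic: archived (original) text first, generic text as fallback.
// archivedTexts: license texts from the package's archived license files, keyed by license ID.
// genericText: lookup of the generic license text for a license ID, or null if unknown.
fun selectLicenseText(
    licenseId: String,
    archivedTexts: Map<String, String>,
    genericText: (String) -> String?
): String? =
    archivedTexts[licenseId]?.takeIf { it.isNotBlank() } ?: genericText(licenseId)

fun main() {
    val generic = { id: String -> if (id == "MIT") "generic MIT text" else null }
    // Original text available in the archive: it wins over the generic one.
    println(selectLicenseText("MIT", mapOf("MIT" to "original MIT text"), generic))
    // Nothing archived: fall back to the generic text.
    println(selectLicenseText("MIT", emptyMap(), generic))
}
```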