whitesource-ps / ws-sbom-generator

WS SBOM Report Generator in SPDX or CycloneDX format
Apache License 2.0

[BUG] [ws-sbom-generator] "copyrightText" field not following SPDX 2.2 spec #154

Open olhado opened 1 year ago

olhado commented 1 year ago

Bug Description
The generator writes copyrightText as a stringified array of entries, whereas the spec defines it as a single free-form text field:

(See: https://spdx.github.io/spdx-spec/v2-draft/package-information/#717-copyright-text-field) Identify the copyright holders of the package, as well as any dates present. This will be a free form text field extracted from package information files.

This also causes problems with the supplier/originator fields: to populate them, the generator picks one of the array items and strips the word Copyright and any year digits from it. One example I have is Copyright and license, which results in the originator/supplier fields being set to and license.
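
The following Python sketch illustrates how this kind of output can arise. It is a hypothetical reconstruction based only on the behavior described above, not the generator's actual code; the function name naive_holder and the regular expressions are assumptions.

```python
import re


def naive_holder(copyright_line: str) -> str:
    """Hypothetical: strip the word 'Copyright' and four-digit years, keep the rest."""
    stripped = re.sub(r"(?i)\bcopyright\b", "", copyright_line)
    stripped = re.sub(r"\b\d{4}\b", "", stripped)
    # Collapse the whitespace left behind by the removals.
    return " ".join(stripped.split())


entries = [
    "Copyright 2016 Docker, Inc",
    "Copyright and license",
]

# Stringifying a Python list gives a bracketed, quoted value like the one
# in the report, instead of free-form text.
copyright_text = str(entries)

# One candidate per entry; the second one is not a copyright holder at all.
holders = [naive_holder(entry) for entry in entries]
```

Under these assumptions, the second entry yields "and license", which is the value that ends up in the supplier/originator fields.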

Steps to Reproduce
Steps to reproduce the behavior:

  1. Run the generator against a project with the package in the example below
  2. See the behavior described above

Expected Behavior
Generator should follow the spec.

Additional Context
Example Package definition showing the behaviors described above:

{
    "SPDXID": "SPDXRef-PACKAGE-github.com-docker-go-events-v0.0.0-20190806004212-e31b211e4f1c",
    "name": "github.com/docker/go-events",
    "downloadLocation": "https://proxy.golang.org/github.com/docker/go-events/@v/v0.0.0-20190806004212-e31b211e4f1c.zip",
    "licenseConcluded": "Apache-2.0",
    "licenseInfoFromFiles": [
        "Apache-2.0"
    ],
    "licenseDeclared": "Apache-2.0",
    "copyrightText": "['Copyright 2016 Docker, Inc', 'Copyright 2016 Docker, Inc. go-events is licensed under the Apache License,', 'Copyright and license']",
    "versionInfo": "v0.0.0-20190806004212-e31b211e4f1c",
    "packageFileName": "github.com/docker/go-events-v0.0.0-20190806004212-e31b211e4f1c",
    "supplier": "Organization: and license (NOASSERTION)",
    "originator": "Organization: and license (NOASSERTION)",
    "checksums": [
        {
            "algorithm": "SHA1",
            "checksumValue": "469693c699269b710588646b59a0fd3d2c66e881"
        }
    ],
    "homepage": "https://proxy.golang.org/github.com/docker/go-events/@v/v0.0.0-20190806004212-e31b211e4f1c.zip",
    "filesAnalyzed": false
}

DimarrWS commented 1 year ago

Hi @olhado! If I understand you correctly, we need to change the originator/supplier fields. I suppose the value should be "Organization: Docker, Inc. (NOASSERTION)". Please correct me if I'm wrong.

olhado commented 1 year ago

You can't add NOASSERTION if you are also defining an originator/supplier (see https://spdx.github.io/spdx-spec/v2-draft/package-information/#75-package-supplier-field and https://spdx.github.io/spdx-spec/v2-draft/package-information/#76-package-originator-field):

Single line of text with one of the following:
NOASSERTION
Person: person name and optional (<email>)
Organization: organization name and optional (<email>)
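
A minimal Python sketch of a check for the three forms quoted above. The regular expression and the function name is_valid_agent are assumptions made for illustration; the rule that rejects NOASSERTION anywhere inside a Person/Organization value is a simplification, not wording taken from the spec.

```python
import re

# Assumed pattern: "Person: name" or "Organization: name", optionally
# followed by " (email)".
_AGENT = re.compile(
    r"^(?:Person|Organization): (?P<name>.+?)(?: \((?P<email>[^()]*)\))?$"
)


def is_valid_agent(value: str) -> bool:
    """Return True if value looks like a valid supplier/originator entry."""
    if value == "NOASSERTION":
        return True
    match = _AGENT.match(value)
    if match is None or not match.group("name").strip():
        return False
    # NOASSERTION is a standalone value; it cannot be combined with a name.
    return "NOASSERTION" not in value
```

With this check, "Organization: and license (NOASSERTION)" from the example package is rejected, while "NOASSERTION" alone or "Organization: Docker, Inc." is accepted.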

DimarrWS commented 1 year ago

Ok, thank you, @olhado. So in this example you should see "Organization: Docker, Inc.". Is that correct?

olhado commented 1 year ago

That would be valid, but that means the tool is asserting the originator/supplier. You have to be confident in the algorithm you are using to determine the correct values.

Ideally, I assume the SPDX spec is expecting any generator to scan the dependencies to find SPDX tags. Not sure what SPDX allows tools to do to infer originator/supplier, or even license ids/copyright text.

DimarrWS commented 1 year ago

Hi @olhado! Please take the new version of the tool. You are right: it is impossible to determine the originator/supplier precisely when a package has several copyright entries.
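
For illustration, a conservative approach along these lines could look like the Python sketch below. It is not the code of the new version; the function names, the "Copyright <year> <name>" heuristic, and the choice of Organization over Person are all assumptions.

```python
import re


def copyright_text(entries):
    """Join the extracted notices into one free-form string."""
    return "\n".join(entries) if entries else "NOASSERTION"


def holder_candidate(line):
    """Heuristic (assumption): only 'Copyright <year> <name>' lines name a holder."""
    match = re.match(
        r"(?i)^copyright\s+(?:\(c\)\s*)?\d{4}(?:-\d{4})?,?\s+(?P<name>.+)$", line
    )
    return match.group("name").strip() if match else None


def supplier(entries):
    """Assert a supplier only when exactly one distinct candidate is found."""
    names = {name for name in map(holder_candidate, entries) if name}
    if len(names) == 1:
        return "Organization: " + names.pop()
    return "NOASSERTION"


example = [
    "Copyright 2016 Docker, Inc",
    "Copyright 2016 Docker, Inc. go-events is licensed under the Apache License,",
    "Copyright and license",
]
```

For the example package, the first two entries produce two different candidates and the third produces none, so supplier(example) falls back to NOASSERTION rather than guessing.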