sw360 / capycli

CaPyCLI - Python scripts for software license compliance automation with SW360

CycloneDX sbom for Rust dependencies can't be properly parsed #81

Closed bjoernbusch closed 1 month ago

bjoernbusch commented 1 month ago

I have a Rust project for which I produce an SBOM file with CycloneDX. The file conforms to version 1.3 of the spec. The capycli SBOM parser fails on the licenses section when an entry contains license instead of expression, which is valid according to the standard (https://github.com/CycloneDX/specification/blob/master/schema/bom-1.3.schema.json#L578). Here is an example of a dependency:

{
      "type": "library",
      "bom-ref": "registry+https://github.com/rust-lang/crates.io-index#ring@0.17.8",
      "name": "ring",
      "version": "0.17.8",
      "description": "Safe, fast, small crypto using Rust.",
      "scope": "required",
      "hashes": [
        {
          "alg": "SHA-256",
          "content": "c17fa4cb658e3583423e915b9f3acc01cceaee1860e33d59ebae66adc3a2dc0d"
        }
      ],
      "licenses": [
        {
          "license": {
            "name": "Unknown",
            "text": {
              "encoding": "base64",
              "content": "truncatedbase64string"
            }
          }
        }
      ],
      "purl": "pkg:cargo/ring@0.17.8",
      "externalReferences": [
        {
          "type": "other",
          "url": "ring_core_0_17_8"
        },
        {
          "type": "vcs",
          "url": "https://github.com/briansmith/ring"
        }
      ]
    }

Have you encountered this before? Is this not supported yet?

When I remove the licenses block, everything else can be parsed fine.

gernot-h commented 1 month ago

@bjoernbusch, can you please give more details on what you are trying to do with this SBOM in CaPyCLI? I guess "capycli bom map"? Please also provide the error message or crash you encounter.

Having a complete example SBOM might also help for quicker analysis.

bjoernbusch commented 1 month ago

I ran capycli bom show -i bom.cdx.json -X and got:

2024-10-15 12:13:21,505:DEBUG:capycli: CycloneDX: reading component ring, 0.17.8
2024-10-15 12:13:21,505:TEXT:CaPyCLI: Error reading SBOM: CaPyCliException("Invalid CaPyCLI file: unhashable type: 'dict'")

The full file is attached: bom.cdx.json

gernot-h commented 1 month ago

The crash seems to happen in the SbomJsonParser when it tries to parse and store the encoded license text. I'm not yet sure what exactly goes wrong.

  File "/home/gernot/checkout/capycli/capycli/main/cli.py", line 28, in main
    app.run(argv)
  File "/home/gernot/checkout/capycli/capycli/main/application.py", line 159, in run
    self._run(argv)
  File "/home/gernot/checkout/capycli/capycli/main/application.py", line 140, in _run
    handle_bom.run_bom_command(self.options)
  File "/home/gernot/checkout/capycli/capycli/bom/handle_bom.py", line 58, in run_bom_command
    app1.run(args)
  File "/home/gernot/checkout/capycli/capycli/bom/show_bom.py", line 79, in run
    bom = CaPyCliBom.read_sbom(args.inputfile)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gernot/checkout/capycli/capycli/common/capycli_bom_support.py", line 664, in read_sbom
    parser = SbomJsonParser(content)
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gernot/checkout/capycli/capycli/common/capycli_bom_support.py", line 69, in __init__
    component = self.read_component(component_entry)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gernot/checkout/capycli/capycli/common/capycli_bom_support.py", line 269, in read_component
    return Component(
           ^^^^^^^^^^
  File "/home/gernot/.cache/pypoetry/virtualenvs/capycli-YDJxuEq2-py3.11/lib/python3.11/site-packages/cyclonedx/model/component.py", line 736, in __init__
    self.licenses = licenses or []  # type: ignore
    ^^^^^^^^^^^^^
  File "/home/gernot/.cache/pypoetry/virtualenvs/capycli-YDJxuEq2-py3.11/lib/python3.11/site-packages/cyclonedx/model/component.py", line 965, in licenses
    self._licenses = SortedSet(licenses)
                     ^^^^^^^^^^^^^^^^^^^
  File "/home/gernot/.cache/pypoetry/virtualenvs/capycli-YDJxuEq2-py3.11/lib/python3.11/site-packages/sortedcontainers/sortedset.py", line 168, in __init__
    self._update(iterable)
  File "/home/gernot/.cache/pypoetry/virtualenvs/capycli-YDJxuEq2-py3.11/lib/python3.11/site-packages/sortedcontainers/sortedset.py", line 682, in update
    values = set(chain(*iterables))
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gernot/.cache/pypoetry/virtualenvs/capycli-YDJxuEq2-py3.11/lib/python3.11/site-packages/cyclonedx/model/__init__.py", line 711, in __hash__
    return hash((self.license, self.expression))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gernot/.cache/pypoetry/virtualenvs/capycli-YDJxuEq2-py3.11/lib/python3.11/site-packages/cyclonedx/model/__init__.py", line 639, in __hash__
    return hash((self.id, self.name, self.text, self.url))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gernot/.cache/pypoetry/virtualenvs/capycli-YDJxuEq2-py3.11/lib/python3.11/site-packages/cyclonedx/model/__init__.py", line 259, in __hash__
    return hash((self.content, self.content_type, self.encoding))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: unhashable type: 'dict'

It tries to hash self.content, which is a dictionary containing encoding and content. Either CaPyCLI somehow passes wrong attribute values, or the hash(...) call is wrong here.
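
The failure can be reproduced without CaPyCLI or cyclonedx-python-lib. The following is a minimal sketch: TextLike is a hypothetical stand-in that only mimics the __hash__ shown in the last traceback frame, it is not the library's actual class. try_hash is likewise a helper made up for this illustration.

```python
class TextLike:
    """Hypothetical stand-in; mimics only the __hash__ from the traceback."""

    def __init__(self, content, content_type="text/plain", encoding=None):
        self.content = content
        self.content_type = content_type
        self.encoding = encoding

    def __hash__(self):
        # Same shape as the hash call in the last traceback frame.
        return hash((self.content, self.content_type, self.encoding))


def try_hash(obj):
    """Return None if obj is hashable, otherwise the TypeError message."""
    try:
        hash(obj)
        return None
    except TypeError as exc:
        return str(exc)


# The whole "text" object from the SBOM is stored as content: hashing fails.
as_dict = TextLike(content={"encoding": "base64", "content": "truncatedbase64string"})
# Only the inner string is stored as content: hashing works.
as_str = TextLike(content="truncatedbase64string", encoding="base64")

print(try_hash(as_dict))
print(try_hash(as_str))
```

A tuple is only hashable if all of its elements are, so a dict anywhere in the hashed tuple is enough to trigger the error once the object is placed into a set.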

gernot-h commented 1 month ago

I'm not sure yet whether this is a bug in CaPyCli's SbomJsonParser or in the old cyclonedx-python-lib 3.1.5 we still use.

Given that there were major improvements regarding the license handling in cyclonedx-python-lib 5.0.0 and that newer versions implement a JSON parser, I wonder whether we still want to fix bugs in the existing code.

@t-graf, what's your opinion on this, any plans to update to a recent cyclonedx-python-lib and use its JSON parser instead of your current implementation?

gernot-h commented 1 month ago

Ok, I was still curious and the following change fixes the crash:

--- a/capycli/common/capycli_bom_support.py
+++ b/capycli/common/capycli_bom_support.py
@@ -16,6 +16,7 @@ from typing import Any, Dict, Iterable, List, Optional, Union

 from cyclonedx.model import (
     AttachedText,
+    Encoding,
     ExternalReference,
     ExternalReferenceType,
     HashAlgorithm,
@@ -160,7 +161,8 @@ class SbomJsonParser(BaseParser):
             return None

         text = param.get("text", None)
-        license_text = AttachedText(content=text) if text else None
+        encoding = Encoding(text['encoding']) if text else None
+        license_text = AttachedText(content=text['content'], encoding=encoding) if text else None
         return License(
             spdx_license_id=param.get("id", None),
             license_name=param.get("name", None),

However, I haven't taken the time yet to fully understand the (old) cyclonedx-python-lib API, so I'm completely unsure whether this is how the Encoding handling is supposed to work.

As said, I'm still unsure whether it's worth investing more time here, or whether that time is better spent on replacing our parser with the one from a newer cyclonedx-python-lib version. Assigning to @t-graf to decide how to proceed here.
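
The idea behind the patch above, splitting the "text" object into its content and encoding instead of storing the whole dict, can be sketched independently of the library. LicenseText and read_license_text are hypothetical names used only for illustration; they are not part of CaPyCLI or cyclonedx-python-lib. The sketch also assumes that "encoding" may be absent from the "text" object, so it uses .get() rather than indexing.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class LicenseText:
    """Hypothetical stand-in for an attached-text model (not the library class)."""
    content: str
    encoding: Optional[str] = None
    content_type: str = "text/plain"


def read_license_text(param: Dict[str, Any]) -> Optional[LicenseText]:
    """Build a hashable text object from a CycloneDX "license" dict."""
    text = param.get("text")
    if not text:
        return None
    return LicenseText(
        content=text.get("content", ""),
        # Assumption: "encoding" is optional, so avoid text["encoding"].
        encoding=text.get("encoding"),
        content_type=text.get("contentType", "text/plain"),
    )


# The "license" object from the ring component in the report.
entry = {
    "name": "Unknown",
    "text": {"encoding": "base64", "content": "truncatedbase64string"},
}

parsed = read_license_text(entry)
print(parsed)
print(len({parsed}))  # the object can be placed into a set
```

Because the frozen dataclass holds only strings, it is hashable, which is what the sorted set of licenses in the traceback requires.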

tngraf commented 1 month ago

It is a bug in CaPyCLI: there are multiple ways to express a license in CycloneDX (1.3, 1.4, ...).
We just never came across this form:

"licenses": [
        {
          "license": {
            "name": "Unknown",
            "text": {
              "encoding": "base64",
              "content": "Tm90ZSB0aGF0IGl0IGl....."
            }
          }
        }
      ]

In theory you could add a commercial license as a base64-encoded PDF file...
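
As a rough illustration of the different forms a "licenses" item can take, and of why the text payload is better treated as bytes, here is a minimal sketch. The helper names describe_license_entry and decode_license_text are hypothetical, and the sample entries (including the encoded text) are made up for this example; they are not taken from the attached SBOM.

```python
import base64
from typing import Any, Dict, Optional


def describe_license_entry(entry: Dict[str, Any]) -> str:
    """Name the form used by one item of a CycloneDX "licenses" array."""
    if "expression" in entry:
        return "expression"
    lic = entry.get("license", {})
    if "id" in lic:
        return "license with SPDX id"
    if "name" in lic:
        return "license with name"
    return "unknown"


def decode_license_text(entry: Dict[str, Any]) -> Optional[bytes]:
    """Return the attached license text as bytes, or None if there is none."""
    text = entry.get("license", {}).get("text")
    if not text:
        return None
    content = text.get("content", "")
    if text.get("encoding") == "base64":
        # The payload may be binary (for example a PDF), so keep it as bytes.
        return base64.b64decode(content)
    return content.encode("utf-8")


# Made-up sample entries covering the three forms.
sample_text = base64.b64encode(b"Example license text").decode("ascii")
entries = [
    {"expression": "MIT OR Apache-2.0"},
    {"license": {"id": "MIT"}},
    {"license": {"name": "Unknown",
                 "text": {"encoding": "base64", "content": sample_text}}},
]

for item in entries:
    print(describe_license_entry(item), decode_license_text(item))
```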

@gernot-h, updating to the latest version of cyclonedx-python-lib might not be trivial. They wrote
in the release notes that many interfaces have changed, and Hakan acknowledged this.

bjoernbusch commented 1 month ago

Updated to 2.5.1; it works like a charm. Thanks for the quick reaction.