psf / sboms-for-python-packages

Software Bill-of-Materials documents for Python packages

Existing libraries for Python SBOM generation? #7

Open ncoghlan opened 2 weeks ago

ncoghlan commented 2 weeks ago

Would it make sense to survey and recommend libraries for generating SBOM metadata for Python packages as part of this project?

Full disclosure: I'll actually need to add SBOM support to my current work project at some point (see https://github.com/lmstudio-ai/venvstacks/issues/67), so I have a concrete interest in knowing which libraries actually do a decent job of taking a set of Python dependency declarations (and/or installed environments) and turning them into the corresponding SBOM.
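As a rough illustration of the "installed environments" half of that question, the sketch below lists the distributions visible to the running interpreter using only the standard library's `importlib.metadata` and derives a package URL (purl) for each one. The helper names `make_purl` and `installed_components` are hypothetical, and the code assumes every distribution came from PyPI, which real environments do not guarantee.

```python
import re
from importlib.metadata import distributions
from urllib.parse import quote


def make_purl(name: str, version: str) -> str:
    """Build a pkg:pypi purl. Assumption: the distribution came from PyPI."""
    # PyPA-style name normalization: lowercase, and runs of "-", "_", "."
    # collapse to a single "-".
    normalized = re.sub(r"[-_.]+", "-", name).lower()
    # Percent-encode the version so local versions like "1.0+cu118" stay valid.
    return f"pkg:pypi/{normalized}@{quote(version, safe='')}"


def installed_components() -> list[dict[str, str]]:
    """List name/version/purl records for distributions in this environment."""
    components = []
    for dist in distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if not name or not version:
            # Skip distributions with missing or damaged metadata.
            continue
        components.append(
            {"name": name, "version": version, "purl": make_purl(name, version)}
        )
    return sorted(components, key=lambda c: c["purl"])


if __name__ == "__main__":
    for component in installed_components():
        print(component["purl"])
```

This only sees what the installed metadata declares, so anything bundled inside a wheel without its own metadata would not show up.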

sethmlarson commented 2 weeks ago

Thanks for opening this issue! This is a good idea, I can definitely put together some recommendations.

Secrus commented 2 weeks ago

Requested by @sethmlarson. For SBOMs to be easily adopted into current workflows, the library or libraries required for them should be lightweight and ideally free of dependencies. My recent clash with supply-chain/signing libraries came when I investigated integrating PEP 740 attestation generation into Poetry. Adding just the pypi-attestations library pulls in the following dependency tree:

pypi-attestations 0.0.13 
├── cryptography *
│   └── cffi >=1.12
│       └── pycparser *
├── packaging *
├── pyasn1 >=0.6,<1.0
├── pydantic *
│   ├── annotated-types >=0.6.0
│   ├── email-validator >=2.0.0
│   │   ├── dnspython >=2.0.0
│   │   └── idna >=2.0.0
│   ├── pydantic-core 2.23.4
│   │   └── typing-extensions >=4.6.0,<4.7.0 || >4.7.0
│   └── typing-extensions >=4.6.1
├── sigstore >=3.4,<4.0
│   ├── cryptography >=42
│   │   └── cffi >=1.12
│   │       └── pycparser *
│   ├── id >=1.1.0
│   │   ├── pydantic *
│   │   │   ├── annotated-types >=0.6.0
│   │   │   ├── email-validator >=2.0.0
│   │   │   │   ├── dnspython >=2.0.0
│   │   │   │   └── idna >=2.0.0
│   │   │   ├── pydantic-core 2.23.4
│   │   │   │   └── typing-extensions >=4.6.0,<4.7.0 || >4.7.0
│   │   │   └── typing-extensions >=4.12.2
│   │   └── requests *
│   │       ├── certifi >=2017.4.17
│   │       ├── charset-normalizer >=2,<4
│   │       ├── idna >=2.5,<4 
│   │       └── urllib3 >=1.21.1,<3
│   ├── importlib-resources >=5.7,<6.0
│   ├── platformdirs >=4.2,<5.0
│   ├── pyasn1 >=0.6,<1.0
│   ├── pydantic >=2,<3
│   ├── pyjwt >=2.1
│   ├── pyopenssl >=23.0.0
│   │   └── cryptography >=41.0.5,<44
│   ├── requests * 
│   ├── rfc8785 >=0.1.2,<0.2.0
│   ├── rich >=13.0,<14.0
│   │   ├── markdown-it-py >=2.2.0
│   │   │   └── mdurl >=0.1,<1.0
│   │   ├── pygments >=2.13.0,<3.0.0
│   │   └── typing-extensions >=4.0.0,<5.0
│   ├── sigstore-protobuf-specs 0.3.2
│   │   └── betterproto 2.0.0b6
│   │       ├── grpclib >=0.4.1,<0.5.0
│   │       │   ├── h2 >=3.1.0,<5
│   │       │   │   ├── hpack >=4.0,<5
│   │       │   │   └── hyperframe >=6.0,<7
│   │       │   └── multidict *
│   │       │       └── typing-extensions >=4.1.0
│   │       └── python-dateutil >=2.8,<3.0
│   │           └── six >=1.5
│   ├── sigstore-rekor-types 0.0.13
│   │   └── pydantic >=2,<3
│   └── tuf >=5.0,<6.0
│       ├── requests >=2.19.1 
│       └── securesystemslib >=1.0,<2.0
└── sigstore-protobuf-specs *
    └── betterproto 2.0.0b6
        ├── grpclib >=0.4.1,<0.5.0
        │   ├── h2 >=3.1.0,<5
        │   │   ├── hpack >=4.0,<5
        │   │   └── hyperframe >=6.0,<7
        │   └── multidict *
        │       └── typing-extensions >=4.1.0
        └── python-dateutil >=2.8,<3.0
            └── six >=1.5

While some of the dependencies are already in the dependency tree, that is still far too much to include in the project. Some package builders/managers could handle that by using plugins, but those are extra steps compared with the simple built-in solution that we could have.
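To put a number on a tree like the one above, here is a small sketch that counts the distinct package names in `poetry show --tree` style output. The function name `unique_packages` is made up for illustration, and the parsing assumes each line is an optional run of box-drawing characters followed by a package name and a version constraint.

```python
import re

# Leading whitespace and box-drawing characters used to draw the tree.
TREE_PREFIX = re.compile(r"^[\s│├└─]+")


def unique_packages(tree_text: str) -> set[str]:
    """Return the distinct package names found in a dependency tree listing."""
    names = set()
    for line in tree_text.splitlines():
        stripped = TREE_PREFIX.sub("", line).strip()
        if not stripped:
            continue
        # The first whitespace-separated token is the package name.
        # Note: the root package on the first line is included too.
        names.add(stripped.split()[0].lower())
    return names


if __name__ == "__main__":
    sample = (
        "pypi-attestations 0.0.13\n"
        "├── cryptography *\n"
        "│   └── cffi >=1.12\n"
        "└── sigstore >=3.4,<4.0\n"
        "    └── cryptography >=42\n"
    )
    print(len(unique_packages(sample)), "distinct packages")
```

Comparing that set against a project's existing lock file would show how many of the packages are genuinely new.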

ncoghlan commented 1 week ago

@sethmlarson I came across https://github.com/sethmlarson/pip-sbom by way of the pip plugin/extension issue. What are the primary points of concern leading to the "experimental" marker? Is it just the phantom dependency issue, which means it misses a lot of things that aren't declared in the current distribution package metadata? Or do you have additional concerns beyond that limitation?

I was mostly poking at it to get an idea of what an SBOM dependency tree might look like (at least with current libraries):

$ poetry show --tree
pip-sbom 0.0.1a2 pip-sbom
├── cyclonedx-python-lib *
│   ├── license-expression >=30,<31
│   │   └── boolean-py >=4.0
│   ├── packageurl-python >=0.11,<2
│   ├── py-serializable >=1.1.1,<2.0.0
│   │   └── defusedxml >=0.7.1,<0.8.0
│   └── sortedcontainers >=2.4.0,<3.0.0
├── packageurl-python *
├── packaging *
├── pip *
└── spdx-tools >=0.8
    ├── beartype *
    ├── click *
    │   └── colorama *
    ├── license-expression *
    │   └── boolean-py >=4.0
    ├── ply *
    ├── pyyaml *
    ├── rdflib *
    │   └── pyparsing >=2.1.0,<4
    ├── semantic-version *
    ├── uritools *
    └── xmltodict *

(I also checked the anchore-syft package you used in your latest blog post, but that bundles an external binary command rather than being a regular Python package.)

sethmlarson commented 1 week ago

@ncoghlan

What are the primary points of concern leading to the "experimental" marker?

Because SBOMs are used for regulatory things, I didn't want people to start using this tool that I've invested relatively small amounts of time in. I created this project mostly to test what is possible today for Python packages and as a place to implement draft packaging PEPs ahead of their acceptance to show how useful they'd be for the SBOM use-case (such as PEP 710 and now my upcoming PEPs for SBOMs).

And yeah... I am not particularly happy with the state of affairs for SBOM libraries. At the end of the day, it's a data format. I think creating a tiny module specifically for generating SBOM documents would make sense, so that packaging tools can adopt it very easily. That use case is typically constrained to a very narrow set of SBOM documents and features.
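For a sense of scale, here is a minimal, dependency-free sketch of what such a tiny module could look like when targeting CycloneDX JSON. It is not taken from pip-sbom or any existing library. The function name `build_minimal_bom` is hypothetical, the output covers only a handful of top-level fields, and it has not been validated against the official schema.

```python
import json
import uuid


def build_minimal_bom(packages):
    """Return a dict shaped like a minimal CycloneDX 1.6 JSON document.

    `packages` is an iterable of (name, version) pairs. Assumption: every
    package comes from PyPI, so a pkg:pypi purl is appropriate.
    """
    components = []
    for name, version in packages:
        # Simplified purl: lowercase the name and swap "_" for "-".
        purl = f"pkg:pypi/{name.lower().replace('_', '-')}@{version}"
        components.append(
            {
                "type": "library",
                "bom-ref": purl,
                "name": name,
                "version": version,
                "purl": purl,
            }
        )
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.6",
        "serialNumber": f"urn:uuid:{uuid.uuid4()}",
        "version": 1,
        "components": components,
    }


if __name__ == "__main__":
    # Example input; the package and version are illustrative only.
    print(json.dumps(build_minimal_bom([("packaging", "24.1")]), indent=2))
```

Everything here comes from the standard library (`json` and `uuid`), which is the property being asked for earlier in the thread. Licenses, hashes, and dependency relationships are left out and would each need more fields.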