psf / sboms-for-python-packages


Software Bill-of-Materials for Python packages

NOTE: This project is a work-in-progress and is soliciting feedback from experts in the respective areas. Please open questions as new GitHub issues. Thank you for your feedback!

Terminology

This repository uses terminology consistent with the Python Packaging User Guide Glossary. Note for non-Python package users: this terminology may differ from that used in other software ecosystems. Please be aware of these differences when reading and contributing.

Motivation

Regulations

Software Bill-of-Materials documents (SBOMs) are a technology- and ecosystem-agnostic format for describing software composition, provenance, and other metadata. SBOMs are required by recent software security regulations, such as the Secure Software Development Framework (SSDF) and the Cyber Resilience Act (CRA). Because of their inclusion in these regulations, demand for SBOM documents from open source projects is expected to be high. For example, the Tennessee Valley Authority has already begun attempting to collect SBOM documents from open source projects like CPython.

The goal is to minimize the demands on open source project maintainers by enabling open source users who need SBOMs to self-serve using existing tooling. Another goal is to enable those same users to contribute by creating SBOM information for projects or annotating projects with it. Today there is no mechanism to propagate the results of such contributions into SBOM tooling, so there is no reason to contribute this type of work.

Phantom dependencies

Python packages are particularly affected by the "phantom dependency" problem, where software that isn't written in Python is included in Python packages for many reasons, such as ease of installation and compatibility with standards.

This software can't be described accurately using Python package metadata, so it is likely to be missed by software composition analysis (SCA) tools, which can mean vulnerable software components aren't reported accurately.
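To make the problem concrete, here is a minimal sketch that lists archive members whose names look like compiled shared libraries. It is a file-name heuristic only, not an SCA technique from this project: the helper name find_native_files, the regular expression, and the member names in the in-memory archive are all hypothetical. It will also match a project's own extension modules, not just bundled third-party libraries.

```python
import io
import re
import zipfile

# Heuristic (an assumption of this sketch): names ending in .so, .so.N[.N...],
# .dylib, or .dll are treated as native shared libraries.
SHARED_LIB = re.compile(r"\.(so(\.\d+)*|dylib|dll)$")


def find_native_files(wheel: zipfile.ZipFile) -> list[str]:
    """Return archive members whose names look like compiled libraries."""
    return [name for name in wheel.namelist() if SHARED_LIB.search(name)]


# Build a tiny in-memory archive with hypothetical member names so the
# sketch runs without downloading a real wheel.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("example/__init__.py", "")
    archive.writestr("example.libs/libexample.so.1.2.3", b"")
    archive.writestr("example-1.0.dist-info/METADATA", "Name: example\n")

with zipfile.ZipFile(buffer) as wheel:
    print(find_native_files(wheel))
```

Finding such a file says nothing about which upstream project it came from, which version it is, or how it was built. That missing information is what an included SBOM is meant to carry.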

Rationale

Attempting to adopt every field offered by SBOM standards into Python core metadata would result in an explosion of new core metadata fields, and those fields would need to be kept up to date as SBOM standards continue to evolve to suit new needs in that space. Instead, this proposal delegates this metadata to SBOM documents and formats, and adds Python package metadata for linking to SBOM documents contained within a Python package.

This standard also doesn't aim to replace Python core metadata with SBOMs; instead, the SBOM information is supplemental to core metadata. Core metadata fields MUST be used as the authoritative location for information about a Python package itself. Included SBOMs MUST only contain information about dependencies included in the package archive OR information about the software in the package that can't be encoded into core metadata but is relevant for the SBOM use case (such as "software identifier", "purpose", or "support level").

Proposal

Today there is no method to encode information about cross-language or cross-ecosystem software dependencies into Python package metadata. This project proposes using SBOM formats for this purpose and allowing SBOM documents to be included in Python package archives to self-describe the software within those archives. Included SBOM documents are then referenced using a new Python metadata field, Sbom-File, so they are discoverable within a Python package.
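As a rough sketch of how a consumer might discover included SBOMs, the snippet below parses core metadata text with the standard-library email parser and collects every Sbom-File value. Sbom-File is only proposed by this project and is not part of any accepted standard. The sample metadata, the helper name list_sbom_files, and the assumption that each value is a file name relative to the .dist-info/sboms/ directory are illustrative, not confirmed by the proposal.

```python
from email import message_from_string

# Abbreviated, hypothetical core metadata for illustration only.
# `Sbom-File` is the proposed field and is not yet standardized.
SAMPLE_METADATA = """\
Name: numpy
Version: 2.1.3
Sbom-File: bundled.cdx.json
"""


def list_sbom_files(metadata_text: str, dist_info_dir: str) -> list[str]:
    """Return archive paths of SBOM documents referenced by `Sbom-File`."""
    message = message_from_string(metadata_text)
    # get_all() returns every occurrence of the header, or the default
    # (an empty list here) when the field is absent.
    names = message.get_all("Sbom-File", [])
    # Assumption: values are relative to `<dist-info>/sboms/`.
    return [f"{dist_info_dir}/sboms/{name}" for name in names]


print(list_sbom_files(SAMPLE_METADATA, "numpy-2.1.3.dist-info"))
```

Treating the field as repeatable is also an assumption of this sketch; it mirrors how other multiple-use core metadata fields are read.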

For example, a Python wheel for numpy containing an SBOM document:

numpy-2.1.3.dist-info/sboms/bundled.cdx.json

...where that SBOM file contains information about software like lapack-lite, which the numpy team bundles themselves, and libgfortran, which was "repaired" into the wheel by auditwheel:

{
  "bomFormat": "CycloneDX",
  "specVersion": "1.6",
  "metadata": {
    // Primary component is numpy
    "component": {
      "type": "library",
      "name": "numpy",
      "version": "2.1.3"
    }
  },
  // Sub-components described here:
  "components": [
    {
      "name": "lapack-lite",
      // ...
    },
    {
      "name": "libgfortran",
      "purl": "pkg:rpm/almalinux/libgfortran@8.5.0-22"
    }
    // ...
  ]
}
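A consumer of such a document might list each bundled component alongside its package URL. The sketch below uses only the standard library and a strict-JSON stand-in for the example above: the comments are removed because they are not valid JSON, and the elided fields are left out, not guessed. The helper name summarize_components is hypothetical.

```python
import json
from typing import Optional

# Strict-JSON stand-in for the example above, not the real numpy SBOM.
SBOM_TEXT = """
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.6",
  "metadata": {
    "component": {"type": "library", "name": "numpy", "version": "2.1.3"}
  },
  "components": [
    {"name": "lapack-lite"},
    {"name": "libgfortran", "purl": "pkg:rpm/almalinux/libgfortran@8.5.0-22"}
  ]
}
"""


def summarize_components(sbom: dict) -> list[tuple[str, Optional[str]]]:
    """Return (name, purl) pairs for each entry in `components`."""
    # `purl` is optional, so a missing value is reported as None.
    return [(c["name"], c.get("purl")) for c in sbom.get("components", [])]


sbom = json.loads(SBOM_TEXT)
primary = sbom["metadata"]["component"]
print(f"{primary['name']} {primary['version']} bundles:")
for name, purl in summarize_components(sbom):
    print(f"  {name}: {purl or 'no purl recorded'}")
```

Components without a purl, such as lapack-lite in this cut-down example, are the ones a downstream tool would have the hardest time matching against vulnerability data.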

The proposal would require:

Survey of Python packages, Python package tools, and SBOM tooling

Survey Python package tools to answer the questions: "Can these tools adopt this standard?" and "How difficult is creating quality SBOM information for Python projects?"

Survey SBOM tools and standards to answer the question: "How useful is the information encoded by this standard?" Using popular SBOM generation tools, can SBOMs be generated to meet the following regulations?

The survey will inform whether this proposal can be adopted by these tools and how useful this standard would be to downstream consumers. Some of this survey will come in the form of a pre-PEP and PEP discussion of the subproject below.

New standards (PEP) for encoding SBOM information

This subproject requires the survey subproject above to be complete before the PEP is "ready for submission" for review and approval, but drafting the PEP and starting the discussion process are not blocked on it.

Golden examples for SBOM tool developers

This subproject aims to provide high-quality calibration materials for SBOM tool developers. It can be worked on concurrently with the above two subprojects and then updated once the above PEP is accepted.

These examples can then be used by SBOM tool developers to verify their software is working for Python packages.

How does it all fit together?

License

This repository is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.