NOTE: This project is a work-in-progress and is soliciting feedback from experts in the respective areas. Please open questions as new GitHub issues. Thank you for your feedback!
This repository uses terminology consistent with the Python Packaging User Guide Glossary. Note for non-Python package users: this terminology may differ from that of other software ecosystems, so please keep these differences in mind when reading and contributing.
Software Bill-of-Materials documents (SBOMs) are a technology- and ecosystem-agnostic format for describing software composition, provenance, and other metadata. SBOMs are required by recent software security regulations, such as the Secure Software Development Framework (SSDF) and the Cyber Resilience Act (CRA). Due to their inclusion in these regulations, the demand for SBOM documents of open source projects is expected to be high. For example, the Tennessee Valley Authority has already begun attempting to collect SBOM documents from open source projects like CPython.
The goal is to minimize the demands on open source project maintainers by enabling open source users who need SBOMs to self-serve using existing tooling. Another goal is to enable those same users to contribute by creating or annotating projects with SBOM information. Today there is no mechanism to propagate the results of those contributions into SBOM tooling, so there is no reason to contribute this type of work.
Python packages are particularly affected by the "phantom dependency" problem, where software that isn't written in Python is included in Python packages for many reasons, such as ease of installation and compatibility with standards.
This software can't be described accurately using Python package metadata, so it is likely to be missed by software composition analysis (SCA) software, which can mean vulnerable software components aren't reported accurately.
Attempting to adopt every field offered by SBOM standards into Python core metadata would result in an explosion of new core metadata fields, and those fields would need to be kept up-to-date as SBOM standards continue to evolve to suit new needs in that space. Instead, this proposal delegates metadata to SBOM documents and formats and adds Python package metadata for linking to SBOM documents contained within a Python package.
This standard also doesn't aim to replace Python core metadata with SBOMs, instead focusing on the SBOM information being supplemental to core metadata. Core metadata fields MUST be used as the authoritative location for information about a Python package itself, and included SBOMs MUST only contain information about dependencies included in the package archive OR information about the software in the package that can't be encoded into core metadata but is relevant for the SBOM use-case (such as "software identifier", "purpose", "support level", etc.).
Today there is no method to encode information for cross-language/ecosystem software dependencies into Python package metadata. This project proposes using SBOM formats for this purpose and allowing SBOM documents to be included in Python package archives to self-describe software within those package archives. Included SBOM documents are then referenced using a new Python metadata field, Sbom-File, so they are discoverable within a Python package.
For example, a Python wheel for numpy containing an SBOM document:
numpy-2.1.3.dist-info/sboms/bundled.cdx.json
...where that SBOM file contains information about software like lapack-lite, which the numpy team bundles themselves, and libgfortran, which was "repaired" into the wheel by auditwheel:
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.6",
  "metadata": {
    // Primary component is numpy
    "component": {
      "type": "library",
      "name": "numpy",
      "version": "2.1.3"
    }
  },
  // Sub-components described here:
  "components": [
    {
      "name": "lapack-lite",
      // ...
    },
    {
      "name": "libgfortran",
      "purl": "pkg:rpm/almalinux/libgfortran@8.5.0-22"
    }
    // ...
  ]
}
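As a rough illustration of how a consumer might discover and read such a document, the sketch below builds a small in-memory archive that mimics the layout above and then follows the Sbom-File metadata to the SBOM. Several things here are assumptions and not part of the proposal text: the Sbom-File value is treated as a path relative to the .dist-info/sboms/ directory (by analogy with PEP 639), the Metadata-Version header is omitted because the version that would introduce the field is not decided, the archive is a stand-in and not a valid installable wheel, and the function names are hypothetical.

```python
import io
import json
import zipfile
from email.parser import Parser

DIST_INFO = "numpy-2.1.3.dist-info"


def build_example_archive() -> bytes:
    """Build a tiny zip that mimics the wheel layout (not a valid wheel)."""
    # Assumption: Sbom-File holds a path relative to <dist-info>/sboms/.
    metadata = "Name: numpy\nVersion: 2.1.3\nSbom-File: bundled.cdx.json\n"
    sbom = {
        "bomFormat": "CycloneDX",
        "specVersion": "1.6",
        "metadata": {
            "component": {"type": "library", "name": "numpy", "version": "2.1.3"}
        },
        "components": [
            {"type": "library", "name": "lapack-lite"},
            {
                "type": "library",
                "name": "libgfortran",
                "purl": "pkg:rpm/almalinux/libgfortran@8.5.0-22",
            },
        ],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(f"{DIST_INFO}/METADATA", metadata)
        archive.writestr(f"{DIST_INFO}/sboms/bundled.cdx.json", json.dumps(sbom))
    return buffer.getvalue()


def list_bundled_components(archive_bytes: bytes) -> list:
    """Return (name, purl) pairs for every component in every referenced SBOM."""
    results = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        # Locate the METADATA file and the dist-info directory it lives in.
        metadata_name = next(
            name for name in archive.namelist() if name.endswith(".dist-info/METADATA")
        )
        dist_info = metadata_name.rsplit("/", 1)[0]
        headers = Parser().parsestr(archive.read(metadata_name).decode("utf-8"))
        # The field may appear more than once, so collect every value.
        for sbom_path in headers.get_all("Sbom-File", []):
            document = json.loads(archive.read(f"{dist_info}/sboms/{sbom_path}"))
            for component in document.get("components", []):
                results.append((component["name"], component.get("purl")))
    return results


if __name__ == "__main__":
    for name, purl in list_bundled_components(build_example_archive()):
        print(name, purl)
```

The point of the sketch is that a consumer only needs the package metadata to find the SBOM documents; it does not have to guess file names or scan the archive for bundled libraries.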
The proposal would require:
Survey Python package tools to answer the questions "Can these tools adopt this standard?" and "How difficult is creating quality SBOM information for Python projects?"
Survey SBOM tools and standards to answer the question "How useful is the information encoded by this standard?" Using popular SBOM generation tools, can SBOMs be generated to meet the following regulations?
The survey will inform whether this proposal can be adopted by these tools and how useful this standard would be to downstream consumers. Some of this survey will come in the form of a pre-PEP and PEP discussion of the subproject below.
This subproject requires the above subproject to be complete before it is "ready for submission" for review and approval, but starting the draft PEP and discussion process is not blocked on it.
A new Python package metadata field, Sbom-File, for specifying the location(s) of one or more SBOM files in a package. New package metadata version for the new field.
A new sbom-files field added to the [project] table for conditional and unconditional inclusion of SBOM documents in Python packages. Conditional SBOM files use markers.
A directory (/.dist-info/sboms/) in Python packages and installed locations. This directory will be similar to the /.dist-info/licenses/ directory specified in PEP 639.
This subproject aims to provide high-quality calibration materials for SBOM tool developers. This subproject can be worked on concurrently with the above two subprojects and then updated later once the above PEP is accepted.
These examples can then be used by SBOM tool developers to verify their software is working for Python packages.
SBOM files are specified in pyproject.toml under project.sbom-files.
For conditional SBOM files, markers will be applied to the sbom-files entry.
A build tool (such as setuptools) processes the sbom-files field and adds corresponding Sbom-File Python package metadata for every SBOM file that is referenced and matches the markers.
A wheel may then be augmented by another tool (such as auditwheel). Tools that augment existing wheels should generate their own SBOM document that details the shared libraries that were bundled, if that information is available. For example, cibuildwheel for manylinux commonly uses AlmaLinux's packaging system.
On installation, SBOM files are placed in the .dist-info/sboms/... directory in the environment.
This repository is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
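As a rough illustration of the build step in the flow described above, the sketch below turns a list of sbom-files entries into Sbom-File metadata lines, skipping conditional entries whose markers do not match. The shape of an sbom-files entry is not defined in this document, so the SbomFileEntry class, its path and marker attributes, the file names, and the sbom_file_headers function are all hypothetical. Marker evaluation is passed in as a callable to keep the example standard-library only; a real tool would more likely use packaging.markers.Marker(marker).evaluate().

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SbomFileEntry:
    """Hypothetical in-memory form of one sbom-files entry."""

    path: str
    marker: Optional[str] = None  # None means unconditional inclusion


def sbom_file_headers(
    entries: List[SbomFileEntry],
    evaluate_marker: Callable[[str], bool],
) -> List[str]:
    """Emit one Sbom-File line per entry that is unconditional or whose marker matches."""
    headers = []
    for entry in entries:
        if entry.marker is None or evaluate_marker(entry.marker):
            headers.append(f"Sbom-File: {entry.path}")
    return headers


if __name__ == "__main__":
    example_entries = [
        SbomFileEntry("bundled.cdx.json"),
        SbomFileEntry("linux-libs.cdx.json", 'sys_platform == "linux"'),
        SbomFileEntry("windows-libs.cdx.json", 'sys_platform == "win32"'),
    ]

    # Stand-in evaluator that pretends the build targets Linux.
    def pretend_linux(marker: str) -> bool:
        return marker == 'sys_platform == "linux"'

    for line in sbom_file_headers(example_entries, pretend_linux):
        print(line)
```

Keeping marker evaluation separate from header generation means the same logic could be reused by any build tool, whichever marker implementation it already depends on.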