psf / sboms-for-python-packages

Software Bill-of-Materials documents for Python packages

Should SBOMs be generated per-Python version? per-platform? #14

Open anthonyharrison opened 5 days ago

anthonyharrison commented 5 days ago

You will need to create an SBOM for each version of Python that the package supports, as the dependencies will vary depending on the release of Python. There are also differences between supported environments, so Windows and Linux SBOMs are probably going to be different.

That means for most packages (and I would recommend that SBOMs are generated in both SPDX and CycloneDX formats), there will probably be at least 20 SBOMs per release: 5 Python releases (3.9, 3.10, 3.11, 3.12, 3.13) × 2 platforms (Windows, Linux) × 2 formats (SPDX, CycloneDX).
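For readers following along, the count above can be reproduced with a short sketch. The variable names are illustrative only, and the lists simply restate the versions, platforms, and formats named in the comment.

```python
from itertools import product

# Values taken from the comment above; the names are illustrative.
python_versions = ["3.9", "3.10", "3.11", "3.12", "3.13"]
platforms = ["windows", "linux"]
sbom_formats = ["SPDX", "CycloneDX"]

# One SBOM per (Python version, platform, format) combination.
sbom_matrix = list(product(python_versions, platforms, sbom_formats))

print(len(sbom_matrix))  # 5 * 2 * 2 = 20
```

Adding another supported platform or Python release multiplies the total, which is the scaling concern raised in the replies below.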

Note that each SBOM will just be an example. Unless ALL dependencies (direct and transitive) have pinned versions, there will always be variations.

sethmlarson commented 5 days ago

Thanks for opening this issue @anthonyharrison and for your work for Python and SBOMs already. I answered this over email too, but for others to be able to participate:

Regarding your concerns about there needing to be "many SBOMs" for a Python package: We should avoid at all costs trying to "model" the world of possibilities, instead sticking to the concrete things that we know to be true and requiring those downstream to handle the work after the range of possibilities has collapsed (such as by installation, but other methods like "locking" work here too).

If there are optional dependencies based on Python version for a Python package, that is fine: the Python package+build system can model the software contained within the Python package and it's the responsibility of another SBOM generation tool to model the software eventually installed on the machine after the full resolution has occurred. Exploding the full range of possibilities is more work for no less certainty.

I also don't think it's a Python package's responsibility to figure out who wins between CycloneDX and SPDX. Supporting one SBOM standard in a package is already going above and beyond what should be expected of an open source maintainer. It'll be up to tools and standards bodies to decide what they do with the information or to move towards one standard.

I think this approach of focusing on concreteness means less work in places where there is less ability to do more work (i.e., open source project maintainers) and puts more of that responsibility on, for example, maintainers of SBOM tools and OSS users. Hopefully this shows how I am thinking about the SBOM problem for individual Python packages?

jkowalleck commented 4 days ago

I've read the original request, and I do not understand how this idea came together, @anthonyharrison. I would like to learn what your thoughts behind all of this are. You have been in the industry for decades, so there must be something to it that I don't see. Could you elaborate on your thoughts?


It appears to me that you are describing a planning-SBOM, which is a copy of the information given by package manifests and package resolution systems. These describe options like version ranges and environmental constraints, and they assume they know the effective package registry. (In hamburger terms: which ingredients could be added under which constraints, where they could be sourced from, and which additional optional ingredients exist.)

If I understand you right, you are talking about distributing planning-SBOMs?

I don't see much value in this, since a production line might not follow the ideas of package authors or package registries. I'll give you an example: I run a PyPI proxy in my org, and all python packages are fetched from it. Further, I have some packages in a patched version available on my local proxy. There is no way a package author/registry could know about that when THEY built an SBOM for THEIR package. But I know about it, when I build the SBOM for my python application. (I am the chef, I know what is in the hamburger)

So before discussing what/how, I would like to understand the value of distributed planning-SBOM.


The previous part is about dependency resolution and runtime-SBOM versus planning-SBOM. On the other hand, for bundled dependencies (which are baked into the Python package and shipped with it; some call them phantom dependencies), it makes complete sense to have the actual shipped things described somehow, maybe in an SBOM.

anthonyharrison commented 4 days ago

@jkowalleck I regularly generate SBOMs for deployed Python packages (these could probably be aligned with either the Build or Deployed SBOM types). I have noted that when installing in a clean Python virtual environment, I get different dependencies depending on the version of Python used. Some of this is due to constraints in the requirements.txt file such as python_version <=3.9, and some of this is due to changes in the standard Python library.
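As an illustration of the effect described above, here is a minimal sketch of version-conditional requirements. The requirement list and the `selected` helper are hypothetical, and the comparison is a simplified stand-in for real environment-marker evaluation, which installers perform according to PEP 508.

```python
import operator

# Comparison operators supported by this simplified sketch.
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq}

# Hypothetical requirements: (name, marker), where marker is either None
# (always installed) or (operator, python_version).
REQUIREMENTS = [
    ("requests", None),
    ("tomli", ("<", "3.11")),
    ("importlib-metadata", ("<=", "3.9")),
]


def parse_version(text):
    """Turn '3.10' into (3, 10) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def selected(python_version):
    """Return the requirement names that apply to the given Python version."""
    current = parse_version(python_version)
    names = []
    for name, marker in REQUIREMENTS:
        if marker is None or OPS[marker[0]](current, parse_version(marker[1])):
            names.append(name)
    return names


for version in ("3.9", "3.10", "3.12"):
    print(version, selected(version))
```

Under these assumed requirements, each of the three Python versions yields a different dependency set, which is the variation observed when installing into clean virtual environments.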

Neither of the SBOM standards (SPDX or CycloneDX) requires that the SBOM state the version of Python to which it refers. But as different dependencies can be included depending on the version of the Python environment, I concluded that it was necessary to create a separate SBOM for each version of Python.

I presented my findings at FOSDEM in 2024.

jkowalleck commented 4 days ago

re: https://github.com/psf/sboms-for-python-packages/issues/14#issuecomment-2490725901

@anthonyharrison, so you are saying that everybody installing a certain set of Python packages might end up with a different OBOM? Well, that is the whole point of an OBOM: it is unique to the environment.

"Neither of the SBOM standards (SPDX or CycloneDX) requires that the SBOM state the version of Python to which it refers."

The standards are document standards - they describe what and how to communicate certain observations and facts. SBOM/OBOM/ML-BOM - they are profiles on the standard. An OBOM would include the operating system, the python version, and so on, and all/only the dependencies that are actually there. An SBOM might not include all of this, it would simply include the dependencies.

Would you agree?
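To make the OBOM idea concrete, here is a minimal sketch of recording what is actually present in one environment. The `observe_environment` function and the dictionary keys are hypothetical and do not follow the CycloneDX or SPDX schema; only standard-library calls are used.

```python
import platform
from importlib import metadata


def observe_environment():
    """Collect facts about the running environment (hypothetical layout)."""
    installed = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with missing or broken metadata
            installed.add((name, dist.metadata.get("Version") or ""))
    return {
        "operating_system": platform.system(),
        "python_version": platform.python_version(),
        "python_implementation": platform.python_implementation(),
        "installed": [
            {"name": name, "version": version}
            for name, version in sorted(installed)
        ],
    }


env = observe_environment()
print(env["operating_system"], env["python_version"], len(env["installed"]))
```

The result differs from machine to machine, which is why such a record can only be produced where the software is installed, not by the package author.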

PS: The Python that runs the package/code is a runtime dependency, and it is only known at runtime. The fact that the installation happened with a certain version of Python, or that a certain Python is present on a system, does not have any value as evidence. Therefore, if the Python version were relevant for any BOM, then only for an OBOM.

PPS: I do not see any value in distributing OBOMs for Python packages. Distributing OBOMs with packages - what would be the value of that?!

PPPS: CycloneDX is working on an improvement for documenting interpreters and runtimes: https://github.com/CycloneDX/specification/issues/233 Please help improve the standards, work with us :-D