How will PyPI filtering of SBOM data work?

psf / sboms-for-python-packages

Software Bill-of-Materials documents for Python packages


How will PyPI filtering of SBOM data work? #1

Open miketheman opened 2 weeks ago

miketheman commented 2 weeks ago

A set of filtering requirements to add to popular package indexes like PyPI, to ensure other tools are adhering to the standards.

Today this is basically accomplished with trove classifiers, which are self-advertised. For example, see projects that say they are typed (https://pypi.org/search/?q=&o=&c=Typing+%3A%3A+Typed). This can lead to a gap between when the functionality is added and when the classifier is added, à la https://github.com/glyph/automat/pull/161

Do you envision a different mechanism that is more verifiable, i.e. from the metadata itself?

If so, there are existing challenges with metadata storage that likely need to be addressed first (and we should fix those!), but it's not super simple.

sethmlarson commented 2 weeks ago

I was imagining less intense filtering for "absolute correctness" and more:

- Are the SBOM documents valid JSON?
- Does a document claiming to be SPDX or CycloneDX actually have the bare minimum to be recognized by other tools? (Usually the format version and a few required fields.)
- Does any basic information in the SBOM's primary component not match the Python distribution metadata, or is there no primary component at all?
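As a rough illustration of the three checks above, here is a minimal Python sketch. It assumes each SBOM arrives as raw bytes alongside the distribution's Name and Version, covers only CycloneDX JSON and SPDX 2.x JSON (not SPDX 3.0), and treats a handful of top-level fields as the "bare minimum". Those field lists and the names check_sbom, primary_component, and normalize_name are illustrative assumptions, not PyPI's or either specification's official rules.

```python
from __future__ import annotations

import json
import re
from typing import Any

# Assumed "bare minimum" top-level fields; real validators check much more.
CYCLONEDX_REQUIRED = ("bomFormat", "specVersion")
SPDX2_REQUIRED = ("spdxVersion", "dataLicense", "SPDXID", "name",
                  "documentNamespace", "creationInfo")


def normalize_name(name: str) -> str:
    """PEP 503 name normalization, so 'My_Pkg' and 'my-pkg' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def primary_component(doc: dict[str, Any]) -> dict[str, Any] | None:
    """Return the name/version of the SBOM's primary component, if one exists."""
    if doc.get("bomFormat") == "CycloneDX":
        metadata = doc.get("metadata")
        comp = metadata.get("component") if isinstance(metadata, dict) else None
        if isinstance(comp, dict):
            return {"name": comp.get("name"), "version": comp.get("version")}
        return None
    # SPDX 2.x: the document DESCRIBES a package, either via documentDescribes
    # or via a relationship whose source is SPDXRef-DOCUMENT.
    described = set(doc.get("documentDescribes") or [])
    for rel in doc.get("relationships", []):
        if (rel.get("spdxElementId") == "SPDXRef-DOCUMENT"
                and rel.get("relationshipType") == "DESCRIBES"):
            described.add(rel.get("relatedSpdxElement"))
    for pkg in doc.get("packages", []):
        if pkg.get("SPDXID") in described:
            return {"name": pkg.get("name"), "version": pkg.get("versionInfo")}
    return None


def check_sbom(raw: bytes, dist_name: str, dist_version: str) -> list[str]:
    """Return a list of problems; an empty list means the SBOM passed."""
    try:
        doc = json.loads(raw)
    except ValueError as exc:  # covers JSONDecodeError and UnicodeDecodeError
        return [f"not valid JSON: {exc}"]
    if not isinstance(doc, dict):
        return ["top-level JSON value is not an object"]

    if doc.get("bomFormat") == "CycloneDX":
        required = CYCLONEDX_REQUIRED
    elif "spdxVersion" in doc:
        required = SPDX2_REQUIRED
    else:
        return ["not recognizable as CycloneDX or SPDX"]
    problems = [f"missing required field {f!r}" for f in required if f not in doc]

    comp = primary_component(doc)
    if comp is None:
        problems.append("no primary component")
        return problems
    if normalize_name(str(comp["name"] or "")) != normalize_name(dist_name):
        problems.append(
            f"primary component name {comp['name']!r} does not match {dist_name!r}")
    # Exact string comparison; a stricter check might normalize versions first.
    if comp["version"] != dist_version:
        problems.append(
            f"primary component version {comp['version']!r} does not match {dist_version!r}")
    return problems
```

Names are compared after PEP 503 normalization, while versions are compared as exact strings; a real implementation would probably normalize versions too (for example with the packaging library) before declaring a mismatch.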

This will help filter out cases where tools are generating SBOM data incorrectly and prevent silent "no linkage" scenarios. There might be more conditions to add as we discover more!

miketheman commented 1 week ago

Those kinds of filters make sense. Do you envision them as part of the upload phase, as a post-upload verification, or as a user-side search?

sethmlarson commented 1 week ago

@miketheman I was imagining having them as a part of the upload phase if that's possible. I don't know how "post-upload" verification surfaces to the user, is that documented somewhere?

miketheman commented 1 week ago

> I don't know how "post-upload" verification surfaces to the user, is that documented somewhere?

Largely because it doesn't exist yet! 😆 But since we kick off tasks in response to a completed upload, we could perform analyses post-upload and persist results, and then surface them to the package managers or the world.
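A post-upload variant could look roughly like the sketch below: a task that runs after an upload completes, records one pass/fail row per check, and exposes the stored results for later display. This does not model how Warehouse actually schedules or stores anything; the table layout, run_post_upload_checks, results_for, and the is_json_object sample check are all hypothetical, and an in-memory sqlite3 database stands in for real storage.

```python
import json
import sqlite3
from collections.abc import Callable

# Hypothetical check signature: uploaded file bytes in, list of problems out.
Check = Callable[[bytes], list[str]]


def init_db(conn: sqlite3.Connection) -> None:
    """Create a results table keyed by uploaded file and check name."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS file_checks (
               filename   TEXT NOT NULL,
               check_name TEXT NOT NULL,
               passed     INTEGER NOT NULL,
               detail     TEXT NOT NULL,
               PRIMARY KEY (filename, check_name))"""
    )


def is_json_object(payload: bytes) -> list[str]:
    """Sample check: the payload must parse as a JSON object."""
    try:
        doc = json.loads(payload)
    except ValueError as exc:
        return [f"invalid JSON: {exc}"]
    return [] if isinstance(doc, dict) else ["not a JSON object"]


def run_post_upload_checks(conn: sqlite3.Connection, filename: str,
                           payload: bytes, checks: dict[str, Check]) -> None:
    """Run every check after an upload completes and persist the outcomes."""
    with conn:  # one transaction: commit on success, roll back on error
        for check_name, check in checks.items():
            problems = check(payload)
            # INSERT OR REPLACE keeps re-runs idempotent per (filename, check).
            conn.execute(
                "INSERT OR REPLACE INTO file_checks VALUES (?, ?, ?, ?)",
                (filename, check_name, int(not problems), "; ".join(problems)),
            )


def results_for(conn: sqlite3.Connection, filename: str) -> dict[str, tuple[bool, str]]:
    """Fetch stored results so they can be surfaced to maintainers or users."""
    rows = conn.execute(
        "SELECT check_name, passed, detail FROM file_checks WHERE filename = ?",
        (filename,),
    )
    return {name: (bool(passed), detail) for name, passed, detail in rows}
```

Keying rows on (filename, check_name) means a retried or re-run task overwrites earlier results instead of accumulating duplicates, which matters if tasks can fire more than once per upload.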

The upload step is a bit "heavy" now, so I'd suggest that this step include some refactoring to make it flow a little better.

And if any of these checks could be performed prior to upload (a la twine check) and save the user from an attempt/failure/fix/attempt/... cycle, that'd be good UX in my mind.
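A pre-upload check in the spirit of twine check could open a built wheel locally and validate its SBOMs before anything is sent to PyPI. The sketch below assumes SBOM files live under the wheel's .dist-info/sboms/ directory (the location proposed in PEP 770); check_wheel_sboms is a hypothetical helper, not an existing twine command, and it only verifies that each file parses as a JSON object.

```python
import json
import zipfile
from typing import IO, Union


def check_wheel_sboms(wheel: Union[str, IO[bytes]]) -> list[str]:
    """Return problems found in SBOMs bundled in a wheel.

    An empty list means every file under <name>.dist-info/sboms/ parsed
    as a JSON object (or that the wheel contains no SBOMs at all).
    """
    with zipfile.ZipFile(wheel) as zf:
        names = zf.namelist()
        # Locate the single top-level .dist-info directory via its METADATA file.
        metadata = [n for n in names
                    if n.count("/") == 1 and n.endswith(".dist-info/METADATA")]
        if len(metadata) != 1:
            return [f"expected one .dist-info/METADATA, found {len(metadata)}"]
        sbom_prefix = metadata[0].rsplit("/", 1)[0] + "/sboms/"

        problems: list[str] = []
        for name in names:
            if not name.startswith(sbom_prefix) or name.endswith("/"):
                continue  # not an SBOM file, or a directory entry
            try:
                doc = json.loads(zf.read(name))
            except ValueError as exc:  # covers JSONDecodeError and bad encodings
                problems.append(f"{name}: not valid JSON ({exc})")
                continue
            if not isinstance(doc, dict):
                problems.append(f"{name}: top-level JSON value is not an object")
        return problems
```

Running something like this locally catches malformed SBOMs before the attempt/failure/fix/attempt cycle starts; the same per-document checks could then be repeated server-side at upload time.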

sethmlarson commented 1 week ago

Got it. twine check is a great idea, I'll look at that too.