pyca / cryptography

cryptography is a package designed to expose cryptographic primitives and recipes to Python developers.
https://cryptography.io

Add a SBOM template in CycloneDX format #12024

Open hughsie opened 1 week ago

hughsie commented 1 week ago

Hi,

My name is Richard Hughes and I'm a developer at Red Hat. I'm the maintainer of fwupd and LVFS, and am trying to improve software supply chain security by encouraging OEMs, ODMs and IBVs to ship a Software Bill of Materials (SBOM) with each firmware binary blob.

I'm working alongside lots of other companies proactively trying to do the right thing. The reason I've opened this pull request is that your project is either used in the build process of a firmware we care about (e.g. EDK II, or coreboot) or is built into the firmware binary itself. Although my personal focus is on firmware, the SBOM file is in CycloneDX format (one of the most popular industry standards), which makes it useful when building containers or OS images too.

I would like to contribute this template SBOM file into your project that gets included into source control with substituted values that get populated automatically. I'm not super familiar with Python Cryptography, and so I've done my best populating the project values -- but please point out any that are incorrect and I'll fix them up. I've also put the sbom.cdx.json file in what I feel is the right place, but please say if you want me to put it somewhere different or name it a different thing; the directory and sbom prefix are unimportant.

The various firmware build tools will take these incomplete SBOM files and then build them into a complete composite SBOM to represent the firmware. Having an upstream reference to what the PURL and CPE values should be means we have something we can trust; I could quite easily spin up a web service that we ask "what CPE do we use for X" -> "cpe:2.3:a:Y:Z::::::::", but we don't actually know if that's still true, up to date, or what the maintainer actually wants them to be. Putting the template upstream means we can trust the values we find in the checked out code during the build process.

Also, if you’re not happy with cryptography.io being labelled a supplier (which seems more appropriate from a SBOM point of view, but makes some open source maintainers uncomfortable) we can remove that bit.

I've written a bit more about this proposal here https://blogs.gnome.org/hughsie/2024/11/14/firmware-sboms-for-open-source-projects/ and there's also a lot more information about firmware SBOMs here: https://lvfs.readthedocs.io/en/latest/sbom.html – many thanks for your time and all the work that you do.

alex commented 1 week ago

I don't think it makes much sense to include a stub SBOM like this. Principally because it's fundamentally wrong: our release artifacts include other packages that properly need to be reflected into an SBOM.

Our general approach here is that we'd like there to be an official Python standard for including SBOMs in release artifacts, and then we can take advantage of that programmatically. (It's important to us that we reflect things like different artifacts for different platforms having different code.)

See https://github.com/pyca/cryptography/issues/11950 and https://github.com/pyca/cryptography/issues/9811 for previous discussion.

hughsie commented 1 week ago

Our release artifacts include other packages that properly need to be reflected into an SBOM.

Right, I don't disagree -- but I noticed you've got a nice Cargo.lock we could use to construct those automatically.

It's important to us that we reflect things like different artifacts for different platforms having different code.

I agree that's a hard problem; but I'm trying to get there using small steps. CycloneDX seems to be the "winning" SBOM format at the moment, and I'd be very surprised if any future official Python standard didn't generate files in that format. When that happens we can of course add any extra data to the pretty-bare file I've suggested here, or rename it in source control, or whatever the Python proposal is.

My general point is that now when we compile a firmware we know we use python-cryptography (and we can get the commit tag and hash from git), but we don't know what the upstream team wants to be referred to as (e.g. supplier/author) and we don't have any information about what the CPE or PURL should be. Without the CPE in particular we can't write any kind of automated VEX rules that scan software for security issues. With a trusted CPE I can do one query that identifies firmware that is compiled with a cryptography version that's affected by a security issue and add all the affected vendors to a VINCE ticket.
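To make the "one query" idea concrete, here is a minimal sketch, not the actual LVFS implementation. It assumes each firmware SBOM has already been reduced to (firmware, CPE) pairs; the `Component` class, the helper names and the vendor/product values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    firmware: str  # hypothetical firmware identifier
    cpe: str       # CPE 2.3 formatted string taken from that firmware's SBOM


def cpe_fields(cpe: str) -> dict[str, str]:
    """Split a CPE 2.3 formatted string into the fields used below."""
    # Naive split: ignores escaped colons, which real CPE values may contain.
    parts = cpe.split(":")
    if len(parts) != 13 or parts[:2] != ["cpe", "2.3"]:
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return {
        "part": parts[2],
        "vendor": parts[3],
        "product": parts[4],
        "version": parts[5],
    }


def version_key(version: str) -> tuple[int, ...]:
    # Assumes plain dotted numeric versions such as "1.2.0".
    return tuple(int(piece) for piece in version.split("."))


def affected_firmware(
    components: list[Component], vendor: str, product: str, fixed_in: str
) -> list[str]:
    """Return firmware that ships vendor:product older than fixed_in."""
    hits = set()
    for component in components:
        fields = cpe_fields(component.cpe)
        if (fields["vendor"], fields["product"]) != (vendor, product):
            continue
        if fields["version"] in ("*", "-"):
            continue  # no concrete version recorded, cannot compare
        if version_key(fields["version"]) < version_key(fixed_in):
            hits.add(component.firmware)
    return sorted(hits)
```

The match is only as reliable as the vendor and product strings, which is why an upstream-approved CPE matters: a guessed vendor name silently produces zero hits.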

i.e. I know what I'm suggesting here isn't all-encompassing, but I think it's a step in the right direction that allows us to measurably improve the security of the wider ecosystem.

alex commented 1 week ago

One can't simply use the Cargo.lock, because it includes various dependencies that are only used on certain platforms; you need the build machinery to output what it actually used.

Stepping back however, I think it'd be helpful if you explained in a bit more detail how you expect consumers would use this SBOM template, as is.

hughsie commented 1 week ago

I think it'd be helpful if you explained in a bit more detail how you expect consumers would use this SBOM template, as is.

Does https://docs.google.com/presentation/d/1OPBHYZAr9SWDrmXpistJrVqd8wdN94oOfDfFXoEjTxg/edit?usp=sharing help? This is what I'm working on with the UEFI group.

alex commented 1 week ago

Not particularly. The thing I am trying to understand is if we check this into our repo, mechanically, how is anyone expected to consume it?

hughsie commented 1 week ago

The thing I am trying to understand is if we check this into our repo, mechanically, how is anyone expected to consume it

Right, sorry -- I might have not explained that terribly well. The essential build process for a firmware is to git clone dozens of different components in a top level source directory, each with either a tag or a specific commit. Once you've got all the source code you can build the firmware from the bottom up to the top component (which is usually coreboot or an EDK II derivative).

To generate the SBOM file we search the entire tree of checked out software for .cdx.json files (and other things, but unimportant for the discussion here) and if we find one we chdir() to that subproject and use git to find out things like the current commit and the last tagged version. Then we carry on with the next component, until we've not found anything else in the tree.
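The discovery step described above could be sketched roughly as follows. This is an illustration under assumptions, not how the uswid tool is implemented: the function names are hypothetical, and the two git invocations are simply standard ways to read the current commit and the most recent reachable tag.

```python
import subprocess
from pathlib import Path


def find_sbom_templates(root: Path) -> list[Path]:
    """Return every *.cdx.json file below root, in a stable order."""
    return sorted(path for path in root.rglob("*.cdx.json") if path.is_file())


def git_output(directory: Path, *args: str) -> str:
    """Run git inside directory and return its trimmed stdout."""
    result = subprocess.run(
        ["git", "-C", str(directory), *args],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if git fails
    )
    return result.stdout.strip()


def describe_component(template: Path) -> dict[str, str]:
    """Collect the VCS facts for the subproject that owns a template."""
    directory = template.parent
    return {
        "template": str(template),
        "commit": git_output(directory, "rev-parse", "HEAD"),
        # Fails if the checkout has no tags; a real tool would need a fallback.
        "last_tag": git_output(directory, "describe", "--tags", "--abbrev=0"),
    }
```

Running `describe_component()` over each path returned by `find_sbom_templates()` gives one record per subproject, which is the input the later depsolve and substitution steps would need.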

Once that's complete we depsolve all the components and add links, e.g.

{
  "ref": "pkg:github/openssl/openssl@3.0.9",
  "dependsOn": ["pkg:github/krb5/krb5@1.21.3"]
},

...and then we save the composite file as a CycloneDX file with all the @@ tokens fleshed out with the real values. e.g. pointing the uswid tool at my edk2 checkout I get a ~800 line SBOM describing all the other components that have already got sbom.cdx.json files. If it's helpful to visualize, there's also https://docs.google.com/spreadsheets/d/1gKEWLxdLubOfgS1cqqY2umEX0ohYEoEy-B1i59HNVqg/edit?usp=sharing which shows progress so far.
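A minimal sketch of the substitution and assembly step, assuming tokens take the form `@NAME@`. The token names used here (`VCS_TAG`, `VCS_VERSION`), the regular expression and both function names are assumptions for illustration; the real tool may differ.

```python
import json
import re

# Assumed token shape: an upper-case name wrapped in single "@" characters.
TOKEN = re.compile(r"@([A-Z_]+)@")


def fill_tokens(template: str, values: dict[str, str]) -> str:
    """Replace known @NAME@ tokens and leave unknown ones untouched."""
    def replace(match: re.Match) -> str:
        return values.get(match.group(1), match.group(0))

    # Values are inserted verbatim, so they must already be JSON-safe.
    return TOKEN.sub(replace, template)


def composite_sbom(
    components: list[dict], depends_on: dict[str, list[str]]
) -> dict:
    """Assemble filled-in components and their links into one document."""
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.6",
        "version": 1,
        "components": components,
        "dependencies": [
            {"ref": ref, "dependsOn": sorted(deps)}
            for ref, deps in sorted(depends_on.items())
        ],
    }


def load_component(template: str, values: dict[str, str]) -> dict:
    """Fill a single-component template and parse it as JSON."""
    return json.loads(fill_tokens(template, values))
```

Leaving unknown tokens in place rather than raising makes a missing value visible in the output, where it can be caught by a later validation pass.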

alex commented 1 week ago

That helps somewhat, but I think it makes clear to me that the needs of your process are actually pretty distinct from the needs of most of our Python consumers:

Most people do not checkout from git, they install from PyPI. They don't need to be told what our purl is, because our purl is automatically computable from the thing they just installed (i.e., it's pkg:pypi/cryptography, which they know because they just pip install cryptography). I suspect they don't care about any of the licensing, URL, or author metadata because those are already available in standard Python packaging metadata.
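As a rough illustration of why the purl is computable after installation, the sketch below derives it from standard packaging metadata. The function names are hypothetical; `importlib.metadata.version` is the standard-library call for reading an installed distribution's version, and the name normalization follows the purl rules for the `pypi` type as I understand them.

```python
from importlib.metadata import version


def pypi_purl(name: str, release: str) -> str:
    """Build a pkg:pypi purl from a distribution name and version."""
    # purl's pypi type expects a lower-case name with "_" replaced by "-".
    normalized = name.lower().replace("_", "-")
    # Versions with unusual characters would additionally need percent-encoding.
    return f"pkg:pypi/{normalized}@{release}"


def installed_purl(distribution: str) -> str:
    """Look up the installed version and return the matching purl."""
    # Raises importlib.metadata.PackageNotFoundError if it is not installed.
    return pypi_purl(distribution, version(distribution))
```

For example, `pypi_purl("cryptography", "43.0.3")` gives `pkg:pypi/cryptography@43.0.3`; the version there is only an example value. This works for an installed distribution, not for a bare git checkout, which is the gap the firmware build process runs into.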

Finally, if I'm reading this correctly, the various @ substitutions in here are things that are proprietary to your build process; they're not part of the CycloneDX spec. Is that right?

hughsie commented 1 week ago

the needs of your process are actually pretty distinct from the needs of most of our Python consumers:

I think that's fair. I wouldn't entirely mind if the file was called .sbom.cdx.json so that it's hidden by default.

Most people do not checkout from git, they install from PyPI.

It's also why I didn't include the file in the pyproject.toml -- it's just not needed if you're not checking the code out.

I suspect they don't care about any of the licensing

Probably true as well. Also bear in mind, what I'm trying to do is not give anyone an excuse to not use open source software. The requirement for SBOMs is coming from lots of people paying a lot of money for [mostly enterprise] software (although I suspect very little actually filters down to you and me...) and if there's no SBOM -- it can't be used in the end product. This means either the thing that wants to do crypto makes up a SBOM entry for each dep that may or may not be what upstream agrees with (or actually accurate), or the thing that wants to do crypto uses something proprietary that does have all the CISA-mandated metadata tags.

That's perhaps an unlikely scenario for already shipping devices, but a scenario I can certainly imagine for new ones -- and I think doing nothing is probably an ill-advised thing to do for a security sensitive crypto provider. I personally think it more than outweighs the maintenance cost of a 52 line JSON file.

if I'm reading this correctly, the various @ substitutions in here are things that are proprietary to your build process; they're not part of the CycloneDX spec

Yup, but I'm doing it with the support of the UEFI SBOM Working Group. Longer term those tags will go into a UEFI best practices document and probably into the Embedded SBOM metadata specification too.

alex commented 1 week ago

I am, for my sins, acutely aware of the genesis of SBOM requirements. (I work for the federal government.)

While I appreciate the motivation behind this PR, to me it is not aligned with what we think the correct goal for an SBOM is. I think it'd be much better to put energy towards @sethmlarson's work to have a Python standard for how SBOMs are handled.

hughsie commented 1 week ago

I am, for my sins, acutely aware of the genesis of SBOM requirements. (I work for the federal government.)

:)

While I appreciate the motivation behind this PR, to me it is not aligned with what we think the correct goal for an SBOM is. I think it'd be much better to put energy towards @sethmlarson's work to have a Python standard for how SBOMs are handled.

I guess that's your choice; I'm not going to force anyone to do anything they don't really want to do. Although I think a generic pythonic way to construct a SBOM is a noble aim (and also useful for me long term), it's not something I personally can work on -- cryptography is the only Python dep on the list of dependent components we need -- and unfortunately I can't really expand the scope of my SBOM work any further. Being super honest, I'm already on the very edge of what's useful to Red Hat and I don't want to push my luck any more.

sethmlarson commented 3 days ago

Thanks for the tag-in @alex!

@hughsie The referenced project isn't for a "Pythonic" way to create an SBOM, but rather specifying how SBOM documents will be included in a Python package for downstream use. So essentially, authoring an SBOM document by hand as you've done will be a valid approach. However, having a nice way to keep these documents up-to-date will also be necessary for many projects. Working on it! :)