Embed CPE names into binaries

knqyf263 commented 3 years ago

I'm a developer of a vulnerability scanner for container images, which depends on package managers provided by distributions such as rpm, dpkg, and apk. But most of the official images install their primary software by self-compilation. In the case of redis, redis is installed by make.

https://github.com/docker-library/redis/blob/1779e83980f7cc0e197c649ba560306991e2e4c6/5.0/Dockerfile#L79-L80

It means a scanner isn't able to get the package name/version from package managers. We can use CPE names provided by NIST for vulnerability detection, but it's hard to match up CPE names from binaries. The CPE name of Redis 6.0 is cpe:2.3:a:redislabs:redis:6.0:-:*:*:*:*:*:*. https://nvd.nist.gov/products/cpe/detail/757362?keyword=cpe:2.3:a:redislabs:redis:6.0:-:*:*:*:*:*:*&status=FINAL,DEPRECATED&orderBy=CPEURI&namingFormat=2.3
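
To make the structure of that name concrete, here is a minimal sketch of splitting a CPE 2.3 formatted string into its fields. The `Cpe` type and `parse_cpe23` function are hypothetical names introduced for illustration, and the sketch assumes well-formed input with the usual 11 attributes after the `cpe:2.3` prefix.

```python
import re
from typing import NamedTuple


class Cpe(NamedTuple):
    part: str
    vendor: str
    product: str
    version: str
    update: str
    edition: str
    language: str
    sw_edition: str
    target_sw: str
    target_hw: str
    other: str


def parse_cpe23(name: str) -> Cpe:
    """Split a CPE 2.3 formatted string into its 11 attributes."""
    # Split on colons that are not escaped with a backslash.
    fields = re.split(r"(?<!\\):", name.strip())
    if len(fields) != 13 or fields[0] != "cpe" or fields[1] != "2.3":
        raise ValueError(f"not a CPE 2.3 formatted string: {name!r}")
    return Cpe(*fields[2:])


redis = parse_cpe23("cpe:2.3:a:redislabs:redis:6.0:-:*:*:*:*:*:*")
print(redis.vendor, redis.product, redis.version)
```

The vendor (`redislabs`) is the field a scanner cannot derive from the binary name alone.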

It is not easy for scanners to know the vendor like redislabs. Also, there are some other redis-related CPE names. Simple string matching from binary names may result in choosing the wrong CPE name. https://nvd.nist.gov/products/cpe/search/results?namingFormat=2.3&keyword=redis
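
A small sketch of why simple string matching is ambiguous. Only the first dictionary entry comes from this issue; the other two are made-up placeholders (not real NVD entries), and `naive_candidates` is a hypothetical helper.

```python
def naive_candidates(binary_name, cpe_names):
    """Return (vendor, product) for every CPE whose product contains the binary name."""
    matches = []
    for name in cpe_names:
        fields = name.split(":")
        vendor, product = fields[3], fields[4]
        if binary_name in product:
            matches.append((vendor, product))
    return matches


# First entry is from this issue; the other two are placeholders for illustration.
dictionary = [
    "cpe:2.3:a:redislabs:redis:6.0:-:*:*:*:*:*:*",
    "cpe:2.3:a:example_vendor:redis_client:1.0:*:*:*:*:*:*:*",
    "cpe:2.3:a:another_vendor:redis_plugin:2.1:*:*:*:*:*:*:*",
]

# A binary called "redis" matches all three products, so the scanner must guess.
print(naive_candidates("redis", dictionary))
```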

So, I'd like to suggest embedding a CPE name into a binary. I don't care about how to achieve it, but let me give you an example, please.

$ cat main.go
package main

func main(){}
$ go build -o _main main.go
$ cat cpe.txt
cpe:2.3:a:knqyf263:main:0.0.1:-:*:*:*:*:*:*
$ objcopy --add-section cpe=cpe.txt _main main
$ objcopy main /dev/null --dump-section cpe=/dev/stdout
cpe:2.3:a:knqyf263:main:0.0.1:-:*:*:*:*:*:*
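
On the scanner side, a rough sketch of how an embedded name could be recovered. Instead of parsing ELF sections, it simply searches the raw bytes for something shaped like a CPE 2.3 string; this is a heuristic chosen to keep the example short, not a proposal for the actual mechanism. `find_cpe_names` is a hypothetical helper.

```python
import re
from typing import List

# One field: printable ASCII except space and colon.
_FIELD = rb"[\x21-\x39\x3b-\x7e]+"
# "cpe:2.3:" + part (a, h or o) + 10 more colon-separated fields.
CPE23_RE = re.compile(rb"cpe:2\.3:[aho](?::" + _FIELD + rb"){10}")


def find_cpe_names(data: bytes) -> List[str]:
    """Return every CPE 2.3 shaped string found in a blob of bytes."""
    return [m.group(0).decode("ascii") for m in CPE23_RE.finditer(data)]


# Stand-in for the contents of a binary with the section from cpe.txt added.
sample = b"\x7fELF\x02\x01\x01\x00" + b"cpe:2.3:a:knqyf263:main:0.0.1:-:*:*:*:*:*:*\n" + b"\x00\x00"
print(find_cpe_names(sample))
```

In practice the bytes would come from reading the binary file. Because the match is greedy on printable characters, a name directly followed by other printable data could pick up trailing junk, which is one reason a dedicated section is easier to read back reliably.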

Ideally, each software should do it in its own Makefile, but if that's difficult, it would be helpful to just do it in the Dockerfile.

If it's still hard, we can just embed it in Docker Labels. https://docs.docker.com/engine/reference/builder/#label
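
For the label variant, a sketch of what a scanner could do with the JSON printed by `docker inspect`. The label key `org.example.cpe` is a hypothetical placeholder (no such convention exists yet); an image would set it with a `LABEL` instruction in its Dockerfile.

```python
import json

CPE_LABEL = "org.example.cpe"  # hypothetical key, not an established convention


def cpe_from_inspect(inspect_json: str):
    """Return the CPE label of the first object in `docker inspect` output, or None."""
    objects = json.loads(inspect_json)
    if not objects:
        return None
    # Config.Labels may be missing or null when the image has no labels.
    labels = (objects[0].get("Config") or {}).get("Labels") or {}
    return labels.get(CPE_LABEL)


# Trimmed-down stand-in for real `docker inspect` output.
sample = json.dumps(
    [{"Config": {"Labels": {CPE_LABEL: "cpe:2.3:a:redislabs:redis:6.0:-:*:*:*:*:*:*"}}}]
)
print(cpe_from_inspect(sample))
```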

In any case, it would improve the accuracy of many scanners if there were a way to know the CPE names of self-compiled binaries and a lot of software (or container images) followed the standard.

Thanks.

MarcinHoppe commented 3 years ago

@knqyf263 many thanks for posting this! I think I can see a link to vulnerability disclosures, but at a glance it seems like the root problem here is generating SBoM for software that is being built and not installed from a package repository.

@stevespringett do you know if there is any prior art in this space?

stevespringett commented 3 years ago

I'm not aware of prior art for embedding CPEs into binaries. The closest I can think of is SWID tags, which are commonly installed on the filesystem for many commercial software products. They are not embedded in binaries, but they are commonly used for ITAM, CMDB, and discovery services.

I think SBOMs are more appropriate in this particular case however. Both CycloneDX and SPDX support CPEs and PURLs for the components they describe. CycloneDX also supports SWID.

jeremylong commented 3 years ago

I would agree with Steve - in cases like this generating an SBOM using CycloneDX or SPDX is a better option. However, in most cases the software contained in an SBOM would not have vulnerabilities and as such no CPE. If I'm packaging software in a container that has never had a reported vulnerability and as such does not have a CPE what do we do? Yes, I could go register one in anticipation that a vulnerability is found in my software (but that seems like a lot of work for little gain as a software publisher) or I could generate a CPE and hope it is used if/when a vulnerability is published, or more likely my SBOM would just include the package URL and leave it up to the vulnerability scanners of the world to map the package URL to the appropriate CPE.

If I were driving this effort my first stop would be to talk to the folks at Docker and try to get a specific SBOM format into the container image format.

MarcinHoppe commented 3 years ago

Thanks for the comment @jeremylong, much appreciated!

@knqyf263 do you think it would be useful to narrow down the problem to containers, or should we look at a more general solution?

jeremylong commented 3 years ago

I also wouldn't stop with Docker. The community could start filing issues and PRs against build tools like Maven and Gradle, and against build plugins that combine artifacts (like the uber/shade jar plugins), so that they start generating SBOMs and embedding them as part of the standard output, and embedding them in the binary when possible.

knqyf263 commented 3 years ago

> If I'm packaging software in a container that has never had a reported vulnerability and as such does not have a CPE what do we do?

Nice catch! As you said, software may not have any vulnerabilities and CPE. We have to consider how to map the package URL to the appropriate CPE.

> @knqyf263 do you think it would be useful to narrow down the problem to containers, or should we look at a more general solution?

It sounds great! Ideally, I'd like to embed IDs into any artifacts such as jar files and binaries, as @jeremylong said, but covering containers as a first step would be very helpful for the cloud-native area.

david-a-wheeler commented 3 years ago

This kind of thing has been proposed before. Embedding CPE names was previously proposed for OWASP Dependency-check by Dale Visser, including a specification and pull request. It wasn’t accepted at that time, for reasons I don’t agree with. It might be a good time to discuss it again and re-introduce this proposal. The proposal also included a specific mechanism that could be reused elsewhere.

See: https://github.com/jeremylong/DependencyCheck/pull/298

There are other ways to identify packages, other than CPE. While we're adding CPE support, let's add support for other mechanisms:

package URLs (purls) - https://github.com/package-url/purl-spec
Homepage URLs
(Source code) Repository URLs

The CII Best Practices badge uses the homepage URL + repo URL to identify projects, and that works pretty well.

I doubt it’d be possible to embed SWID, because that’s a hash, and you can’t embed the hash you still need to calculate. But being able to support multiple formats (CPE, purl, homepage URL, and repo URL) would still be an improvement. 

knqyf263 commented 3 years ago

Yes, supporting multiple formats would be great! However, as far as I know, NVD provides only CPE names as of today, so we might not be able to detect vulnerabilities from purls. Is there any way to map purl to CPE?
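
To illustrate what such a mapping involves, here is a sketch built on a hand-maintained lookup table. The table, its single entry, and the function names are assumptions made for illustration; no public purl-to-CPE dataset is implied, and the purl parsing is deliberately minimal (qualifiers and subpath are dropped, no percent-decoding).

```python
def split_purl(purl: str):
    """Split a purl into (type, namespace/name, version)."""
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a purl: {purl!r}")
    # Drop the optional "#subpath" and "?qualifiers" parts.
    rest = purl[4:].split("#", 1)[0].split("?", 1)[0]
    if "@" in rest:
        path, version = rest.rsplit("@", 1)
    else:
        path, version = rest, ""
    ptype, _, name = path.partition("/")
    return ptype.lower(), name, version


# Hypothetical, hand-written table: (purl type, namespace/name) -> (CPE vendor, CPE product).
PURL_TO_CPE = {
    ("github", "redis/redis"): ("redislabs", "redis"),
}


def purl_to_cpe(purl: str):
    """Return a CPE 2.3 name for a purl, or None when the table has no entry."""
    ptype, name, version = split_purl(purl)
    entry = PURL_TO_CPE.get((ptype, name))
    if entry is None:
        return None
    vendor, product = entry
    return f"cpe:2.3:a:{vendor}:{product}:{version or '*'}:*:*:*:*:*:*:*"


print(purl_to_cpe("pkg:github/redis/redis@6.0"))
```

The hard part is not the code but maintaining the table: someone has to know that the repository `redis/redis` corresponds to the vendor `redislabs`.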

stevespringett commented 3 years ago

@david-a-wheeler has a good point. Let's not fixate on CPE.

CPE is deprecated, has a known chicken/egg problem - and we're still waiting for guidance from NVD regarding migration and roadmap to SWID tags.

CPE is one way to identify software. Package URL is another way, as are SWID tagIds. Depending on the type of software being identified, you may end up wanting a different format to identify it. The CycloneDX spec has guidelines that recommend the use of CPE, PURL, and SWID based on the type of software being identified. See https://cyclonedx.org/use-cases/#known-vulnerabilities

It's also important to note that CPE, PURL, and SWID all have varying levels of support for vulnerability use cases across multiple sources of vulnerability intelligence. The NVD currently only supports CPE with a plan to support SWID. Other sources of intelligence support other identity formats. Both CPE and PURL can be used today to identify vulnerabilities.

OSS Index (Sonatype) supports vulnerability use cases using PURL. A request to open source their PURL to CPE mappings was made about a year ago. Refer to https://github.com/OSSIndex/vulns/issues/53.

There's also https://github.com/nexB/vulnerablecode which looks really promising. And I know of other companies (SCA and general purpose vulnerability databases) that are already in the process of implementing support for PURL.

I view this simply as a need to be able to identify software in one of the three formats that are most applicable to the software being identified. Using containers as a starting point, I believe, is already a (mostly) solved problem. Anchore has Syft and Grype that do a really good job. There are others as well.

JasonKeirstead commented 3 years ago

Re "supporting multiple formats": one thing to keep in mind here is that at the end of the day someone has to consume this information and actually do something real with it in code that adds value.

The more formats supported, the more exponentially complex the complete end-to-end vulnerability lifecycle (from Reporter -> Open Source Project -> Upstream -> Commercial Vendor -> Software Consumer) becomes.

I am not saying that is a showstopper for multiple formats - but this needs to be a strong consideration. It is relatively trivial to just add formats to a JSON or XML file, but this only helps the problem set if they all end up being able to be operationalized on those 3 levels downstream.

kerberosmansour commented 2 years ago

@stevespringett @david-a-wheeler was there ever a consensus on a way forward for this?

I do feel it is going in a general direction; is that correct?

Is it something like: As part of the build create an SBOM (e.g. CycloneDX)/Unique identifier (e.g. PURL/CPE/SWID) then embed the information in the binary in a standardized way and have scanning tools pull the information from those binaries based on the approach?

stevespringett commented 2 years ago

@kerberosmansour Embedding an SBOM into a binary is likely a non-starter. Some SBOMs are going to be very large, especially ones with full license text.

Embedding component identity into binaries (cpe, purl, swid, etc) would require collaboration across the dozens of binary formats. For example, I don't think ELF is future-proof; storing identity information would require a major revision of the ELF format to make enough room in the file or program header. Currently, I don't think ELF is capable of this. Perhaps someone more familiar with ELF can chime in.

I think most SWID tagIds are 16-byte GUIDs. Purls can/will be much larger than that. In practice, support for URIs up to 1 KB in length would likely cover the majority of purl use cases.

Foxboron commented 2 years ago

systemd is attempting to standardize CPE data along with a few other things into binaries.

https://systemd.io/COREDUMP_PACKAGE_METADATA/

https://github.com/systemd/package-notes
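
A sketch of how a scanner might decode such metadata once it has the raw bytes of the note. It assumes the layout described on the page linked above: a standard ELF note record whose payload is a JSON object. The owner `FDO`, the type value, and the JSON keys in the sample are recalled from that page and should be checked against it; the sketch does not validate them. Little-endian byte order is also an assumption, and all function names are hypothetical.

```python
import json
import struct


def pad4(data: bytes) -> bytes:
    """Pad with NUL bytes to the next 4-byte boundary, as ELF notes require."""
    return data + b"\x00" * (-len(data) % 4)


def build_note(owner: bytes, note_type: int, desc: bytes) -> bytes:
    """Build one little-endian ELF note record (used here only to make a sample)."""
    name = owner + b"\x00"
    header = struct.pack("<III", len(name), len(desc), note_type)
    return header + pad4(name) + pad4(desc)


def parse_note(blob: bytes):
    """Decode one little-endian ELF note record into (owner, type, payload)."""
    namesz, descsz, note_type = struct.unpack_from("<III", blob, 0)
    name_start = 12
    desc_start = name_start + ((namesz + 3) & ~3)  # name is padded to 4 bytes
    owner = blob[name_start:name_start + namesz].rstrip(b"\x00").decode("ascii")
    payload = blob[desc_start:desc_start + descsz]
    return owner, note_type, payload


def package_metadata(blob: bytes) -> dict:
    """Return the JSON object carried in a package-metadata note."""
    _owner, _type, payload = parse_note(blob)
    return json.loads(payload.rstrip(b"\x00").decode("utf-8"))


# Sample values are illustrative; owner and type are assumptions (see above).
sample = build_note(
    b"FDO",
    0xCAFE1A7E,
    json.dumps({"type": "rpm", "name": "coreutils", "version": "4711.0815.fc13",
                "osCpe": "cpe:/o:fedoraproject:fedora:33"}).encode("utf-8"),
)
print(package_metadata(sample)["osCpe"])
```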

jbmaillet commented 2 years ago

For the record:

This has been envisioned for Debian packages in 2012: see https://wiki.debian.org/CPEtagPackagesDep, last status in 2016. I don't know "why" it hasn't been done (I am not a Debian maintainer nor security team member), but there must have been some valuable reasons.

What I know existed in 2016 for Debian, and can no longer find, was a super simple lexicon list with, on each line, 1/ the name of the Debian package (say libfoo) and 2/ the corresponding CPE (say "a:fooproject:libfoo"). Job done, except that CPEs change over time and several are often alive at the same time, so it's not a simple 1-to-1 relation. For example, for the kernel right now there is o:linux:linux_kernel, but also a:kernel:selinux, a:linux:mac80211, etc.
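
A sketch of reading such a lexicon while keeping the one-to-many relation. The whitespace-separated two-column layout and the package name `linux` are assumptions, since the original file could not be found; `load_lexicon` is a hypothetical helper.

```python
from collections import defaultdict


def load_lexicon(text: str) -> dict:
    """Map each package name to the list of CPE prefixes listed for it."""
    mapping = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError(f"expected 'package cpe', got {line!r}")
        package, cpe = parts
        mapping[package].append(cpe.strip())
    return dict(mapping)


# Assumed layout: one "package cpe" pair per line; a package may repeat.
sample = """
libfoo a:fooproject:libfoo
linux o:linux:linux_kernel
linux a:kernel:selinux
linux a:linux:mac80211
"""

print(load_lexicon(sample)["linux"])
```

Storing a list per package makes the several-CPEs-per-package case explicit instead of silently keeping only the last entry.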

As for the "CPE is deprecated": SWID tags were at ISO in 2015, and were supposed to land in SCAP v2.0, ETA 2020. I was sitting on the edge of my seat, but we are already 2 years late and I've stopped holding my breath. The last status in the SCAP v2.0 FAQ, under "what is the timeline...", mentions "longer-term work continues thru 2019". This software inventory problem has not been solved in 20 years; my best hope is the momentum from EO 14028.

Foxboron commented 2 years ago

> Embedding an SBOM into a binary is likely a non-starter. Some SBOMs are going to be very large, especially ones with full license text.

I think utilizing SPDX license names is good enough?

I watched Richard Hughes' recent talk on fwupdmgr, and they have started looking at how they could add SBOMs into UEFI firmware update capsules. This is currently only a POC though.

https://github.com/hughsie/python-uswid

ossf / wg-vulnerability-disclosures

Embed CPE names into binaries #76