feat: Use ExternalRef SECURITY in SPDX to find CVE

intel / cve-bin-tool

The CVE Binary Tool helps you determine if your system includes known vulnerabilities. You can scan binaries for over 200 common, vulnerable components (openssl, libpng, libxml2, expat and others), or if you know the components used, you can get a list of known vulnerabilities associated with an SBOM or a list of components and versions.

https://cve-bin-tool.readthedocs.io/en/latest/

GNU General Public License v3.0

1.24k stars 464 forks source link

feat: Use ExternalRef SECURITY in SPDX to find CVE #3588

Open tgagneret-embedded opened 11 months ago

tgagneret-embedded commented 11 months ago

Description

When parsing SPDX file, ExternalRef SECURITY <cpe> is not taken into account.

Why?

Sometimes package name does not match cpe name. Moreover, some CPE can have the same product name but different vendors. This leads to poor CVE result. To resolve this, if ExternalRef SECURITY is found in the SPDX, it could be used instead of the package name for searching CVE. This would improve the result of the scan.

What do you think ?

Thanks

terriko commented 11 months ago

Sounds like a great enhancement! Are you interested in implementing it?

tgagneret-embedded commented 11 months ago

Yes I should be able to do it :)

tgagneret-embedded commented 11 months ago

Hi,

I prefer to give you some information on how I would implement this, because there is a lot of questions.

First, in the parser, I would check if cpe field is present. If so:

replace existing product name with cpe content
add vendor name to the return list
replace existing version with cpe content. I'm not sure what I should do if version is *. Would it work to keep * as product version for the CVE search ?

However doing this, I would need to add vendor to the list returned by parse and use it when iterating on the modules. Related code can be found here:

https://github.com/intel/cve-bin-tool/blob/v3.2.1/cve_bin_tool/sbom_manager/spdx_parser.py#L40
- https://github.com/intel/cve-bin-tool/blob/v3.2.1/cve_bin_tool/sbom_manager/__init__.py#L70

Do you think it's a good idea ? Do you have any other any idea in mind ? Create a Module class maybe (but it would be the same than ProductInfo I think) ?

To do this I would need to use an external cpe parser. These are the one I could use:

Do you have any preference ?

Could I have your feedback on this ?

Thanks

terriko commented 11 months ago

You might want to look at what @anthonyharrison has done with purl because I think it'll be a little similar. But I think you're on the right track: if you can find a vendor, you'll just want to verify that it's real and then use it, skipping our guesswork search method entirely.

anthonyharrison commented 11 months ago

ExternalSecurity in SPDX uses CPE items. We also have PURL records which are preferred to CPE due to the inconsistency of CPE records, particular with respect to the vendor field. I would be particularly interested in identifying where the CPE record indicates a different product, vendor or version from that already specified elsewhere in the SBOM.

My recommendation would be to only consider the use of the EXTERNALSECURITY fields if a PURL record is not present.

Note that CISA have been consulting on the software naming problem which is what this is related to and we should try and stay aligned with the direction in which CISA are heading.

Software Identification Options

tgagneret-embedded commented 10 months ago

Hi,

Following the code, I found that adding a new condition for cpe23Type, next to purl, might work:

https://github.com/intel/cve-bin-tool/blob/bf9afe998e93d4c3afed6ff7c5dee8d9749f7273/cve_bin_tool/sbom_manager/__init__.py#L123

Howover, the code uses the lib4sbom library that do not have the same result depending on the input format. For example .spdx, .spdx.json, .spdx.yaml is working as expected (externalreference field is parsed) but for .spdx.xml or.spdx.rdf, only name and version are parsed, so externalreference are not reported and it gives an incomplete result.

You can find the parsing code here: https://github.com/anthonyharrison/lib4sbom/blob/main/lib4sbom/spdx/spdx_parser.py

It means that currently, only for some input format, purl is taken into account. The same thing will happen for cpe23Type.

Moreover I think parser present in https://github.com/intel/cve-bin-tool/tree/bf9afe998e93d4c3afed6ff7c5dee8d9749f7273/cve_bin_tool/sbom_manager are not used anymore (only __init__.py is used) .

So, if my understanding is correct you switched to an external library to parse the SBOM. What do you think of the parsing output being different depending on input format ? Do you want to keep this external library ? If so, should we remove the useless code ?

Maybe I misunderstood something, please let me know.

Thank you.

anthonyharrison commented 10 months ago

@tgagneret-embedded You are correct in that lib4sbom (I am the author) only provides full support for SPDX formats tagvalue, JSON and YAML. These are the most common formats - XML format is still experimental for SPDX and I have never seen a RDF SBOM. Purl (and CPE23Type) details in many SBOMs are not commonly found in many SBOMs. The parsing needs to cater for the cases when they are not available (this will be the NORMAL case). The RDF and XML parsers are essentially already demonstrating this. Note that if PURL and CPE data is present, the PURL data should be used in preference to the CPE as the CPE data is notoriously unreliable.

Replacing the parsers originally in the cve-bin-tool with a third party tool was always the idea. When SBOM support was added in 2021, there weren't suitable Python libraries available to parse the various formats hence the local parsing. If there is now redundant code, then this should be removed.

tgagneret-embedded commented 10 months ago

Hi,

I don't entirely agree with you. Yes CPE is not consistent, but since cve-bin-tool searches CVE database, CPE is consistent I think (PURL might miss some CVE). When searching GitHub/Gitlab Advisory Database for example purl is making more sense yes. Since you can return a list of product (name and version) from the SBOM, we could return purl and cpe extracted data (as 2 products).

PR follows your suggestion (CPE is not taken into account if PURL is present) but maybe we could return both ?

What do you think ?

terriko commented 10 months ago

We've seen a lot of SBOM tools generate placeholder CPEs that aren't real or may be entirely incorrect, and the SBOM tools that generate PURLs currently seem to do more error correction. (and to be fair, we also make terrible CPE matching in certain circumstances, so I can see how it happens).

So with the current data I've seen, I'm in favour of preferring PURL and not returning both "just in case" -- but I'd be willing to consider a CPE/PURL/BOTH config option right now if you want. And deep in my heart I don't think it's going to be true that PURL is higher quality forever (eventually tools will be just as likely to return trash PURL values) so we may have to revisit with actual data now and later to make sure we're still making the best choice.

tgagneret-embedded commented 10 months ago

Maybe we could update the ProductInfo to something like this:

class ProductInfo(NamedTuple):
    product: str
    version: str
    cpe: CPE(vendor, product, version)
    purl: PURL(product, version)

This way we could use cpe to search the NVD database, purl for github/gitlab advisory database, and fall back to product and version if necessary.

I'm not sure of how the multiple database are handled, but is this something that would make sense to you ?

terriko commented 3 months ago

Tagging @anthonyharrison to look at this again and see if we've got everything we need to close it.

anthonyharrison commented 3 months ago

@terriko I think the changes made to the SBOM parsing in sbom_manager/parse.py and the changes due to #4178 in which Purl references are priortised over CPE references should be sufficient to improve the accuracy of finding CVEs.

alcroito commented 2 months ago

Hi,

I've been looking into how cve-bin-tool processes SBOMs to detect vulnerabilities, and found the behavior change in https://github.com/intel/cve-bin-tool/pull/4335 to be somewhat of a regression.

The currently implemented behavior is: if a PURL is detected, it is preferred over any specified CPE. But only the PURL version is actually used, falling back to using the SBOM package name, rather than trying to use the package name from the PURL. And the PURL vendor is also skipped entirely. See https://github.com/anthonyharrison/cve-bin-tool/blob/c9bc8a04241b2c81155b2c2cd97946907553b975/cve_bin_tool/sbom_manager/parse.py#L389

This works out if the SBOM package name matches the name used in the advisory databases. But if the SBOM package name is entirely different, e.g. instead of zlib it is PatchedZlib or VendoredZlib then no matches would be found.

In contrast the CPE decoding code path always uses the vendor and package name from the CPE, so even if the SBOM package name is PatchedZlib, as long as proper CPE is specified, vulnerabilities will be found.

As mentioned in the previous comments, I think as a first step, there should be options to prefer one or the other, or perhaps to add an entry for each CPE or PURL found.

Ideally the code should also be improved to parse out the name and vendor out of the PURL values when they are specified. And maybe make it even more configurable, e.g discard PURL values that don't have a detected name and the fallback to SPDX package names is not desired.

I implemented the add all CPE and PURL values approach at https://github.com/alcroito/cve-bin-tool/tree/better_cpe_purl_sbom_handling but i will try to investigate some of the other options as well.

terriko commented 2 months ago

@alcroito I agree with you that there's a bunch of room for improvement and refinement here. Let us know if you have questions while you're investigating!