Open tgagneret-embedded opened 11 months ago
Sounds like a great enhancement! Are you interested in implementing it?
Yes I should be able to do it :)
Hi,
I prefer to give you some information on how I would implement this, because there is a lot of questions.
First, in the parser, I would check if cpe field is present. If so:
*
. Would it work to keep *
as product version for the CVE search ?However doing this, I would need to add vendor
to the list returned by parse
and use it when iterating on the modules. Related code can be found here:
Do you think it's a good idea ? Do you have any other any idea in mind ? Create a Module
class maybe (but it would be the same than ProductInfo
I think) ?
To do this I would need to use an external cpe parser. These are the one I could use:
Do you have any preference ?
Could I have your feedback on this ?
Thanks
You might want to look at what @anthonyharrison has done with purl because I think it'll be a little similar. But I think you're on the right track: if you can find a vendor, you'll just want to verify that it's real and then use it, skipping our guesswork search method entirely.
ExternalSecurity in SPDX uses CPE items. We also have PURL records which are preferred to CPE due to the inconsistency of CPE records, particular with respect to the vendor field. I would be particularly interested in identifying where the CPE record indicates a different product, vendor or version from that already specified elsewhere in the SBOM.
My recommendation would be to only consider the use of the EXTERNALSECURITY fields if a PURL record is not present.
Note that CISA have been consulting on the software naming problem which is what this is related to and we should try and stay aligned with the direction in which CISA are heading.
Hi,
Following the code, I found that adding a new condition for cpe23Type
, next to purl
, might work:
Howover, the code uses the lib4sbom
library that do not have the same result depending on the input format. For example .spdx
, .spdx.json
, .spdx.yaml
is working as expected (externalreference
field is parsed) but for .spdx.xml
or.spdx.rdf
, only name
and version
are parsed, so externalreference
are not reported and it gives an incomplete result.
You can find the parsing code here: https://github.com/anthonyharrison/lib4sbom/blob/main/lib4sbom/spdx/spdx_parser.py
It means that currently, only for some input format, purl
is taken into account. The same thing will happen for cpe23Type
.
Moreover I think parser present in https://github.com/intel/cve-bin-tool/tree/bf9afe998e93d4c3afed6ff7c5dee8d9749f7273/cve_bin_tool/sbom_manager are not used anymore (only __init__.py
is used) .
So, if my understanding is correct you switched to an external library to parse the SBOM. What do you think of the parsing output being different depending on input format ? Do you want to keep this external library ? If so, should we remove the useless code ?
Maybe I misunderstood something, please let me know.
Thank you.
@tgagneret-embedded You are correct in that lib4sbom (I am the author) only provides full support for SPDX formats tagvalue, JSON and YAML. These are the most common formats - XML format is still experimental for SPDX and I have never seen a RDF SBOM. Purl (and CPE23Type) details in many SBOMs are not commonly found in many SBOMs. The parsing needs to cater for the cases when they are not available (this will be the NORMAL case). The RDF and XML parsers are essentially already demonstrating this. Note that if PURL and CPE data is present, the PURL data should be used in preference to the CPE as the CPE data is notoriously unreliable.
Replacing the parsers originally in the cve-bin-tool with a third party tool was always the idea. When SBOM support was added in 2021, there weren't suitable Python libraries available to parse the various formats hence the local parsing. If there is now redundant code, then this should be removed.
Hi,
I don't entirely agree with you. Yes CPE is not consistent, but since cve-bin-tool searches CVE database, CPE is consistent I think (PURL might miss some CVE). When searching GitHub/Gitlab Advisory Database for example purl is making more sense yes. Since you can return a list of product (name
and version
) from the SBOM, we could return purl and cpe extracted data (as 2 products).
PR follows your suggestion (CPE is not taken into account if PURL is present) but maybe we could return both ?
What do you think ?
We've seen a lot of SBOM tools generate placeholder CPEs that aren't real or may be entirely incorrect, and the SBOM tools that generate PURLs currently seem to do more error correction. (and to be fair, we also make terrible CPE matching in certain circumstances, so I can see how it happens).
So with the current data I've seen, I'm in favour of preferring PURL and not returning both "just in case" -- but I'd be willing to consider a CPE/PURL/BOTH config option right now if you want. And deep in my heart I don't think it's going to be true that PURL is higher quality forever (eventually tools will be just as likely to return trash PURL values) so we may have to revisit with actual data now and later to make sure we're still making the best choice.
Maybe we could update the ProductInfo to something like this:
class ProductInfo(NamedTuple):
product: str
version: str
cpe: CPE(vendor, product, version)
purl: PURL(product, version)
This way we could use cpe
to search the NVD database, purl
for github/gitlab advisory database, and fall back to product
and version
if necessary.
I'm not sure of how the multiple database are handled, but is this something that would make sense to you ?
Tagging @anthonyharrison to look at this again and see if we've got everything we need to close it.
@terriko I think the changes made to the SBOM parsing in sbom_manager/parse.py
and the changes due to #4178 in which Purl references are priortised over CPE references should be sufficient to improve the accuracy of finding CVEs.
Hi,
I've been looking into how cve-bin-tool processes SBOMs to detect vulnerabilities, and found the behavior change in https://github.com/intel/cve-bin-tool/pull/4335 to be somewhat of a regression.
The currently implemented behavior is: if a PURL is detected, it is preferred over any specified CPE. But only the PURL version is actually used, falling back to using the SBOM package name, rather than trying to use the package name from the PURL. And the PURL vendor is also skipped entirely. See https://github.com/anthonyharrison/cve-bin-tool/blob/c9bc8a04241b2c81155b2c2cd97946907553b975/cve_bin_tool/sbom_manager/parse.py#L389
This works out if the SBOM package name matches the name used in the advisory databases. But if the SBOM package name is entirely different, e.g. instead of zlib
it is PatchedZlib
or VendoredZlib
then no matches would be found.
In contrast the CPE decoding code path always uses the vendor and package name from the CPE, so even if the SBOM package name is PatchedZlib
, as long as proper CPE
is specified, vulnerabilities will be found.
As mentioned in the previous comments, I think as a first step, there should be options to prefer one or the other, or perhaps to add an entry for each CPE or PURL found.
Ideally the code should also be improved to parse out the name and vendor out of the PURL values when they are specified. And maybe make it even more configurable, e.g discard PURL values that don't have a detected name and the fallback to SPDX package names is not desired.
I implemented the add all CPE and PURL values approach at https://github.com/alcroito/cve-bin-tool/tree/better_cpe_purl_sbom_handling but i will try to investigate some of the other options as well.
@alcroito I agree with you that there's a bunch of room for improvement and refinement here. Let us know if you have questions while you're investigating!
Description
When parsing SPDX file,
ExternalRef SECURITY <cpe>
is not taken into account.Why?
Sometimes package name does not match cpe name. Moreover, some CPE can have the same product name but different vendors. This leads to poor CVE result. To resolve this, if
ExternalRef SECURITY
is found in the SPDX, it could be used instead of the package name for searching CVE. This would improve the result of the scan.What do you think ?
Thanks