trustification / trustify


Improve ingestion of CPEs #510

Open ctron opened 3 months ago

ctron commented 3 months ago

For PURLs we have an optimized importer by now. A list of PURLs from an SBOM gets batch imported, with an upsert strategy.

However, for CPEs we still have the single "get or insert" strategy. As there seem to be a lot of CPEs in the SBOMs now, that hurts performance a lot.

The idea is to replicate the PURL ingestion process and apply the same pattern to CPEs: batch insertion plus upsert. A quick check of a single RHEL-style SBOM shows that this should bring down the number of operations quite a bit, just by avoiding duplicates:

➜  sbom bzcat rhel-br-9.2.0.json.bz2 | grep cpe: | sort | wc
   4940    9880  348225
➜  sbom bzcat rhel-br-9.2.0.json.bz2 | grep cpe: | sort -u | wc
     27      54    1944
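
To illustrate the counts above (4940 CPE lines collapsing to 27 unique ones), here is a minimal sketch of the "deduplicate, then batch upsert" idea using only the Rust standard library. It is not trustify's actual importer: the function names, the `cpe` table, its `value` column, and the sample CPE strings are all hypothetical, and the SQL assumes PostgreSQL-style `ON CONFLICT` with a unique constraint on that column.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: collapse duplicate CPE strings into a sorted,
/// unique list, so each distinct CPE is only sent to the database once.
fn dedup_cpes<'a>(cpes: impl IntoIterator<Item = &'a str>) -> Vec<&'a str> {
    cpes.into_iter().collect::<BTreeSet<_>>().into_iter().collect()
}

/// Hypothetical helper: build one multi-row upsert statement per batch,
/// returning the SQL text together with the values to bind.
/// Table and column names are invented for illustration.
fn upsert_statements(cpes: &[&str], batch_size: usize) -> Vec<(String, Vec<String>)> {
    cpes.chunks(batch_size.max(1)) // guard against a batch size of zero
        .map(|batch| {
            // One positional placeholder per row: ($1), ($2), ...
            let placeholders = (1..=batch.len())
                .map(|i| format!("(${})", i))
                .collect::<Vec<_>>()
                .join(", ");
            let sql = format!(
                "INSERT INTO cpe (value) VALUES {} ON CONFLICT (value) DO NOTHING",
                placeholders
            );
            (sql, batch.iter().map(|s| s.to_string()).collect())
        })
        .collect()
}

fn main() {
    // Illustrative input with a repeated entry, as seen in the SBOM above.
    let raw = [
        "cpe:/o:redhat:enterprise_linux:9",
        "cpe:/a:redhat:enterprise_linux:9::appstream",
        "cpe:/o:redhat:enterprise_linux:9",
    ];
    let unique = dedup_cpes(raw);
    for (sql, params) in upsert_statements(&unique, 100) {
        println!("{} <- {:?}", sql, params);
    }
}
```

With this shape, the number of database round trips depends on the number of unique CPEs and the batch size rather than on the number of CPE occurrences in the SBOM.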

On the other hand, those CPEs are of type "security" and we can skip them at first. Also see: https://github.com/trustification/trustify/issues/509 … However, in the future we might want to ingest this information anyway, so we need to improve the CPE creation process.

bobmcwhirter commented 3 months ago

I'm now using the pih CPEs to contextualize product-status from CSAF, so yes please, CPEs would be good. I'm currently relying upon graph.ingest_cpe(...)