ossf / osv-schema

Open Source Vulnerability schema.
https://ossf.github.io/osv-schema/
Apache License 2.0

Formally include Common Weakness Enumeration (CWE) in the schema #254

Open andrewpollock opened 3 months ago

andrewpollock commented 3 months ago

Problem statement:

OSS users who rely on OSV for vulnerability management have no standardized way to categorize the vulnerabilities that affect them now or have affected them in the past.

Researchers have no way to taxonomize OSV records or to produce research based on OSV-sourced data.

CWE is the industry-standard way of taxonomizing vulnerabilities.

The GitHub Advisory Database, the largest publisher of OSV records by volume, is using database_specific.cwe_ids[] today, e.g. https://github.com/github/advisory-database/blob/0e3918c95bbd48455145dd8755a532e72445e05e/advisories/github-reviewed/2023/12/GHSA-45x7-px36-x8w8/GHSA-45x7-px36-x8w8.json#L578-L582
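For illustration, here is a minimal sketch of how a consumer might read that field today. The record below is a made-up placeholder (the ID, alias, and CWE value are assumptions, not taken from the linked advisory), and `cwe_ids_from_osv` is a hypothetical helper name.

```python
import json

# Illustrative record only: the ID, alias, and CWE value are placeholders,
# not copied from a real advisory.
SAMPLE_RECORD = json.loads("""
{
  "id": "GHSA-xxxx-xxxx-xxxx",
  "aliases": ["CVE-0000-0000"],
  "database_specific": {
    "cwe_ids": ["CWE-79"]
  }
}
""")


def cwe_ids_from_osv(record: dict) -> list[str]:
    """Return CWE IDs from database_specific.cwe_ids, or [] if absent."""
    db_specific = record.get("database_specific") or {}
    cwe_ids = db_specific.get("cwe_ids")
    # database_specific has no schema-enforced shape, so check types defensively.
    if not isinstance(cwe_ids, list):
        return []
    return [cwe for cwe in cwe_ids if isinstance(cwe, str)]


print(cwe_ids_from_osv(SAMPLE_RECORD))
```

Because `database_specific` is free-form, this only works for publishers that happen to use the same key, which is the gap this issue is about.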


andrewpollock commented 1 month ago

@oliverchang WDYT, a straight out CWE field, or something more like severity (category)?

andrewpollock commented 1 month ago

Looking at the CVE JSON 5 record format, it has a problemTypes field.
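As a rough sketch of what that looks like in practice: in a CVE JSON 5 record, `problemTypes` sits under `containers.cna`, and each entry carries a list of `descriptions` that may include a `cweId`. The sample record below is fabricated for illustration and trimmed to the relevant fields, and `cwe_ids_from_cve5` is a hypothetical helper.

```python
# Trimmed, made-up CVE JSON 5 record; only the fields used here are shown.
SAMPLE_CVE = {
    "dataType": "CVE_RECORD",
    "containers": {
        "cna": {
            "problemTypes": [
                {"descriptions": [
                    {"lang": "en", "type": "CWE", "cweId": "CWE-79",
                     "description": "Illustrative CWE-backed entry"},
                ]},
                {"descriptions": [
                    # Free-text problem type with no CWE attached.
                    {"lang": "en", "description": "Illustrative free-text entry"},
                ]},
            ]
        }
    },
}


def cwe_ids_from_cve5(cve_record: dict) -> list[str]:
    """Collect distinct cweId values from containers.cna.problemTypes, in order."""
    cna = cve_record.get("containers", {}).get("cna", {})
    found: list[str] = []
    for problem_type in cna.get("problemTypes", []):
        for desc in problem_type.get("descriptions", []):
            cwe = desc.get("cweId")
            if cwe and cwe not in found:
                found.append(cwe)
    return found


print(cwe_ids_from_cve5(SAMPLE_CVE))
```

Note that this shape allows free-text problem types alongside CWE IDs, which is closer to the "category" style option than to a plain CWE field.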

oliverchang commented 4 weeks ago

So far, OSV has aimed to be a minimal, lean schema focused on enabling vulnerability scanners to produce actionable and accurate results (and, by extension, on letting database maintainers encode the information needed for that).

CWE's primary use case appears to be historical analysis (e.g. analysing trends in the types of vulnerabilities found in the past), and it generally doesn't seem very useful for vulnerability scanners. Are there any other use cases I'm not aware of? Theoretically, someone doing analysis on vulnerabilities in open source can easily join OSV's data with another source (such as NVD or CISA's vulnerability enrichment data) to collate this information.
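A minimal sketch of the kind of join described above, assuming the external source has already been reduced to a mapping from CVE ID to CWE IDs. The `cwe_by_cve` mapping, the record IDs, and the `collate_cwes` helper are all hypothetical; fetching and parsing the external dataset is out of scope here. The join key is the OSV `aliases` field.

```python
def collate_cwes(osv_records: list[dict], cwe_by_cve: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each OSV record ID to the CWE IDs found for its own ID or any alias."""
    result: dict[str, list[str]] = {}
    for record in osv_records:
        # A record may itself be a CVE, or may point at one through aliases.
        keys = [record["id"], *record.get("aliases", [])]
        cwes = {cwe for key in keys for cwe in cwe_by_cve.get(key, [])}
        result[record["id"]] = sorted(cwes)
    return result


# Placeholder data: IDs and CWE assignments are invented for illustration.
osv_records = [
    {"id": "GHSA-aaaa-aaaa-aaaa", "aliases": ["CVE-0000-0001"]},
    {"id": "EXAMPLE-0002", "aliases": []},
]
cwe_by_cve = {"CVE-0000-0001": ["CWE-89", "CWE-79"]}

print(collate_cwes(osv_records, cwe_by_cve))
```

Records without a CVE alias come back with an empty list, which shows the limit of this approach: the join only covers vulnerabilities that the other source also tracks.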

OSV also aims to keep every schema change backwards compatible. So while adding a new field is cheap in the moment, it is expensive in the long run: the field can never be removed, and it adds incremental complexity to the schema as a whole. We'd have to balance the benefits of encoding this directly in OSV against that cost.