sourcegraph / sourcegraph-public-snapshot

Code AI platform with Code Search & Cody
https://sourcegraph.com

Define vulnerability data format #47843

Closed: willdollman closed this issue 1 year ago

willdollman commented 1 year ago

We need to define the internal data format we use for storing vulnerability data.

I've created an initial version of this in the PR, along with code to parse and convert GHSA vuln data as a starting point.

I'm now moving on to parsing and converting the OpenSSF OSV format, which is a common interchange format used by several vulnerability databases, including GHSA and golang/vulndb. It's basically a superset of the structs I've been using to parse GHSA vulns, so it just requires some more fiddly iteration and processing, since it allows database-specific extensions.
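A minimal sketch of how OSV records might be decoded in Go, assuming the field names from the published OSV schema (`id`, `aliases`, `affected`, `ranges`, `events`, `database_specific`). The Go type and function names (`OSV`, `ParseOSV`, etc.) are hypothetical and are not the code from the PR. Database-specific extensions are kept as undecoded `json.RawMessage` values so that each source (GHSA, golang/vulndb) can interpret its own fields later.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// OSV mirrors a subset of the OpenSSF OSV schema. JSON keys follow the
// published schema; the Go type names are illustrative only.
type OSV struct {
	ID         string         `json:"id"`
	Modified   string         `json:"modified"`
	Aliases    []string       `json:"aliases"`
	Summary    string         `json:"summary"`
	Details    string         `json:"details"`
	Affected   []OSVAffected  `json:"affected"`
	References []OSVReference `json:"references"`
	// DatabaseSpecific holds source-defined extensions. It is left as raw
	// JSON so a per-source handler can decode it into its own struct.
	DatabaseSpecific json.RawMessage `json:"database_specific"`
}

// OSVAffected describes one affected package and its vulnerable ranges.
type OSVAffected struct {
	Package struct {
		Ecosystem string `json:"ecosystem"`
		Name      string `json:"name"`
	} `json:"package"`
	Ranges           []OSVRange      `json:"ranges"`
	Versions         []string        `json:"versions"`
	DatabaseSpecific json.RawMessage `json:"database_specific"`
}

// OSVRange is a list of version events of a given type (e.g. SEMVER).
type OSVRange struct {
	Type   string     `json:"type"`
	Events []OSVEvent `json:"events"`
}

// OSVEvent has one field set per event, e.g. {"introduced": "0"}.
type OSVEvent struct {
	Introduced   string `json:"introduced,omitempty"`
	Fixed        string `json:"fixed,omitempty"`
	LastAffected string `json:"last_affected,omitempty"`
	Limit        string `json:"limit,omitempty"`
}

// OSVReference is a link to an advisory, fix commit, report, etc.
type OSVReference struct {
	Type string `json:"type"`
	URL  string `json:"url"`
}

// ParseOSV decodes a single OSV JSON document.
func ParseOSV(data []byte) (OSV, error) {
	var v OSV
	if err := json.Unmarshal(data, &v); err != nil {
		return OSV{}, fmt.Errorf("parse OSV: %w", err)
	}
	return v, nil
}

func main() {
	// Made-up sample record for illustration only.
	raw := []byte(`{
	  "id": "GHSA-xxxx-xxxx-xxxx",
	  "aliases": ["CVE-0000-0000"],
	  "affected": [{
	    "package": {"ecosystem": "npm", "name": "example-pkg"},
	    "ranges": [{"type": "SEMVER", "events": [{"introduced": "0"}, {"fixed": "1.2.3"}]}]
	  }],
	  "database_specific": {"severity": "HIGH"}
	}`)
	v, err := ParseOSV(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(v.ID, v.Affected[0].Package.Name, string(v.DatabaseSpecific))
}
```

Keeping `database_specific` raw means the shared OSV structs never need to know about any one database's extensions.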

I also want to take a look at the https://github.com/CVEProject/cve-schema format, which is used by https://github.com/CVEProject/cvelist.

Although we initially want to launch with just one data source, reviewing these different formats should help ensure we have a robust vuln data format to work with. Once we start on matching dependencies to vulns I'm sure we'll identify some other improvements too!

willdollman commented 1 year ago

https://github.com/sourcegraph/sourcegraph/pull/47531/commits

Added full support for OSV data, with extra handlers for the database-specific data used by GHSA and Govulndb. Our internal Vulnerability type works well with all the data ingested so far, and its structure seems well suited to handling dependency matches.

One thing we need to think about is attribution of the data sources, as the licenses require it.
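A minimal sketch of converting an OSV record into an internal type, assuming a hypothetical `Vulnerability` struct (the PR's actual type is not reproduced here) and a per-source handler table keyed by OSV ID prefix. The GHSA handler assumes GHSA's `database_specific` object carries `severity` and `cwe_ids`; a golang/vulndb handler could be registered the same way. `ToVulnerability`, `handlers` and the `Source` field (one possible place to record attribution) are illustrative names only.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Vulnerability is a hypothetical internal type, not the PR's definition.
type Vulnerability struct {
	ID       string
	Aliases  []string
	Summary  string
	Severity string // set by a source handler, if the source provides one
	CWEs     []string
	Packages []AffectedPackage
	Source   string // data source name, e.g. for license attribution
}

// AffectedPackage is one vulnerable interval [Introduced, Fixed).
type AffectedPackage struct {
	Ecosystem  string
	Name       string
	Introduced string
	Fixed      string // empty means no fix yet
}

// osvRecord is a trimmed-down OSV document with only the fields used here.
type osvRecord struct {
	ID       string   `json:"id"`
	Aliases  []string `json:"aliases"`
	Summary  string   `json:"summary"`
	Affected []struct {
		Package struct {
			Ecosystem string `json:"ecosystem"`
			Name      string `json:"name"`
		} `json:"package"`
		Ranges []struct {
			Events []map[string]string `json:"events"`
		} `json:"ranges"`
	} `json:"affected"`
	DatabaseSpecific json.RawMessage `json:"database_specific"`
}

// ghsaExtensions assumes these keys in GHSA's database_specific object.
type ghsaExtensions struct {
	Severity string   `json:"severity"`
	CWEIDs   []string `json:"cwe_ids"`
}

// extensionHandler applies source-specific data to a Vulnerability.
type extensionHandler func(raw json.RawMessage, v *Vulnerability) error

// handlers is keyed by OSV ID prefix (e.g. "GHSA" in "GHSA-xxxx-...").
var handlers = map[string]extensionHandler{
	"GHSA": func(raw json.RawMessage, v *Vulnerability) error {
		var ext ghsaExtensions
		if err := json.Unmarshal(raw, &ext); err != nil {
			return err
		}
		v.Severity = strings.ToUpper(ext.Severity)
		v.CWEs = ext.CWEIDs
		return nil
	},
}

// ToVulnerability parses an OSV document and converts it to Vulnerability.
func ToVulnerability(data []byte, source string) (Vulnerability, error) {
	var rec osvRecord
	if err := json.Unmarshal(data, &rec); err != nil {
		return Vulnerability{}, fmt.Errorf("parse OSV: %w", err)
	}
	v := Vulnerability{ID: rec.ID, Aliases: rec.Aliases, Summary: rec.Summary, Source: source}
	for _, a := range rec.Affected {
		for _, r := range a.Ranges {
			// Pair each "introduced" event with the next "fixed" event.
			var cur *AffectedPackage
			for _, e := range r.Events {
				if intro, ok := e["introduced"]; ok {
					if cur != nil {
						v.Packages = append(v.Packages, *cur)
					}
					cur = &AffectedPackage{Ecosystem: a.Package.Ecosystem, Name: a.Package.Name, Introduced: intro}
				}
				if fixed, ok := e["fixed"]; ok && cur != nil {
					cur.Fixed = fixed
					v.Packages = append(v.Packages, *cur)
					cur = nil
				}
			}
			if cur != nil { // introduced but never fixed
				v.Packages = append(v.Packages, *cur)
			}
		}
	}
	// Run the source-specific handler, if one is registered for this prefix.
	prefix, _, _ := strings.Cut(rec.ID, "-")
	if h, ok := handlers[prefix]; ok && len(rec.DatabaseSpecific) > 0 {
		if err := h(rec.DatabaseSpecific, &v); err != nil {
			return Vulnerability{}, fmt.Errorf("%s extensions: %w", prefix, err)
		}
	}
	return v, nil
}

func main() {
	raw := []byte(`{"id":"GHSA-xxxx-xxxx-xxxx","summary":"example","affected":[{"package":{"ecosystem":"npm","name":"example-pkg"},"ranges":[{"type":"SEMVER","events":[{"introduced":"0"},{"fixed":"1.2.3"}]}]}],"database_specific":{"severity":"HIGH","cwe_ids":["CWE-79"]}}`)
	v, err := ToVulnerability(raw, "GitHub Advisory Database")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", v)
}
```

Splitting each `introduced`/`fixed` pair into its own entry keeps every affected interval, which is what a dependency matcher would compare versions against. `last_affected` and `limit` events are ignored here for brevity.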