Open andrewpollock opened 3 months ago
Thanks for opening this!
As #193 points out, related isn't suitable for automation use cases, because the array items aren't guaranteed to have any particular relationship to the OSV record's vulnerability or its affected package(s).
With Linux distributions, we're consuming upstream software components and packaging them into our own distinct downstream software components.
So if Linux distributions provide OSV records to describe the effect of the vulnerability on their own packages, they cannot use aliases. This is because it's not guaranteed that the consumer of the upstream software component is also consuming that distribution's downstream component, and thus the upstream OSV record (e.g. a CVE or GHSA record) would be relevant to them while the distribution's OSV record (e.g. a DSA or CGA record) would not.
So there's no good option for Linux distributions to use to store machine-discoverable links to upstream vulnerabilities.
I suggest adding a new field that's a stronger link than implied in related: similar to aliases, but for asymmetric relationships rather than symmetric. I don't know the best name for such a field, but perhaps inherits, consumes, upstream, or something.
To illustrate how this would work, imagine an OSV record from a Linux distribution like this:
{
"modified": "2024-03-12T08:12:10Z",
"id": "CGA-pc4f-g53c-c4gq",
"upstream": [
"GHSA-rr6r-cfgf-gc6h"
],
// ...
This would have the following ideal outcomes:
- Consumers would not consider CGA-pc4f-g53c-c4gq and GHSA-rr6r-cfgf-gc6h to be the same thing.
- Consumers of CGA-pc4f-g53c-c4gq could now consider vulnerability data identified as GHSA-rr6r-cfgf-gc6h as directly applicable, albeit not the final say for the impact on the distro package.
- Multiple distros could list the same ID in their upstream field (like GHSA-rr6r-cfgf-gc6h), and while that would let consumers discover more information about the vulnerability's source, it would not link the distros' OSV records to one another in any way.
My initial reactive thought was includes or incorporates or even aggregates (which, to be fair, was my understanding of (at least one of) the intentions behind related).
Thank you for the feedback! One of the reasons we went with a more catch-all "related" was it was hard to encapsulate all the different use cases/relationships between vulnerability records. Additionally, having all of these very similar but subtly different fields may complicate and make the schema difficult to understand.
That said, if there is a clear, machine-automation use case for a field such as upstream, I think this is something we should add. Is the primary use case for automation systems here simply to answer the question: "Am I affected by CVE X in my distro?" And with the current related field, this would just give a "maybe" as an answer if it does live in any of the matched OSV records?
My initial reactive thought was includes or incorporates or even aggregates (which, to be fair, was my understanding of (at least one of) the intentions behind related).
👍 These names all sound good to me. And FWIW, I think related could work, but it'd require a substantial tightening of the definition of the field, which I would guess would be breaking and confusing for existing producers/consumers of that field.
One of the reasons we went with a more catch-all "related" was it was hard to encapsulate all the different use cases/relationships between vulnerability records. Additionally, having all of these very similar but subtly different fields may complicate and make the schema difficult to understand.
This definitely makes sense. I wouldn't want to open the door to N more relationship types each getting their own field, and then it becomes impossible to give guidance on which type is the exact right one for each scenario. One of my favorite traits of the OSV schema is its simplicity, and I hesitate to suggest adding a new field; but I'm just not sure how else to solve this for participants outside of the "language ecosystems" category.
Is the primary use case for automation systems here simply to answer the question: "Am I affected by CVE X in my distro?" And with the current related field, this would just give a "maybe" as an answer if it does live in any of the matched OSV records?
Exactly this! OSV's aliases field is really cool for consumers like vulnerability scanners and other security solutions, because it's a simple but powerful way to get more perspective on a vulnerability. By "JOIN"-ing to other aliased records, it's trivial to look up what the Go team has to say about affected symbols for a package matched to a GHSA record, just as an example. This also means that it's not necessary for every OSV record in the "alias set" to copy each other's data into their own record. The "JOIN"-ability lets each ecosystem state what it knows best about that vulnerability.
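To make the "JOIN" idea concrete, here is a minimal sketch of how a consumer might group records into alias sets and pull every perspective on one vulnerability. The record IDs, the summary text, and the helper names build_alias_sets and joined_view are made up for illustration; this is not how any particular scanner or osv.dev implements it.

```python
from collections import defaultdict

# Hypothetical records; the IDs and summaries are invented for this sketch.
RECORDS = [
    {"id": "GHSA-aaaa-bbbb-cccc", "aliases": ["CVE-2024-0001"],
     "summary": "general advisory text"},
    {"id": "GO-2024-0001", "aliases": ["CVE-2024-0001", "GHSA-aaaa-bbbb-cccc"],
     "summary": "Go-specific detail such as affected symbols"},
    {"id": "GHSA-dddd-eeee-ffff", "aliases": [],
     "summary": "unrelated advisory"},
]


def build_alias_sets(records):
    """Group IDs into alias sets with union-find over id and aliases."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for rec in records:
        root = find(rec["id"])
        for alias in rec.get("aliases", []):
            other = find(alias)
            if other != root:
                parent[other] = root

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


def joined_view(records, vuln_id):
    """Return every known record in the same alias set as vuln_id."""
    by_id = {rec["id"]: rec for rec in records}
    for group in build_alias_sets(records):
        if vuln_id in group:
            # Some aliases (e.g. a bare CVE ID) may have no record of their own.
            return [by_id[i] for i in sorted(group) if i in by_id]
    return []


joined_ids = [rec["id"] for rec in joined_view(RECORDS, "GHSA-aaaa-bbbb-cccc")]
```

Because aliases is symmetric, starting from either record in the set yields the same joined view, which is exactly the property that makes it unsuitable for the distro case.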
So, Linux distributions want to be a part of that! ...without causing disruptions to the alias set itself. Speaking on behalf of Wolfi, it would be great for us to be able to weigh in on how a given vulnerability (expressed as another OSV record like GHSA-..., PSF-..., etc.) affects packages in the ecosystem we control, where the affected ranges are different (because the packages are different) and we can add other ecosystem-specific data of our own to the overall story.
This enables security tools to use the distro OSV data for matching, and then other OSV records to do other useful things, like provide users with more context about the vulnerability itself and cross-check the distro's findings with upstream findings. Zooming out, this also makes it easier for general consumers of the OSV database to see how different distros have handled a given vulnerability (it gets very interesting to compare notes like this during triaging!).
This has come up again in another context, so I think we should prioritise addressing this given that we're getting indications this is a real problem faced by various users of the OSV schema.
Perhaps something like the following:
{
"upstream": [ string ],
}
The upstream field gives a list of IDs of upstream vulnerabilities that are bundled by the vulnerability.
For example, a downstream package ecosystem (such as a Linux distribution) may issue its own advisories that include (possibly multiple) upstream vulnerabilities.
upstream relationships are transitive but not symmetric. For example, if B is an upstream vulnerability for A, and C is an upstream vulnerability for B, then C is also an upstream vulnerability for A.
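As a rough illustration of "transitive but not symmetric", the sketch below walks the upstream field to collect every direct and indirect upstream ID. The records A, B, and C mirror the example in the text; the function name transitive_upstream is hypothetical, and the cycle guard is an assumption about how a consumer might defend against malformed data.

```python
# Records mirroring the A -> B -> C example above.
RECORDS = [
    {"id": "A", "upstream": ["B"]},
    {"id": "B", "upstream": ["C"]},
    {"id": "C"},
]


def transitive_upstream(records, vuln_id):
    """Collect all direct and indirect upstream IDs of vuln_id."""
    by_id = {rec["id"]: rec for rec in records}
    seen = set()
    stack = [vuln_id]
    while stack:
        current = stack.pop()
        # An upstream ID may have no record of its own in this data set.
        for up in by_id.get(current, {}).get("upstream", []):
            if up != vuln_id and up not in seen:  # guard against cycles
                seen.add(up)
                stack.append(up)
    return seen


upstream_of_a = transitive_upstream(RECORDS, "A")
upstream_of_b = transitive_upstream(RECORDS, "B")
```

Here C is reachable from A through B, while A never shows up among the upstreams of B or C, which is the asymmetry the definition calls for.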
I do like upstream as the name for this since it captures the use case very clearly.
We would also remove the "A similar OSV entry that bundles multiple distinct vulnerabilities in the same entry." part from the related definition, and modify the corresponding recommendation to use related in the aliases description.
What do people think?
CCing some Linux distro folks here for feedback on https://github.com/ossf/osv-schema/issues/249#issuecomment-2425629271
The TL;DR is that we're proposing to add another field called upstream that tracks a list of upstream vulnerability IDs that are bundled as part of a distro advisory. e.g.
{
  "id": "DISTRO-1337",
  "upstream": [
    "CVE-2024-1337",
    "CVE-2024-1338"
  ]
}
Previously, we recommended using related for this, but related was a little underspecified and unsuitable for the use case of answering the question: "Am I affected by CVE X in my distro image based on the distro advisory DB?"
Would appreciate your thoughts/feedback!
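For illustration, a minimal sketch of how a consumer could start answering that question from a distro advisory DB once upstream exists. The advisory IDs, package names, and the function distro_packages_affected_by are invented for this example, and version-range evaluation is deliberately left out, since distro version comparison is ecosystem-specific.

```python
# Hypothetical distro advisory DB; IDs and package names are invented.
DISTRO_DB = [
    {
        "id": "DISTRO-1337",
        "upstream": ["CVE-2024-1337", "CVE-2024-1338"],
        "affected": [{"package": {"ecosystem": "ExampleDistro", "name": "examplepkg"}}],
    },
    {
        "id": "DISTRO-1400",
        "upstream": ["CVE-2024-1400"],
        "affected": [{"package": {"ecosystem": "ExampleDistro", "name": "otherpkg"}}],
    },
]


def distro_packages_affected_by(distro_records, cve_id):
    """Map package name -> advisory IDs that list cve_id in upstream.

    This only finds candidate packages; a real scanner would still need to
    compare the installed version against each advisory's affected ranges.
    """
    hits = {}
    for rec in distro_records:
        if cve_id not in rec.get("upstream", []):
            continue
        for entry in rec.get("affected", []):
            name = entry.get("package", {}).get("name")
            if name:
                hits.setdefault(name, []).append(rec["id"])
    return hits


hits_1337 = distro_packages_affected_by(DISTRO_DB, "CVE-2024-1337")
hits_unknown = distro_packages_affected_by(DISTRO_DB, "CVE-2024-9999")
```

An empty result means no advisory in this DB claims the CVE as an upstream, which is a definite statement about the DB rather than the "maybe" that related could offer.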
Seems like a valid proposal to me.
I honestly don't like the name upstream in this context.
Not all vulnerabilities are targeted to upstream projects. We do have vulnerabilities that are reported in specific ecosystems and specific package versions in that ecosystem. For example, let's say there was a vulnerability found only in Ubuntu 22.04 for package foo and it received the id CVE-XXXX-YYYY, it would look like this:
{
  "id": "UBUNTU-CVE-XXXX-YYYY",
  "upstream": [
    "CVE-XXXX-YYYY"
  ]
}
We would be upstream and downstream in this case, and it might not be clear to the users in any way.
My understanding of upstream in this context is more like the vulnerability database, or catalog. Just where you first registered the vulnerability and got an ID, and then the "downstream/advisory ID" is just a specific view of such entry for a specific ecosystem.
In that example, I think that's what aliases is for, if I'm understanding correctly, where the CVE and the Ubuntu advisory affect the same set of software.
From here:
Two vulnerabilities can be described as aliases if they affect any given software component the same way: either both vulnerabilities affect the software component or neither do. A subsequent patch addresses both of the vulnerabilities (and no others), and vice versa.
The idea for upstream is that it's only to be used when the relationship between the advisories is asymmetric, such as when the CVE affects more than just what the Ubuntu advisory covers, in which case it'd be invalid to use aliases to link the records involved.
Sounds good to me. upstream seems clearer in what kind of relationship there is between the advisory and the linked resource.
In that example, I think that's what aliases is for, if I'm understanding correctly, where the CVE and the Ubuntu advisory affect the same set of software.
From here:
Two vulnerabilities can be described as aliases if they affect any given software component the same way: either both vulnerabilities affect the software component or neither do. A subsequent patch addresses both of the vulnerabilities (and no others), and vice versa.
The idea for upstream is that it's only to be used when the relationship between the advisories is asymmetric, such as when the CVE affects more than just what the Ubuntu advisory covers, in which case it'd be invalid to use aliases to link the records involved.
+1 to all of this! Indeed if a CVE is issued by Ubuntu directly for something that's Ubuntu-specific, it should be in aliases.
@dodys do you have any other concerns here?
If I can be honest, I don't think we will use aliases any time soon. To make it work with the automation, that would require a change in our tracking to make a note when an "upstream" CVE is a direct 1-1 relation to a "downstream" advisory (and here I only mean the UBUNTU-CVE-... advisories). Therefore we will continue using the related or upstream field only for the time being.
For SUSE this upstream relation is also good.
Raised by @luhring in https://github.com/google/osv.dev/issues/2374 and capturing here: