Ensure that distro advisories and aliasing work well together

ossf / osv-schema

Open Source Vulnerability schema.

https://ossf.github.io/osv-schema/

Apache License 2.0

Ensure that distro advisories and aliasing work well together #249

Open andrewpollock opened 3 months ago

andrewpollock commented 3 months ago

Raised by @luhring in https://github.com/google/osv.dev/issues/2374 and capturing here:

It looks like the aliases documentation line in question was updated in https://github.com/ossf/osv-schema/pull/193 — that was a great read. I share the concern expressed in that PR: There seems to be a "hole" in the OSV spec when it comes to distros' ability to participate. By moving to related, we're missing out on the opportunity to have strong, automation-usable links to the same vulnerability as described by our advisories. It seems like there should be a new field that's similar to aliases, but for strong "asymmetric" references, to help OSV better support vulnerability workflows beyond language ecosystems and into the world of distros. I can open an issue to capture this, and hopefully we'll have a good dialog there about potential improvements to the spec.

luhring commented 3 months ago

Thanks for opening this!

As #193 points out, related isn't suitable for automation use cases, because the array items aren't guaranteed to have any particular relationship to the OSV record's vulnerability or its affected package(s).

With Linux distributions, we're consuming upstream software components and packaging them into our own distinct downstream software components.

So if Linux distributions provide OSV records to describe the effect of the vulnerability on their own packages, they cannot use aliases. This is because it's not guaranteed that the consumer of the upstream software component is also consuming that distribution's downstream component, and thus the upstream OSV record (e.g. a CVE or GHSA record) would be relevant to them while the distribution's OSV record (e.g. a DSA or CGA record) would not.

So there's no good option for Linux distributions to use to store machine-discoverable links to upstream vulnerabilities.

I suggest adding a new field that's a stronger link than implied in related: similar to aliases, but for asymmetric relationships rather than symmetric. I don't know the best name for such a field, but perhaps inherits, consumes, upstream, or something.

To illustrate how this would work, imagine an OSV record from a Linux distribution like this:

{
  "modified": "2024-03-12T08:12:10Z",
  "id": "CGA-pc4f-g53c-c4gq",
  "upstream": [
    "GHSA-rr6r-cfgf-gc6h"
  ],
  // ...

This would have the following ideal outcomes:

Processors of OSV data would not consider CGA-pc4f-g53c-c4gq and GHSA-rr6r-cfgf-gc6h to be the same thing.
Automation systems wanting more information about CGA-pc4f-g53c-c4gq could now consider vulnerability data identified as GHSA-rr6r-cfgf-gc6h as directly applicable, albeit not the final say for the impact on the distro package.
Multiple distros could use the same IDs in their OSV records' upstream field (like GHSA-rr6r-cfgf-gc6h), and while that would let consumers discover more information about the vulnerability's source, it would not link the distros' OSV records to one another in any way.
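
A minimal sketch of how a consumer might honor these three outcomes, assuming records are plain dicts keyed by ID. The `upstream` field is the proposed one; `OTHERDISTRO-0001` and both helper functions are hypothetical names used only for illustration.

```python
# Hypothetical in-memory record store. Only the CGA and GHSA IDs come from
# the example above; OTHERDISTRO-0001 is invented for illustration.
records = {
    "CGA-pc4f-g53c-c4gq": {
        "id": "CGA-pc4f-g53c-c4gq",
        "upstream": ["GHSA-rr6r-cfgf-gc6h"],
    },
    "OTHERDISTRO-0001": {
        "id": "OTHERDISTRO-0001",
        "upstream": ["GHSA-rr6r-cfgf-gc6h"],
    },
    "GHSA-rr6r-cfgf-gc6h": {"id": "GHSA-rr6r-cfgf-gc6h"},
}


def same_vulnerability(a, b, records):
    """Only `aliases` (symmetric) marks two records as the same thing."""
    return (
        b in records[a].get("aliases", [])
        or a in records[b].get("aliases", [])
    )


def upstream_context(record_id, records):
    """IDs worth consulting for more detail; the link is one-directional."""
    return list(records[record_id].get("upstream", []))


# The distro record points at the GHSA, but not the other way around.
print(upstream_context("CGA-pc4f-g53c-c4gq", records))
print(upstream_context("GHSA-rr6r-cfgf-gc6h", records))
# Sharing an upstream ID does not link the two distro records.
print(same_vulnerability("CGA-pc4f-g53c-c4gq", "OTHERDISTRO-0001", records))
```

The point of keeping the two helpers separate is that a scanner can follow `upstream` for context without ever merging the distro record into the GHSA's alias set.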

andrewpollock commented 3 months ago

My initial reactive thought was includes or incorporates or even aggregates (which, to be fair, was my understanding of (at least one of) the intentions behind related).

oliverchang commented 3 months ago

Thank you for the feedback! One of the reasons we went with a more catch-all "related" was it was hard to encapsulate all the different use cases/relationships between vulnerability records. Additionally, having all of these very similar but subtly different fields may complicate the schema and make it difficult to understand.

That said, if there is a clear, machine-automation use case for a field such as upstream, I think this is something we should add. Is the primary use case for automation systems here simply to answer the question: "Am I affected by CVE X in my distro?" And with the current related field, this would just give a "maybe" as an answer if it does live in any of the matched OSV records?

luhring commented 3 months ago

My initial reactive thought was includes or incorporates or even aggregates (which, to be fair, was my understanding of (at least one of) the intentions behind related).

👍 These names all sound good to me. And FWIW, I think related could work, but it'd require a substantial tightening of the definition of the field, which I would guess would be breaking and confusing for existing producers/consumers of that field.

One of the reasons we went with a more catch-all "related" was it was hard to encapsulate all the different use cases/relationships between vulnerability records. Additionally, having all of these very similar but subtly different fields may complicate the schema and make it difficult to understand.

This definitely makes sense. I wouldn't want to open the door to N more relationship types each getting their own field, and then it becomes impossible to give guidance on which type is the exact right one for each scenario. One of my favorite traits of the OSV schema is its simplicity, and I hesitate to suggest adding a new field; but I'm just not sure how else to solve this for participants outside of the "language ecosystems" category.

Is the primary use case for automation systems here simply to answer the question: "Am I affected by CVE X in my distro?" And with the current related field, this would just give a "maybe" as an answer if it does live in any of the matched OSV records?

Exactly this! OSV's aliases field is really cool for consumers like vulnerability scanners and other security solutions, because it's a simple but powerful way to get more perspective on a vulnerability. By "JOIN"-ing to other aliased records, it's trivial to look up what the Go team has to say about affected symbols for a package matched to a GHSA record, just as an example. This also means that it's not necessary for every OSV record in the "alias set" to copy each other's data into their own record. The "JOIN"-ability lets each ecosystem state what it knows best about that vulnerability.

So, Linux distributions want to be a part of that! ...without causing disruptions to the alias set itself. Speaking on behalf of Wolfi, it would be great for us to be able to weigh in on how a given vulnerability (expressed as another OSV record like GHSA-..., PSF-..., etc.) affects packages in the ecosystem we control, where the affected ranges are different (because the packages are different) and we can add other ecosystem-specific data of our own to the overall story.

This enables security tools to use the distro OSV data for matching, and then other OSV records to do other useful things, like provide users with more context about the vulnerability itself and cross-check the distro's findings with upstream findings. Zooming out, this also makes it easier for general consumers of the OSV database to see how different distros have handled a given vulnerability (it gets very interesting to compare notes like this during triaging!).
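
To make the "JOIN" over aliased records concrete, here is a rough sketch that collects an alias set by walking `aliases` links in both directions. It assumes aliases are treated as symmetric and transitive; the record IDs and the `alias_set` helper are hypothetical, not part of any OSV tooling.

```python
from collections import deque

# Hypothetical records; all IDs here are invented for illustration.
records = {
    "GHSA-aaaa-bbbb-cccc": {
        "id": "GHSA-aaaa-bbbb-cccc",
        "aliases": ["CVE-2024-0001"],
    },
    "GO-2024-0001": {
        "id": "GO-2024-0001",
        "aliases": ["CVE-2024-0001", "GHSA-aaaa-bbbb-cccc"],
    },
}


def alias_set(start_id, records):
    """Return every ID reachable from start_id through `aliases` links."""
    # Build an undirected adjacency map, since aliasing is symmetric.
    neighbours = {}
    for rec in records.values():
        for alias in rec.get("aliases", []):
            neighbours.setdefault(rec["id"], set()).add(alias)
            neighbours.setdefault(alias, set()).add(rec["id"])

    # Breadth-first walk to collect the connected component.
    seen = {start_id}
    queue = deque([start_id])
    while queue:
        current = queue.popleft()
        for nxt in neighbours.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(sorted(alias_set("GHSA-aaaa-bbbb-cccc", records)))
```

A distro record that used `aliases` would be pulled into this set, which is exactly the disruption an asymmetric field is meant to avoid.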

oliverchang commented 2 weeks ago

This has come up again in another context, so I think we should prioritise addressing this given that we're getting indications this is a real problem faced by various users of the OSV schema.

Perhaps something like the following:

Upstream

{
  "upstream": [ string ],
}

The upstream field gives a list of IDs of upstream vulnerabilities that are bundled by the vulnerability.

For example, a downstream package ecosystem (such as a Linux distribution) may issue its own advisories that include (possibly multiple) upstream vulnerabilities.

upstream relationships are transitive but not symmetric. For example, if B is an upstream vulnerability for A, and C is an upstream vulnerability for B, then C is also an upstream vulnerability for A.
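
The transitive-but-not-symmetric rule can be illustrated with a small sketch, assuming the `upstream` lists have already been gathered into a mapping from record ID to its direct upstream IDs. The mapping `upstream_of` and the helper `all_upstreams` are hypothetical names.

```python
def all_upstreams(record_id, upstream_of):
    """Return the direct and indirect upstream IDs of record_id."""
    result = set()
    stack = list(upstream_of.get(record_id, []))
    while stack:
        current = stack.pop()
        if current in result:
            continue  # already visited; also guards against cycles
        result.add(current)
        stack.extend(upstream_of.get(current, []))
    # A record is never its own upstream, even if the data contains a cycle.
    result.discard(record_id)
    return result


# B is upstream of A, and C is upstream of B (the example from the text).
upstream_of = {"A": ["B"], "B": ["C"]}

print(sorted(all_upstreams("A", upstream_of)))  # C is reached through B
print(sorted(all_upstreams("C", upstream_of)))  # empty: not symmetric
```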

I do like upstream as the name for this since it captures the use case very clearly.

We would also remove the "A similar OSV entry that bundles multiple distinct vulnerabilities in the same entry." part from the related definition, and modify the corresponding recommendation to use related in the aliases description.

What do people think?

oliverchang commented 1 week ago

CCing some Linux distro folks here for feedback on https://github.com/ossf/osv-schema/issues/249#issuecomment-2425629271

The TL;DR is that we're proposing to add another field called upstream that tracks a list of upstream vulnerability IDs that are bundled as part of a distro advisory. e.g.

{
  "id": "DISTRO-1337",
  "upstream": [
    "CVE-2024-1337",
    "CVE-2024-1338"
  ]
}

Previously, we recommended using related for this, but related was a little underspecified and unsuitable for the use case of answering the question: "Am I affected by CVE X in my distro image based on the distro advisory DB?"
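
As a rough illustration of that question, the sketch below looks up distro advisories that name a CVE in the proposed `upstream` field (or in `aliases`) and intersects their affected package names with what is installed. It deliberately ignores version ranges, and the `ExampleDistro` ecosystem, the package names, and both helpers are assumptions made for the example.

```python
# Hypothetical distro advisory DB; DISTRO-1337 and the CVE IDs mirror the
# example above, while the affected package data is invented.
advisories = [
    {
        "id": "DISTRO-1337",
        "upstream": ["CVE-2024-1337", "CVE-2024-1338"],
        "affected": [
            {"package": {"ecosystem": "ExampleDistro", "name": "foo"}},
        ],
    },
]


def advisories_for_cve(cve_id, advisories):
    """Advisories that reference cve_id via `upstream` or `aliases`."""
    return [
        adv
        for adv in advisories
        if cve_id in adv.get("upstream", []) or cve_id in adv.get("aliases", [])
    ]


def affected_installed_packages(cve_id, advisories, installed):
    """Installed package names covered by an advisory for cve_id.

    Version range matching is omitted to keep the sketch short.
    """
    hits = set()
    for adv in advisories_for_cve(cve_id, advisories):
        for affected in adv.get("affected", []):
            name = affected["package"]["name"]
            if name in installed:
                hits.add(name)
    return hits


installed = {"foo", "bar"}
print(affected_installed_packages("CVE-2024-1337", advisories, installed))
print(affected_installed_packages("CVE-2024-9999", advisories, installed))
```

With `related`, the first lookup could only yield a "maybe"; a well-defined `upstream` is what would let a tool treat the match as a definite statement from the distro.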

@Roo4L (AlmaLinux)
@luhring (Chainguard/Wolfi)
@mstg (Rocky Linux)
@jasinner (Red Hat)
@msmeissn (SUSE)
@dodys (Ubuntu)

Would appreciate your thoughts/feedback!

jasinner commented 1 week ago

Seems like a valid proposal to me.

dodys commented 1 week ago

I honestly don't like the name upstream in this context. Not all vulnerabilities are targeted at upstream projects. We do have vulnerabilities that are reported in specific ecosystems and specific package versions in that ecosystem. For example, let's say there was a vulnerability found only in Ubuntu 22.04 for package foo and it received the id CVE-XXXX-YYYY, it would look like this:

{
  "id": "UBUNTU-CVE-XXXX-YYYY",
  "upstream": [
    "CVE-XXXX-YYYY"
  ]
}

We would be upstream and downstream in this case, and it might not be clear to the users in any way. My understanding of upstream in this context is more like the vulnerability database, or catalog. Just where you first registered the vulnerability and got an ID, and then the "downstream/advisory ID" is just a specific view of such entry for a specific ecosystem.

luhring commented 1 week ago

In that example, I think that's what aliases is for, if I'm understanding correctly, where the CVE and the Ubuntu advisory affect the same set of software.

From here:

Two vulnerabilities can be described as aliases if they affect any given software component the same way: either both vulnerabilities affect the software component or neither do. A subsequent patch addresses both of the vulnerabilities (and no others), and vice versa.

The idea for upstream is that it's only to be used when the relationship between the advisories is asymmetric. Such as when the CVE affects more than just what the Ubuntu advisory covers, in which case it'd be invalid to use aliases to link the records involved.

mstg commented 1 week ago

Sounds good to me. upstream seems clearer in what kind of relationship there is between the advisory and the linked resource.

oliverchang commented 1 week ago

In that example, I think that's what aliases is for, if I'm understanding correctly, where the CVE and the Ubuntu advisory affect the same set of software.

From here:

Two vulnerabilities can be described as aliases if they affect any given software component the same way: either both vulnerabilities affect the software component or neither do. A subsequent patch addresses both of the vulnerabilities (and no others), and vice versa.

The idea for upstream is that it's only to be used when the relationship between the advisories is asymmetric. Such as when the CVE affects more than just what the Ubuntu advisory covers, in which case it'd be invalid to use aliases to link the records involved.

+1 to all of this! Indeed if a CVE is issued by Ubuntu directly for something that's Ubuntu-specific, it should be in aliases.

@dodys do you have any other concerns here?

dodys commented 1 week ago

In that example, I think that's what aliases is for, if I'm understanding correctly, where the CVE and the Ubuntu advisory affect the same set of software. From here:

Two vulnerabilities can be described as aliases if they affect any given software component the same way: either both vulnerabilities affect the software component or neither do. A subsequent patch addresses both of the vulnerabilities (and no others), and vice versa.

The idea for upstream is that it's only to be used when the relationship between the advisories is asymmetric. Such as when the CVE affects more than just what the Ubuntu advisory covers, in which case it'd be invalid to use aliases to link the records involved.

+1 to all of this! Indeed if a CVE is issued by Ubuntu directly for something that's Ubuntu-specific, it should be in aliases.

@dodys do you have any other concerns here?

If I can be honest, I don't think we will use aliases any time soon. To make it work with the automation, that would require a change in our tracking to make a note when an "upstream" CVE is a direct 1-1 relation to a "downstream" advisory (and here I only mean the UBUNTU-CVE-... advisories). Therefore we will continue using only the related or upstream field for the time being.

msmeissn commented 5 days ago

For SUSE this upstream relation is also good.