google / osv.dev

Open source vulnerability DB and triage service.
https://osv.dev
Apache License 2.0
1.5k stars 186 forks source link

Always return the upstream aliases when no alias groups are generated #2374

Open Moullisha opened 3 months ago

Moullisha commented 3 months ago

Describe the bug Some vulnerabilities in OSV do not mention alias whereas the source link has alias data.

To Reproduce https://vuln.go.dev/ID/GO-2024-2947.json mentions two aliases whereas https://api.osv.dev/v1/vulns/GO-2024-2947 does not. Similar behaviour observed in case of GO-2024-2918, GHSA-v6v8-xj6m-xwqh, GO-2024-2963

Expected behaviour If alias is available in source data, same should be provided by OSV.

Screenshots If applicable, add screenshots to help explain your problem.

Additional context Add any other context about the problem here.

github-actions[bot] commented 3 months ago

:sparkles: Thank you for your interest in OSV.dev's data quality! :sparkles:

Please review our FAQ entry on how to most efficiently have this addressed.

another-rex commented 3 months ago

So this is caused by our AliasGroup computation finding there were too many transitive aliases, likely because the GO entry refers to (CVE-2024-6104), and I see a large number of chain guard advisories also refer to CVE-2024-6104, so deciding to skip generating the alias group.

There are 2 separate issues here:

  1. When no alias groups are generated, we should still always return the upstream aliases for this specific entry.
  2. Chain guard might not be correctly using aliases, and should be publishing related instead.
andrewpollock commented 3 months ago

2. Chain guard might not be correctly using aliases, and should be publishing related instead.

@cpanato are you the right person to review the Chainguard records being published? Refer to https://ossf.github.io/osv-schema/#aliases-field and https://ossf.github.io/osv-schema/#related-field

cpanato commented 3 months ago
  1. Chain guard might not be correctly using aliases, and should be publishing related instead.

@cpanato are you the right person to review the Chainguard records being published? Refer to https://ossf.github.io/osv-schema/#aliases-field and https://ossf.github.io/osv-schema/#related-field

can be, what is the issue, i did not understand that, do you have an example?

andrewpollock commented 2 months ago

can be, what is the issue, i did not understand that, do you have an example?

In a nutshell, aliases is meant to be 1:1 under most circumstances, whereas related may be 1:many

https://ossf.github.io/osv-schema/#aliases-field

Aliases should be considered symmetric

https://ossf.github.io/osv-schema/#related-field

Related vulnerabilities are symmetric but not transitive.

(any feedback on how to make this aspect of the schema documentation more comprehensible is very welcome)

It sounds like there are many Chainguard advisories aliased to CVE-2024-6104, which tends to indicate something is wrong:

cpanato commented 2 months ago

I see, i think I got it, I will update to move that to related field. thank you for the feedback

cc @luhring

luhring commented 2 months ago

I'd be cool with making the change in the Chainguard OSV data. We'd also want to give other consumers of the data a heads up if we do, but that wouldn't delay the change by more than a week or so.

I've read the docs for aliases and related a few times, and I think I'm not yet fully appreciating the correctness of related for this case, maybe you can help me. πŸ˜ƒ

Chainguard's advisories each describe the impact of one vulnerability in a specific distro package that we ship. These distro packages are downstream of the upstream components for which the vulnerabilities are originally reported. So the vulnerability itself (i.e. the specific software weakness) is the same, and any fixes upstream are inherited in these downstream distro packages. But the two software artifacts are distinct from one another.

I think I'm getting confused around these two sentences from the aliases docs:

1.

Two vulnerabilities can be described as aliases if a potential patch that addresses one of the vulnerabilities (and no other vulnerabilities) will also address the other vulnerability, and vice versa.

^ makes me lean toward "this isn't an alias". While it's true that our advisories don't describe multiple upstream vulnerabilities, patches to the vulnerability downstream in the distro package build process don't benefit the upstream project. But...

2.

Aliases may be used for vulnerabilities affecting different packages or ecosystems as long as they follow this definition.

^ makes me lean back toward "well, maybe this is an alias?", because it actually allows for there to be distinct packages in distinct ecosystems, which was my only hesitation in satisfying the first constraint.

The examples given for the docs of the related field are:

  • A similar but completely different vulnerability.

^ This doesn't seem to apply here.

  • A similar OSV entry that bundles multiple distinct vulnerabilities in the same entry.

^ Our OSV entries aren't bundling multiple distinct vulnerabilities, like other distro advisory systems sometimes do. However, each vulnerability in a language ecosystem (e.g. GHSA-, GO-, etc.) is "1:many" with Chainguard's advisories, in the case when a given upstream component is used in multiple downstream distro packages.

  • Cases that do not satisfy the strict definition of aliases.

^ This is helpful, but only once we clearly understand the definition of aliases.

We defer to the OSV maintainers on how Chainguard should be using the OSV format, of course! I'm just looking to make sure I can follow along intuitively.

michaelkedar commented 2 months ago

I think Chainguard might be on a bit of an edge case with our alias vs related distinction, but I'll try give our rationale:

The primary purpose of the aliases field is to allow for de-duplication of the same vulnerability from different data sources. e.g. CVE-2024-6104, GHSA-v6v8-xj6m-xwqh, and GO-2024-2947 all describe the same thing. A user of the OSV API would generally want to consider these all together as a single vulnerability for things like scanning and VEX generation.

Since we can't realistically expect all our data sources to know and list the correct IDs from every other data source, OSV computes the complete set of aliases by transitively combining sets of aliases from each ingested record. This becomes problematic if misused, since unrelated vulnerabilities can become lumped together by transitive chains of aliases. #888 might provide some background.

For Chainguard, CGA-23mp-53xj-rxq8, CGA-2gpx-j9wf-x7gf, and about 30 others all consider themselves aliases of CVE-2024-6104. This makes OSV treat all of them as aliases of each other as well, which is probably not intentional.

From what I can tell, these are listed as aliases because the artifacts are all built with the vulnerable Go package. While this might be considered as ambiguously correct, it doesn't produce a desirable alias grouping and therefore should not be done. To give my intuitive reasoning as to why aliases is wrong in this case: CGA-23mp-53xj-rxq8 contains CVE-2024-6104, but CGA-23mp-53xj-rxq8 is not CVE-2024-6104.

The related field is meant for informational purposes only, not de-duplication, and doesn't expand transitively so it's generally alright to add IDs there.

Aliases may be used for vulnerabilities affecting different packages or ecosystems as long as they follow this definition.

^ makes me lean back toward "well, maybe this is an alias?", because it actually allows for there to be distinct packages in distinct ecosystems, which was my only hesitation in satisfying the first constraint.

Admittedly, I'm not entirely sure what the schema's intent was for this line. I think it's there to allow aliases across ecosystems in the same way a single OSV record can contain multiple different ecosystems, but it is also a bit unclear to me.

Hopefully this helps to clear up why we want it this way. Sorry this comment became a bit of an essay :sweat_smile:

luhring commented 2 months ago

Thanks, this is incredibly helpful, @michaelkedar! πŸ™

Here's my proposal for next steps:

  1. To help address the breakage among the broader OSV ecosystem right now, we'll move our current aliases data to the related field.
  2. To help with the ambiguity in the schema docs, I'll open a PR with some suggestions. If those suggestions turn out not to be helpful, no hard feelings here. It was a bit difficult to figure out the "right answer" from the docs alone, so if we can make things any easier for the next onboarded ecosystem, then that's great.
  3. It looks like the aliases documentation line in question was updated in https://github.com/ossf/osv-schema/pull/193 β€”Β that was a great read. I share the concern expressed in that PR: There seems to be a "hole" in the OSV spec when it comes to distros' ability to participate. By moving to related, we're missing out on the opportunity to have strong, automation-usable links to the same vulnerability as described by our advisories. It seems like there should be a new field that's similar to aliases, but for strong "asymmetric" references, to help OSV better support vulnerability workflows beyond language ecosystems and into the world of distros. I can open an issue to capture this, and hopefully we'll have a good dialog there about potential improvements to the spec.

Please let me know how this sounds to you! πŸ˜ƒ

andrewpollock commented 2 months ago

@luhring

2. To help with the ambiguity in the schema docs, I'll open a PR with some suggestions.

Please do, high quality documentation is immensely helpful with scaling. If it isn't easily comprehensible to you, it's very likely to be to other unfamiliar readers. Your contributions are very welcome.

I'll fork out your third point as an issue on the OSV-Schema so it doesn't get lost.

andrewpollock commented 2 months ago

Hi @Moullisha thanks for flagging this, and sorry for the tangential noise with Chainguard record issues.

cpanato commented 2 months ago

now we moved to use the related field: https://packages.cgr.dev/chainguard/osv/CGA-23mp-53xj-rxq8.json

thanks @luhring

github-actions[bot] commented 3 weeks ago

This issue has not had any activity for 60 days and will be automatically closed in two weeks

See https://github.com/google/osv.dev/blob/master/CONTRIBUTING.md for how to contribute a PR if you're interested in helping out.