Open Moullisha opened 3 months ago
:sparkles: Thank you for your interest in OSV.dev's data quality! :sparkles:
Please review our FAQ entry on how to most efficiently have this addressed.
So this is caused by our AliasGroup computation finding there were too many transitive aliases, likely because the GO entry refers to (CVE-2024-6104), and I see a large number of chain guard advisories also refer to CVE-2024-6104, so deciding to skip generating the alias group.
There are 2 separate issues here:
2. Chain guard might not be correctly using aliases, and should be publishing related instead.
@cpanato are you the right person to review the Chainguard records being published? Refer to https://ossf.github.io/osv-schema/#aliases-field and https://ossf.github.io/osv-schema/#related-field
- Chain guard might not be correctly using aliases, and should be publishing related instead.
@cpanato are you the right person to review the Chainguard records being published? Refer to https://ossf.github.io/osv-schema/#aliases-field and https://ossf.github.io/osv-schema/#related-field
can be, what is the issue, i did not understand that, do you have an example?
can be, what is the issue, i did not understand that, do you have an example?
In a nutshell, aliases
is meant to be 1:1 under most circumstances, whereas related
may be 1:many
https://ossf.github.io/osv-schema/#aliases-field
Aliases should be considered symmetric
https://ossf.github.io/osv-schema/#related-field
Related vulnerabilities are symmetric but not transitive.
(any feedback on how to make this aspect of the schema documentation more comprehensible is very welcome)
It sounds like there are many Chainguard advisories aliased to CVE-2024-6104, which tends to indicate something is wrong:
I see, i think I got it, I will update to move that to related field. thank you for the feedback
cc @luhring
I'd be cool with making the change in the Chainguard OSV data. We'd also want to give other consumers of the data a heads up if we do, but that wouldn't delay the change by more than a week or so.
I've read the docs for aliases
and related
a few times, and I think I'm not yet fully appreciating the correctness of related
for this case, maybe you can help me. π
Chainguard's advisories each describe the impact of one vulnerability in a specific distro package that we ship. These distro packages are downstream of the upstream components for which the vulnerabilities are originally reported. So the vulnerability itself (i.e. the specific software weakness) is the same, and any fixes upstream are inherited in these downstream distro packages. But the two software artifacts are distinct from one another.
I think I'm getting confused around these two sentences from the aliases
docs:
1.
Two vulnerabilities can be described as aliases if a potential patch that addresses one of the vulnerabilities (and no other vulnerabilities) will also address the other vulnerability, and vice versa.
^ makes me lean toward "this isn't an alias". While it's true that our advisories don't describe multiple upstream vulnerabilities, patches to the vulnerability downstream in the distro package build process don't benefit the upstream project. But...
2.
Aliases may be used for vulnerabilities affecting different packages or ecosystems as long as they follow this definition.
^ makes me lean back toward "well, maybe this is an alias?", because it actually allows for there to be distinct packages in distinct ecosystems, which was my only hesitation in satisfying the first constraint.
The examples given for the docs of the related
field are:
- A similar but completely different vulnerability.
^ This doesn't seem to apply here.
- A similar OSV entry that bundles multiple distinct vulnerabilities in the same entry.
^ Our OSV entries aren't bundling multiple distinct vulnerabilities, like other distro advisory systems sometimes do. However, each vulnerability in a language ecosystem (e.g. GHSA-, GO-, etc.) is "1:many" with Chainguard's advisories, in the case when a given upstream component is used in multiple downstream distro packages.
- Cases that do not satisfy the strict definition of aliases.
^ This is helpful, but only once we clearly understand the definition of aliases.
We defer to the OSV maintainers on how Chainguard should be using the OSV format, of course! I'm just looking to make sure I can follow along intuitively.
I think Chainguard might be on a bit of an edge case with our alias
vs related
distinction, but I'll try give our rationale:
The primary purpose of the aliases
field is to allow for de-duplication of the same vulnerability from different data sources. e.g. CVE-2024-6104, GHSA-v6v8-xj6m-xwqh, and GO-2024-2947 all describe the same thing. A user of the OSV API would generally want to consider these all together as a single vulnerability for things like scanning and VEX generation.
Since we can't realistically expect all our data sources to know and list the correct IDs from every other data source, OSV computes the complete set of aliases by transitively combining sets of aliases from each ingested record. This becomes problematic if misused, since unrelated vulnerabilities can become lumped together by transitive chains of aliases. #888 might provide some background.
For Chainguard, CGA-23mp-53xj-rxq8, CGA-2gpx-j9wf-x7gf, and about 30 others all consider themselves aliases of CVE-2024-6104. This makes OSV treat all of them as aliases of each other as well, which is probably not intentional.
From what I can tell, these are listed as aliases because the artifacts are all built with the vulnerable Go package. While this might be considered as ambiguously correct, it doesn't produce a desirable alias grouping and therefore should not be done.
To give my intuitive reasoning as to why aliases
is wrong in this case: CGA-23mp-53xj-rxq8 contains CVE-2024-6104, but CGA-23mp-53xj-rxq8 is not CVE-2024-6104.
The related
field is meant for informational purposes only, not de-duplication, and doesn't expand transitively so it's generally alright to add IDs there.
Aliases may be used for vulnerabilities affecting different packages or ecosystems as long as they follow this definition.
^ makes me lean back toward "well, maybe this is an alias?", because it actually allows for there to be distinct packages in distinct ecosystems, which was my only hesitation in satisfying the first constraint.
Admittedly, I'm not entirely sure what the schema's intent was for this line. I think it's there to allow aliases across ecosystems in the same way a single OSV record can contain multiple different ecosystems, but it is also a bit unclear to me.
Hopefully this helps to clear up why we want it this way. Sorry this comment became a bit of an essay :sweat_smile:
Thanks, this is incredibly helpful, @michaelkedar! π
Here's my proposal for next steps:
aliases
data to the related
field.related
, we're missing out on the opportunity to have strong, automation-usable links to the same vulnerability as described by our advisories. It seems like there should be a new field that's similar to aliases
, but for strong "asymmetric" references, to help OSV better support vulnerability workflows beyond language ecosystems and into the world of distros. I can open an issue to capture this, and hopefully we'll have a good dialog there about potential improvements to the spec.Please let me know how this sounds to you! π
@luhring
2. To help with the ambiguity in the schema docs, I'll open a PR with some suggestions.
Please do, high quality documentation is immensely helpful with scaling. If it isn't easily comprehensible to you, it's very likely to be to other unfamiliar readers. Your contributions are very welcome.
I'll fork out your third point as an issue on the OSV-Schema so it doesn't get lost.
Hi @Moullisha thanks for flagging this, and sorry for the tangential noise with Chainguard record issues.
now we moved to use the related field: https://packages.cgr.dev/chainguard/osv/CGA-23mp-53xj-rxq8.json
thanks @luhring
This issue has not had any activity for 60 days and will be automatically closed in two weeks
See https://github.com/google/osv.dev/blob/master/CONTRIBUTING.md for how to contribute a PR if you're interested in helping out.
Describe the bug Some vulnerabilities in OSV do not mention alias whereas the source link has alias data.
To Reproduce https://vuln.go.dev/ID/GO-2024-2947.json mentions two aliases whereas https://api.osv.dev/v1/vulns/GO-2024-2947 does not. Similar behaviour observed in case of GO-2024-2918, GHSA-v6v8-xj6m-xwqh, GO-2024-2963
Expected behaviour If alias is available in source data, same should be provided by OSV.
Screenshots If applicable, add screenshots to help explain your problem.
Additional context Add any other context about the problem here.