Improve CVE entry management in OSV

hogo6002 commented 3 months ago

The current OSV structure combines vulnerability data from different resources (e.g., NVD, Alpine, Debian) into a single CVE entry based on shared CVE IDs. This approach leads to overly large and difficult to maintain CVE entries. With Ubuntu also publishing its security tracker data to OSV.dev using CVE IDs, this issue will likely worsen.

We probably need a better solution for managing CVE entries. One idea is to add a source-specific prefix or suffix to the CVE ID, creating separate entries for each data source. For example, Alpine-CVE-2024-0001 and Debian-CVE-2024-0001 would be displayed as two distinct records on OSV.dev.

dodys commented 3 months ago

As someone that does not have a view of the whole osv.dev machinery, I think the main question is, why would you want to combine CVE entries?

hogo6002 commented 3 months ago

why would you want to combine CVE entries?

The initial reason was that Alpine SecDB uses CVE IDs to publish their vulnerabilities. So, we created a tool to combine their data into our existing CVEs, as long as they use the same vulnerability ID. Then, we wanted to add Debian security tracker data into our OSV to better support container scanning, so we continued to combine everything into our CVE entries to keep it consistent, as the Debian security tracker also uses CVE IDs.

However, the problem now is that we continue to combine more and more data into a single record, increasing the difficulty of our maintenance. So, we are considering some better changes to this approach.

dodys commented 3 months ago

why would you want to combine CVE entries?

The initial reason was that Alpine SecDB uses CVE IDs to publish their vulnerabilities. So, we created a tool to combine their data into our existing CVEs, as long as they use the same vulnerability ID. Then, we wanted to add Debian security tracker data into our OSV to better support container scanning, so we continued to combine everything into our CVE entries to keep it consistent, as the Debian security tracker also uses CVE IDs.

However, the problem now is that we continue to combine more and more data into a single record, increasing the difficulty of our maintenance. So, we are considering some better changes to this approach.

From your first message, the thing that is not clear to me is, if you are not going to combine CVEs anymore, then why would you need to still add the prefixes (e.g. Debian-CVE-2024-0001) ? Aren't those completely separate buckets?

oliverchang commented 3 months ago

why would you want to combine CVE entries?

The initial reason was that Alpine SecDB uses CVE IDs to publish their vulnerabilities. So, we created a tool to combine their data into our existing CVEs, as long as they use the same vulnerability ID. Then, we wanted to add Debian security tracker data into our OSV to better support container scanning, so we continued to combine everything into our CVE entries to keep it consistent, as the Debian security tracker also uses CVE IDs. However, the problem now is that we continue to combine more and more data into a single record, increasing the difficulty of our maintenance. So, we are considering some better changes to this approach.

From your first message, the thing that is not clear to me is, if you are not going to combine CVEs anymore, then why would you need to still add the prefixes (e.g. Debian-CVE-2024-0001) ? Aren't those completely separate buckets?

For OSV, ID prefixes must be unique, and be associated with a home database. Debian security tracker, Alpine etc didn't have a unique ID before, so we were combining them into a single CVE record. If we don't combine them, then they would need their own prefix.

dodys commented 3 months ago

why would you want to combine CVE entries?

The initial reason was that Alpine SecDB uses CVE IDs to publish their vulnerabilities. So, we created a tool to combine their data into our existing CVEs, as long as they use the same vulnerability ID. Then, we wanted to add Debian security tracker data into our OSV to better support container scanning, so we continued to combine everything into our CVE entries to keep it consistent, as the Debian security tracker also uses CVE IDs. However, the problem now is that we continue to combine more and more data into a single record, increasing the difficulty of our maintenance. So, we are considering some better changes to this approach.

From your first message, the thing that is not clear to me is, if you are not going to combine CVEs anymore, then why would you need to still add the prefixes (e.g. Debian-CVE-2024-0001) ? Aren't those completely separate buckets?

For OSV, ID prefixes must be unique, and be associated with a home database. Debian security tracker, Alpine etc didn't have a unique ID before, so we were combining them into a single CVE record. If we don't combine them, then they would need their own prefix.

Thanks for clarifying it, and would the prefix also be necessary for the filename itself? or could we have a filename CVE-2024-0001.json and inside of it id = "UBTU-CVE-2024-0001?

hogo6002 commented 3 months ago

Thanks for clarifying it, and would the prefix also be necessary for the filename itself? or could we have a filename CVE-2024-0001.json and inside of it id = "UBTU-CVE-2024-0001?

We do prefer the filename to match the ID name to keep it consistent for other ecosystems. Are there any concerns for Ubuntu in changing the filename?

dodys commented 3 months ago

Thanks for clarifying it, and would the prefix also be necessary for the filename itself? or could we have a filename CVE-2024-0001.json and inside of it id = "UBTU-CVE-2024-0001?

We do prefer the filename to match the ID name to keep it consistent for other ecosystems. Are there any concerns for Ubuntu in changing the filename?

No concerns, just confirming what needs to be done as it will be a mass operation.

The only thing to confirm is which should be the prefix, Ubuntu-, UBUNTU-, UBTU-?

hogo6002 commented 3 months ago

The only thing to confirm is which should be the prefix, Ubuntu-, UBUNTU-, UBTU-?

Thanks for confirming! We prefer UBUNTU- to align with our current naming schema. Same as https://osv.dev/vulnerability/CURL-CVE-2024-7264

hogo6002 commented 3 months ago

Hey @dodys, I have two questions about Ubuntu CVE OSV records (ubuntu-security-notices/osv). First, I'm curious why some affected packages lack the ubuntu_priority field. This field is from the vulnerability level and should be consistent across all affected packages. But I've noticed that some packages are missing this field while others within the same vulnerability have it. Example: https://osv.dev/vulnerability/CVE-2024-21131

Second question is about the status field. I can't find it in the Ubuntu OSV records, not sure if it's an important one. But I'm wondering if there's a filtering mechanism that determines which records Ubuntu sends to OSV (like not sending all records with a status of DNE or not-affected ?).

dodys commented 3 months ago

Hey @dodys, I have two questions about Ubuntu CVE OSV records (ubuntu-security-notices/osv). First, I'm curious why some affected packages lack the ubuntu_priority field. This field is from the vulnerability level and should be consistent across all affected packages. But I've noticed that some packages are missing this field while others within the same vulnerability have it. Example: https://osv.dev/vulnerability/CVE-2024-21131

Indeed, for now we only added the ecosystem_specific part (including ubuntu_priority) when the CVE was fixed. In the next batch of update to our scripts, we plan to add the ecosystem_specific to all.

Second question is about the status field. I can't find it in the Ubuntu OSV records, not sure if it's an important one. But I'm wondering if there's a filtering mechanism that determines which records Ubuntu sends to OSV (like not sending all records with a status of DNE or not-affected ?).

Since OSV has the affected field, which is for only packages that are vulnerable to such "vulnerability" (CVE) or "set of vulnerabilities" (USN), for CVEs, we only list the ones that are affected or are still pending investigation. That means that the following status are always included: needed -> means vulnerable/affected needs-triage -> means that it needs investigation, therefore we consider it vulnerable until investigation is done deferred -> means that we tried to investigate but we still don't have enough information to do so, therefore it is vulnerable until confirmed pending -> means that the package is vulnerable and a fix is on its way released -> means that we patched it in such release

Some status are more complex and will depend on other factors: not-affected (<version>) -> if the <version> exists in that specific Ubuntu release, we consider it similar to released, if the <version> is an old one, e.g. the first version to contain the fix and not vulnerable anymore, then we don't add it since that Ubuntu release is not vulnerable at all. ignored -> it will only be included for releases that are still "alive", as ignored can be used for different reasons.

And some status we just don't include: DNE -> package doesn't exist in such Ubuntu release active -> even though it is in the docs we still don't use that status.

Does that help clarify?

hogo6002 commented 3 months ago

Does that help clarify?

Thanks Eduardo! It's very detailed and helpful!

Having all the ubuntu_priority info should be sufficient for us, especially considering you've already helped us filter out DNE and some not-affected ones.

dodys commented 3 months ago

I forgot to ask one thing Holly For the USN data, should we change the related field to refer to the UBUNTU-CVE-... or should we keep it as CVE-...?

hogo6002 commented 3 months ago

I forgot to ask one thing Holly For the USN data, should we change the related field to refer to the UBUNTU-CVE-... or should we keep it as CVE-...?

Are you referring to data from USN notes such as USN-6969-1? I think for this type of data, we can just keep its original USN- prefix, no change needed. Just for Ubuntu CVEs, we can add UBUNTU-CVE- as its prefix.

Misread the question (didn't see the related word :sweat_smile:), please refer to Oliver's answer

oliverchang commented 3 months ago

Hmm, I think @dodys is asking about what to put inside related for USN- entries? I think under our schema definitions, it would make the most sense to:

Put UBUNTU-CVE- in aliases, if it exists, given that they're both Ubuntu-specific advisories talking about the same vulnerability.
Put CVE- in related.

dodys commented 3 months ago

Thanks both!

dodys commented 3 months ago

Holly and Oliver, I've just pushed the CVE changes to our github repo, let me know if now looks as expected and the sync to osv.dev can be re-enabled

hogo6002 commented 3 months ago

Hey @dodys, thanks for the Ubuntu CVE updates! It looks good overall. One thing to mention is the aliases/related part.

I noticed that CVEs- are added as aliases to UBUNTU-CVE-, but Oliver previously mentioned we wanted to put UBUNTU-CVE- in USN- aliases. Doing both would make OSV treat the CVE- and USN- as aliases of each other because aliases are considered symmetric and transitive. This is not exactly what we want (see https://github.com/google/osv.dev/issues/2374#issuecomment-2216588085 for explanation). We're trying to avoid making Linux distro vulnerabilities aliases of CVEs.

In the current OSV USN- data, the aliases field is still empty, so it will not cause the problem. But putting CVE- in UBUNTU-CVE- aliases is not ideal because we think CVE- and UBUNTU-CVE- are not always the same thing and shouldn't be aliases to each other. ("Aliases should not be used to refer to vulnerabilities in packages upstream or downstream in a software supply chain from the given OSV record’s affected package(s).").

I talked with Oliver, and we think there is no perfect solution at this point. But to keep things consistent with other Linux distros, we think it's best to put all vulnerabilities in related instead of aliases. We might find a better way to show the relationship between Linux distro CVEs and CVEs later on.

With all that said, we don't need to change the current USN data as it doesn't contain any aliases. The only change we probably want is to move CVE- to the related field of UBUNTU-CVE-.

andrewpollock commented 3 months ago

The new UBUNTU-CVE prefix will also need to be added here and here in the OSV Schema.

dodys commented 3 months ago

ok, CVE files fixed and PR open: https://github.com/ossf/osv-schema/pull/265

Let me know if anything needed.

hogo6002 commented 2 months ago

ok, CVE files fixed and PR open: ossf/osv-schema#265

CVE files look good to me. Thanks for fixing them!

dodys commented 2 months ago

Thanks for verifying Holly! Let me know if you need anything else from my side. I believe for the sync to be re-enable is just a matter of addressing the ignore pattern: https://github.com/google/osv.dev/blob/master/source.yaml#L303C20-L303C36 But I will leave it to you if you want to do some testing before re-enabling it, so we don't create another issue for you.

hogo6002 commented 2 months ago

But I will leave it to you if you want to do some testing before re-enabling it, so we don't create another issue for you.

Hey Eduardo, we have re-enabled UBUNTU-CVE- back to OSV.dev, and the linux query timeout issue has been fixed. But I am also wondering why some CVEs don't have purl field while some have it?

dodys commented 2 months ago

But I will leave it to you if you want to do some testing before re-enabling it, so we don't create another issue for you.

Hey Eduardo, we have re-enabled UBUNTU-CVE- back to OSV.dev, and the linux query timeout issue has been fixed. But I am also wondering why some CVEs don't have purl field while some have it?

Hi Holly, sorry, but I am confused, both examples you shared have purl. Was it a wrong paste?

hogo6002 commented 2 months ago

Hi Holly, sorry, but I am confused, both examples you shared have purl. Was it a wrong paste?

Oh yeah, sorry. That was a wrong paste. Example without purl: https://github.com/canonical/ubuntu-security-notices/blob/main/osv/cve/2023/UBUNTU-CVE-2023-0120.json, https://github.com/canonical/ubuntu-security-notices/blob/main/osv/cve/2023/UBUNTU-CVE-2023-1296.json, https://osv.dev/vulnerability/UBUNTU-CVE-2023-46726, https://osv.dev/vulnerability/UBUNTU-CVE-2024-8354 (it seems all come from Ubuntu:Pro?)

dodys commented 2 months ago

Hi Holly, sorry, but I am confused, both examples you shared have purl. Was it a wrong paste?

Oh yeah, sorry. That was a wrong paste. Example without purl: https://github.com/canonical/ubuntu-security-notices/blob/main/osv/cve/2023/UBUNTU-CVE-2023-0120.json, https://github.com/canonical/ubuntu-security-notices/blob/main/osv/cve/2023/UBUNTU-CVE-2023-1296.json, https://osv.dev/vulnerability/UBUNTU-CVE-2023-46726, https://osv.dev/vulnerability/UBUNTU-CVE-2024-8354 (it seems all come from Ubuntu:Pro?)

We are only adding purl for vulnerabilities that were fixed. We could change that, if that's what is preferred :)

hogo6002 commented 2 months ago

We are only adding purl for vulnerabilities that were fixed. We could change that, if that's what is preferred :)

Yes, please. We also prefer the purl to not contain affected range information, which means it shouldn't include a version. Our purpose in using the purl field is to find the package itself. For example, in other ecosystems, you might see a purl like pkg:deb/debian/chromium?arch=source.

So for Ubuntu, regardless of whether a fix is available, the package purl should only contain the package information (distribution information is fine to keep).

Original one: pkg:deb/ubuntu/py7zr@0.11.3+dfsg-4ubuntu0.1?arch=src?distro=jammy Expected one: pkg:deb/ubuntu/py7zr?arch=src or pkg:deb/ubuntu/py7zr?arch=src?distro=jammy

@michaelkedar do you have anything to add?

dodys commented 2 months ago

We are only adding purl for vulnerabilities that were fixed. We could change that, if that's what is preferred :)

Yes, please. We also prefer the purl to not contain affected range information, which means it shouldn't include a version. Our purpose in using the purl field is to find the package itself. For example, in other ecosystems, you might see a purl like pkg:deb/debian/chromium?arch=source.

So for Ubuntu, regardless of whether a fix is available, the package purl should only contain the package information (distribution information is fine to keep).

Original one: pkg:deb/ubuntu/py7zr@0.11.3+dfsg-4ubuntu0.1?arch=src?distro=jammy Expected one: pkg:deb/ubuntu/py7zr?arch=src or pkg:deb/ubuntu/py7zr?arch=src?distro=jammy

@michaelkedar do you have anything to add?

Thanks Holly, I've made the changes and refreshed the data.

hogo6002 commented 2 months ago

Thanks Holly, I've made the changes and refreshed the data.

Thanks Eduardo! It all looks good now!

github-actions[bot] commented 3 days ago

This issue has not had any activity for 60 days and will be automatically closed in two weeks

See https://github.com/google/osv.dev/blob/master/CONTRIBUTING.md for how to contribute a PR if you're interested in helping out.

andrewpollock commented 3 days ago

If I understand correctly, in order to call this issue "done", we'd have to render combine-to-osv obsolete, which would mean:

Debian Security Tracker converted records have their own prefix (and are ideally published directly by Debian)
Alpine converted records have their own prefix (and are ideally published directly by Alpine
NVD converted CVE records are published natively (which nvd-cve-osv supports doing today)

google / osv.dev

Improve CVE entry management in OSV #2465