google / osv.dev

Open source vulnerability DB and triage service.
https://osv.dev
Apache License 2.0

Data quality issue with CVE-2024-38356 & CVE-2024-38357 #2358

Open tjdett opened 4 months ago

tjdett commented 4 months ago

The CVE ID

Two CVEs originating from GHSAs are affected by the same underlying issue: CVE-2024-38356 and CVE-2024-38357.

Describe the data quality issue observed

Some vulnerable versions (including the most recent vulnerable versions of the library) are omitted from the affected versions list of the CVEs while being correct in the GHSAs.

Additional context

Both GHSAs cover vulnerabilities that were found across three supported versions of TinyMCE, with two open-source fixed versions available:

Affected versions: <5.11.0, >=6.0.0 <6.8.4, >=7.0.0 <7.2.0

Patched versions: 5.11.0, 6.8.4, 7.2.0
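As a rough illustration of the ranges listed above, the following sketch checks whether a given TinyMCE version falls inside any of the three affected ranges. It assumes plain `MAJOR.MINOR.PATCH` version strings (no pre-release suffixes) and treats `<5.11.0` as starting from `0.0.0`; the names `AFFECTED_RANGES`, `parse_version` and `is_affected` are made up for this example and are not part of OSV.dev or GHSA tooling.

```python
# Affected ranges as stated in the GHSA: each pair is (introduced, fixed).
# Assumption: "<5.11.0" is modelled as starting at 0.0.0.
AFFECTED_RANGES = [
    ((0, 0, 0), (5, 11, 0)),
    ((6, 0, 0), (6, 8, 4)),
    ((7, 0, 0), (7, 2, 0)),
]


def parse_version(text):
    """Parse a plain 'MAJOR.MINOR.PATCH' string into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_affected(version):
    """Return True if the version lies in any [introduced, fixed) range."""
    parsed = parse_version(version)
    return any(introduced <= parsed < fixed for introduced, fixed in AFFECTED_RANGES)


if __name__ == "__main__":
    # Sample inputs chosen to sit on either side of each fixed version.
    for candidate in ["5.10.9", "5.11.0", "6.8.3", "6.8.4", "7.1.2", "7.2.0"]:
        print(candidate, is_affected(candidate))
```

A 7.x version below 7.2.0 is affected under these ranges, which is the case the CVE records on OSV.dev currently miss.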

There are also additional downstream software packages mentioned in the GHSA where TinyMCE is directly embedded. The GHSA definitions on OSV.dev are fine, as they rely directly on GHSA-9hcv-j9pv-qmph.json & GHSA-w9jx-4g6g-rp7x.json.

The CVE affected versions on OSV.dev appear to use the Git commit URLs provided by GitHub on the GHSA. If this worked correctly, they would omit the fix in 5.11.0 from the definition (as 5.x is now patched only under a commercial license for long-term support customers) and additional downstream software packages, but would probably work for 6.x & 7.x.

Unfortunately, although two commit URLs are present in the GHSA data, only one commit URL appears in the references section on NVD.

In both cases the commit that appears is the one for 6.x, which results in 7.0.0 onwards not appearing as vulnerable versions even though they are explicitly mentioned in the CVE text and the associated GHSA.

Suggested changes to record

Ideally the CVE would mirror the version ranges specified in the GHSA, as the GHSA is the canonical source of the affected & fixed versions. The NVD record leaves little doubt this is the case by referring to "GitHub, Inc." as the source.

github-actions[bot] commented 4 months ago

:sparkles: Thank you for your interest in OSV.dev's data quality! :sparkles:

Please review our FAQ entry on how to most efficiently have this addressed.

andrewpollock commented 3 months ago

> Ideally the CVE would mirror the version ranges specified in the GHSA, as the GHSA is the canonical source of the affected & fixed versions. The NVD record leaves little doubt this is the case by referring to "GitHub, Inc." as the source.

This isn't the case because the CVE to OSV conversion focuses on deriving commit ranges, not versions: the objective of converting these records is to facilitate vulnerability scanning by commit hash.

This FAQ entry discusses what OSV.dev does with OSV records it imports:

Both version and commit enumeration populate the affected[].versions[] field, which assists with precise version matching.

Any versions added to the affected[].versions array are Git tags that fall within the affected commit range and nothing is knowable about the versioning scheme employed.

The OSV records generated from the conversion do not contain any affected[].package information because:

a) there is no broadly applicable way to derive an ecosystem name for the subject of the CVEs converted
b) as there is no ecosystem name, the package name is nonsensical
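To make the shape of such a converted record concrete, here is a minimal sketch of an OSV-style record that carries only a GIT range and enumerated tags, with no `package` entry. The helper names, the `introduced: "0"` event, the repository URL, the placeholder fix commit and the sample tag are all assumptions for illustration; this is not the actual output of the OSV.dev converter.

```python
import json


def git_only_record(cve_id, repo, fixed_commit, tags):
    """Build a minimal OSV-style record with a GIT range and no package."""
    return {
        "id": cve_id,
        "affected": [
            {
                "ranges": [
                    {
                        "type": "GIT",
                        "repo": repo,
                        # Assumption: the range starts at the beginning of history.
                        "events": [{"introduced": "0"}, {"fixed": fixed_commit}],
                    }
                ],
                # Git tags that fall inside the commit range (hypothetical here).
                "versions": list(tags),
            }
        ],
    }


def matchable_by_package(record):
    """Return True if any affected entry names a package and ecosystem."""
    return any("package" in entry for entry in record.get("affected", []))


if __name__ == "__main__":
    record = git_only_record(
        "CVE-2024-38356",
        "https://github.com/tinymce/tinymce",  # assumed repository URL
        "<fix commit from the CVE references>",  # placeholder, not a real hash
        ["6.8.3"],  # hypothetical tag
    )
    print(json.dumps(record, indent=2))
    print("matchable by package:", matchable_by_package(record))
```

A record of this shape can only be matched by commit or by enumerated tag, which is why package-based queries depend on the GHSA records instead.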

> The CVE affected versions on OSV.dev appear to use the Git commit URLs provided by GitHub on the GHSA.

Close. Any commits in the CVE's references are assumed to be a more accurate fix commit than what may be derivable from any versions supplied in the CVE record.

Additionally, because https://services.nvd.nist.gov/rest/json/cves/2.0?cveId=CVE-2024-38356 and https://services.nvd.nist.gov/rest/json/cves/2.0?cveId=CVE-2024-38357 are yet to be analysed by the NVD, they lack any machine-readable details about the affected versions. Even if those details were available, the commit in the references would be preferred.

> If this worked correctly, they would omit the fix in 5.11.0 from the definition (as 5.x is now patched only under a commercial license for long-term support customers) and additional downstream software packages, but would probably work for 6.x & 7.x.

I think this is something of an edge case of an edge case, resulting from the combination of necessary assumptions we make for the scale we're operating at. I'm not sure that there's anything actionable that can be done here that would work at scale, and I'm also struggling to think of any enhancements to the underlying CVE that would have resulted in a more accurate conversion outcome.

/cc @oliverchang for his thoughts...

andrewpollock commented 1 week ago

@tjdett before I close this out as unactionable, I wanted to make sure we capture what use cases do and/or do not work with the data as it is today, as that may help review the situation from the perspective of the ultimate objective, which is vulnerability detection...

So, from the perspective of calling OSV.dev's API with either a package or a commit, can you give some examples of false negatives that occur with the data in its current form?

(It's best to work backwards from what is being scanned, rather than looking at specific records and trying to work forwards to what you have)
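For anyone wanting to reproduce the two query styles mentioned above, the sketch below builds request bodies for OSV.dev's `POST /v1/query` endpoint, one by package and version and one by commit. The package name `tinymce`, the `npm` ecosystem, the sample version and the placeholder commit are assumptions chosen for illustration; `send_query` is a hypothetical helper that performs the HTTP call and is not invoked by the demo.

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def package_query(name, ecosystem, version):
    """Build a query body that matches by package name, ecosystem and version."""
    return {"package": {"name": name, "ecosystem": ecosystem}, "version": version}


def commit_query(commit_hash):
    """Build a query body that matches by Git commit hash."""
    return {"commit": commit_hash}


def send_query(payload):
    """POST a query body to OSV.dev and return the decoded JSON response."""
    request = urllib.request.Request(
        OSV_QUERY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only print the request bodies; call send_query(...) to hit the live API.
    print(json.dumps(package_query("tinymce", "npm", "7.1.2"), indent=2))
    print(json.dumps(commit_query("<commit hash being scanned>"), indent=2))
```

Comparing the IDs returned for each query style against the expected GHSA and CVE IDs is one way to list concrete false negatives.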