ossf / osv-schema

Open Source Vulnerability schema.
https://ossf.github.io/osv-schema/
Apache License 2.0

Ecosystem support for tooling that is outside of the language or os specific package management system #94

Open westonsteimel opened 1 year ago

westonsteimel commented 1 year ago

It would be incredibly useful to have some standard way of referring to generic tooling that is part of a language or OS ecosystem, but not actually installed via that ecosystem's package registry.

An example might be cargo, which is part of the Rust ecosystem and has advisories issued by the Rust Advisory Database, but sits apart from the existing crates.io OSV ecosystem (some previous discussion on this).

Or a very recent example would be some way to refer to generic nodejs that hasn't been installed via a system package manager and would be separate from the existing npm OSV ecosystem, but for which it is still quite important to have some standard way of representing security advisories. That would also hopefully open the door to flagging these tools within the existing GitHub Advisory Database for the languages they currently support.

westonsteimel commented 1 year ago

My original thought was to potentially add a new ecosystem value for these, so could be something like:

Where the name would then be the name of the tool within that ecosystem, but that doesn't work particularly well for Go, since the existing package-centric OSV namespace is already called Go, and having both a language-name and a registry-name ecosystem might be confusing for people anyway.

joshbressers commented 1 year ago

I'm not entirely sure ecosystem makes sense here, this could be a new way to categorize software.

Let's use Node.js and OpenSSL as an example.

The Node.js binary, node, has OpenSSL statically linked in. If I install node, it's not part of an ecosystem or package manager. So now we have a binary file, that statically links in OpenSSL, that's not installed via any traceable system.

I think the value in the ecosystem tags is knowing where to go look for more details.

In the case of this OpenSSL, you can maybe go see if Node.js published anything. But for other things we would have to track every possible binary vendor for details about whatever binaries they are building and distributing, which is not realistic.

I think we need a tag that denotes this is a known problem, like tagging an OpenSSL vulnerability with node affects details, but also makes it clear this is not something we can easily programmatically determine. We maybe need humans to get involved to add and update the data.

westonsteimel commented 1 year ago

As a note, I would expect this to start out very narrowly scoped, covering existing well-known tools that are important parts of language ecosystems and are not frequently installed via package managers.

oliverchang commented 1 year ago

Hmm, this is certainly missing in the OSV schema, but I'm also a little wary of building something similar to CPEs, where we define our own custom registry of identifiers (as opposed to our ecosystem ones which just defer to that ecosystem).

Are there any other alternatives where we can be unambiguous and less bespoke? One possibility might be to use the source/repo path to do this. e.g.

There are probably other alternatives, but something like this would be much more deterministic and predictable as opposed to a custom dictionary.

westonsteimel commented 1 year ago

@oliverchang, yes I definitely agree on not maintaining our own registry of identifiers if we can possibly avoid it. I think the idea of using the source repo path could potentially work. What might the actual OSV entry look like in that case?

westonsteimel commented 1 year ago

Hmm, what about something specific for GitHub release artifacts? And maybe something general for source control URLs or just general URLs for published binaries?

oliverchang commented 1 year ago

@oliverchang, yes I definitely agree on not maintaining our own registry of identifiers if we can possibly avoid it. I think the idea of using the source repo path could potentially work. What might the actual OSV entry look like in that case?

Maybe something like:

{
  "type": "Program",
  "name": "https://github.com/python/cpython:Programs/python.c"
}

If we go with this, there will need to be some rules around canonicalising git/repo URLs, and a few other details to figure out that I'm hand-waving here.

GitHub release artifacts or general URLs could also work, but they may introduce more inconsistencies, because there can be many different correct IDs if something is mirrored in a lot of places. Everything goes back to the source repo, so perhaps that would be more stable as an identifier.
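To make the canonicalisation point concrete, here is a minimal sketch of what such rules might look like. The function name and the specific rules (force https, lowercase the host, drop a trailing slash and a trailing ".git", rewrite scp-style SSH remotes) are assumptions chosen for illustration; nothing in this thread or in the OSV schema defines them.

```python
from urllib.parse import urlsplit


def canonical_repo_url(url: str) -> str:
    """Normalise a few common spellings of a git repository URL.

    Hypothetical rules, for illustration only. Ports and user info are
    dropped, and the path keeps its original case.
    """
    url = url.strip()
    # Rewrite scp-style SSH remotes such as git@github.com:python/cpython.git
    if url.startswith("git@") and ":" in url:
        host, _, path = url[len("git@"):].partition(":")
        url = f"https://{host}/{path}"
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path.rstrip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    return f"https://{host}{path}"


spellings = [
    "https://github.com/python/cpython",
    "http://GitHub.com/python/cpython/",
    "git://github.com/python/cpython.git",
    "git@github.com:python/cpython.git",
]
# All four spellings collapse to a single identifier under these rules.
print({canonical_repo_url(s) for s in spellings})
```

Even with rules like these, mirrors on different hosts would still yield different identifiers, which is the inconsistency described above.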

captn3m0 commented 1 year ago

Or a very recent example would be some way to refer to generic nodejs that hasn't been installed via a system package manager

The PURL spec accounts for this via the generic type. Syft, for example, already uses pkg:generic/node when it detects nodejs installed outside the system package manager.

Ref: https://github.com/anchore/syft/blob/bb6fc6525c6b791999a21d014b7557075202a2e8/syft/pkg/cataloger/binary/default_classifiers.go#L74-L82

OSV should still support this use case, and maybe support a generic ecosystem. Or, given that we already have a PURL, which should be resolvable to the relevant ecosystem anyway, why do we need a separate ecosystem field?

Edit: The source references are also supported via PURLs:

oliverchang commented 1 year ago

The problem with generic is that it's essentially a free-for-all that does not enforce any form of consistency, i.e. is pkg:generic/node the canonical Node, or is it pkg:generic/Node.js, or some other variation?

I think we need something more machine readable and consistent here. One possibility that can be made to be more consistent without us maintaining a custom registry (similar to CPEs) is something like https://github.com/ossf/osv-schema/issues/94#issuecomment-1299595670, but there are likely other approaches.

oliverchang commented 1 year ago

Thinking more about this, everything really goes back to source, and we can already encode vulns in things like language interpreters through git commit hashes and version tags. This gives us the most consistent way to describe vulns in open source software that doesn't have a canonical package ecosystem.

e.g. from https://github.com/google/oss-fuzz-vulns/blob/main/vulns/mruby/OSV-2020-744.yaml

id: OSV-2020-744
summary: Heap-double-free in mrb_default_allocf
details: ...
modified: '2022-04-13T03:04:39.780694Z'
published: '2020-07-04T00:00:01.948828Z'
references:
- type: REPORT
  url: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=23801
affected:
- ranges:
  - type: GIT
    repo: https://github.com/mruby/mruby
    events:
    - introduced: 9cdf439db52b66447b4e37c61179d54fad6c8f33
    - fixed: 97319697c8f9f6ff27b32589947e1918e3015503
  versions:
  - 2.1.2
  - 2.1.2-rc
  - 2.1.2-rc2
  ecosystem_specific:
    severity: HIGH

The entrypoint from my original proposal at https://github.com/ossf/osv-schema/issues/94#issuecomment-1301608831 is missing here, but that proposal was also flawed in that entrypoints can easily move across versions as part of refactoring.
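As a rough sketch of how a consumer might match against an entry like the one above when there is no package object at all: the helper below keys purely on the GIT range's repo URL plus the enumerated versions list. The function name is hypothetical, and the record is a trimmed copy of the mruby example, with affected treated as a list of entries as the OSV schema defines it.

```python
def affected_by(record: dict, repo: str, version: str) -> bool:
    """Return True if `version` of the project at `repo` is listed as affected.

    Matches only on GIT range repo URLs and the `versions` list, so it
    works for entries that carry no `package` object.
    """
    for entry in record.get("affected", []):
        repos = {
            r.get("repo")
            for r in entry.get("ranges", [])
            if r.get("type") == "GIT"
        }
        if repo in repos and version in entry.get("versions", []):
            return True
    return False


# Trimmed copy of the OSV-2020-744 example, without any package field.
record = {
    "id": "OSV-2020-744",
    "affected": [{
        "ranges": [{
            "type": "GIT",
            "repo": "https://github.com/mruby/mruby",
            "events": [
                {"introduced": "9cdf439db52b66447b4e37c61179d54fad6c8f33"},
                {"fixed": "97319697c8f9f6ff27b32589947e1918e3015503"},
            ],
        }],
        "versions": ["2.1.2", "2.1.2-rc", "2.1.2-rc2"],
    }],
}

print(affected_by(record, "https://github.com/mruby/mruby", "2.1.2"))
```

This compares repo URLs as exact strings, so it only works if producers and consumers agree on one spelling of the URL.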

dfandrich commented 1 year ago

The curl project is experimenting with publishing its security vulnerabilities in OSV (in curl/curl-www#237) and has hit this OSV limitation. Technically, OSV can't be used for this because ecosystem is mandatory and there's no appropriate value for upstream packages. Most PURL types can specify an unversioned package (e.g. pkg:deb/debian/curl) but the generic type cannot; it is tied to a download URL that must point to a specific version of a package.

Fixing PURL to allow an unambiguous, unversioned generic package type would be one way to fix this. e.g. pkg:generic/curl?package_url=https://curl.se/download/#curl
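One wrinkle with the example string is worth illustrating. In the PURL grammar, as far as I understand it, `#` introduces the subpath component, so an unencoded `#curl` would not stay inside the package_url qualifier. The sketch below uses Python's generic urllib.parse splitter as a stand-in, not a real PURL parser, to show how the pieces fall apart.

```python
from urllib.parse import parse_qs, urlsplit

purl = "pkg:generic/curl?package_url=https://curl.se/download/#curl"
parts = urlsplit(purl)

# Path holds "type/name"; the query holds the qualifiers.
purl_type, _, name = parts.path.partition("/")
qualifiers = parse_qs(parts.query)

print(purl_type, name)   # generic curl
print(qualifiers)        # {'package_url': ['https://curl.se/download/']}
print(parts.fragment)    # curl  (split off, no longer part of package_url)
```

Percent-encoding the `#` as `%23` would keep it inside the qualifier value, but that is one more formatting rule that every producer would need to apply identically.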

oliverchang commented 1 year ago

The curl project is experimenting with publishing its security vulnerabilities in OSV (in curl/curl-www#237) and has hit this OSV limitation. Technically, OSV can't be used for this because ecosystem is mandatory and there's no appropriate value for upstream packages. Most PURL types can specify an unversioned package (e.g. pkg:deb/debian/curl) but the generic type cannot; it is tied to a download URL that must point to a specific version of a package.

Fixing PURL to allow an unambiguous, unversioned generic package type would be one way to fix this. e.g. pkg:generic/curl?package_url=https://curl.se/download/#curl

That's awesome to hear! And sorry we don't have a clear story for this yet.

Would the approach of just including the source repository information (e.g. the example in https://github.com/ossf/osv-schema/issues/94#issuecomment-1486192372) without an ecosystem/package work for curl?

oliverchang commented 1 year ago

Here are some existing curl examples in OSV (based off OSS-Fuzz automation): https://github.com/google/oss-fuzz-vulns/blob/main/vulns/curl/OSV-2022-450.yaml, https://github.com/google/oss-fuzz-vulns/blob/main/vulns/curl/OSV-2022-141.yaml

dfandrich commented 1 year ago

Using a GitHub URL would work to disambiguate this curl from any others, but there's a technicality: the curl releases are almost-but-not-quite what's tagged in git, so doing so would leave the wrong impression. The git sources are the basis of a release, but then things like autoreconf are run to get automake files, man pages are prebuilt, etc. and that final result ends up as the curl release tarball. That's also why we can't really use a GitHub PURL like pkg:github/curl/curl@curl-8_0_1 to talk about a curl release since a release is more than just those tagged files. An autoconf security bug that necessitates a new curl point release could (theoretically) use exactly the same tagged git files yet not contain the security bug.

The OSS-Fuzz example is a bit unfair because OSS-Fuzz gets its own ecosystem and can do whatever it wants with it, in this case defining "curl" conveniently as our project. The OSV docs also say that the OSS-Fuzz ecosystem is only to be used for bugs related to OSS-Fuzz findings, so we can't use it. It's interesting that they're actually using the almost-useless too-generic PURL pkg:generic/curl as curl also has ended up doing due to nothing better being available.

oliverchang commented 1 year ago

Using a GitHub URL would work to disambiguate this curl from any others, but there's a technicality: the curl releases are almost-but-not-quite what's tagged in git, so doing so would leave the wrong impression. The git sources are the basis of a release, but then things like autoreconf are run to get automake files, man pages are prebuilt, etc. and that final result ends up as the curl release tarball. That's also why we can't really use a GitHub PURL like pkg:github/curl/curl@curl-8_0_1 to talk about a curl release since a release is more than just those tagged files. An autoconf security bug that necessitates a new curl point release could (theoretically) use exactly the same tagged git files yet not contain the security bug.

Ah, that's a very interesting point that I don't have a good answer for.

If it's available, though, I believe the git metadata would still be useful to consumers: for people who are pulling curl by source (e.g. as a submodule to use as a library), and as a fallback identification mechanism that works in most cases. This enables them to make use of this vulnerability feed in an automated way just by looking at their git hashes.
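To illustrate the "just by looking at their git hashes" idea, here is a simplified sketch that evaluates introduced/fixed events against a strictly linear commit history. The function name and the toy hashes are hypothetical. Real repositories are graphs with branches and merges, so an actual consumer would need an ancestry check (for instance via `git merge-base --is-ancestor`) rather than list positions.

```python
def commit_affected(history: list, events: list, commit: str) -> bool:
    """Evaluate OSV-style GIT range events against a linear history.

    `history` lists commit hashes oldest first, and every event commit is
    assumed to appear in it. A commit counts as affected from an
    `introduced` commit up to, but not including, the next `fixed` commit.
    """
    position = {sha: i for i, sha in enumerate(history)}
    if commit not in position:
        return False

    def event_position(event: dict) -> int:
        sha = event.get("introduced") or event.get("fixed")
        # OSV uses introduced: "0" to mean "from the very beginning".
        return 0 if sha == "0" else position[sha]

    affected = False
    # Replay events in history order, stopping once we pass the commit.
    for event in sorted(events, key=event_position):
        if event_position(event) > position[commit]:
            break
        affected = "introduced" in event
    return affected


# Toy linear history, oldest first (hashes are placeholders).
history = ["a1", "b2", "c3", "d4", "e5"]
events = [{"introduced": "b2"}, {"fixed": "d4"}]

print([sha for sha in history if commit_affected(history, events, sha)])
```

As noted in the thread, a match on git hashes says nothing about what a release tarball contains beyond the tagged files, so this remains a fallback rather than a complete answer.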

The OSS-Fuzz example is a bit unfair because OSS-Fuzz gets its own ecosystem and can do whatever it wants with it, in this case defining "curl" conveniently as our project. The OSV docs also say that the OSS-Fuzz ecosystem is only to be used for bugs related to OSS-Fuzz findings, so we can't use it. It's interesting that they're actually using the almost-useless too-generic PURL pkg:generic/curl as curl also has ended up doing due to nothing better being available.

Yeah, the PURL really is just a best-effort hint in this case. Even pkg:generic/curl?package_url=https://curl.se/download/#curl seems hard to keep consistent, both in its URL formatting and across other projects in the open source ecosystem. We've tried to avoid adding a similar "Generic" ecosystem, as such fields are hard to automate on and to keep consistent.

Instead, how about we define a "Curl" ecosystem in the OSV spec? That way we can define the naming and the version rules very precisely and remove any ambiguity.

bagder commented 1 year ago

Instead, how about we define a "Curl" ecosystem in the OSV spec?

I think that would be a rather poor fix.

What if we next want to provide JSON objects for flaws from @libssh2 or @c-ares etc? Should they too get new imaginary ecosystems? These projects are not "ecosystems", they are stand-alone tools/libraries.

bagder commented 1 year ago

In the curl project we now provide JSON objects according to this schema for all published CVEs. 141 of them at today's count.

We cannot, however, identify the project in the JSON objects because curl is not part of any valid "ecosystem". I assume this might be problematic for some users of this data.

sethmlarson commented 1 year ago

Noting here that we're running into the same problem for projects like CPython, there is no ecosystem value for OSV that matches PURL's "generic" ecosystem.

oliverchang commented 1 year ago

Noting here that we're running into the same problem for projects like CPython, there is no ecosystem value for OSV that matches PURL's "generic" ecosystem.

Would something like the suggestions in https://github.com/ossf/osv-schema/issues/94#issuecomment-1486192372 or https://github.com/ossf/osv-schema/issues/94#issuecomment-1301608831 work for the CPython use case?

We need a well-defined namespace for describing non-package-manager ecosystems and the versions associated with them. The problem with "generic" is that it offers little consistency or automatability for consumers, which is what OSV has tried to fix.

sethmlarson commented 1 year ago

@oliverchang Thanks for the suggestions! I believe https://github.com/ossf/osv-schema/issues/94#issuecomment-1486192372 would work for CPython's use case if I'm reading it correctly: essentially omitting the affected.package key altogether and using only ranges and versions (I'm also assuming that ranges of type ECOSYSTEM continue to work).

The OSV database structure I'm planning already separates OSV documents (is that the right word for them?) into separate directories depending on the project (e.g. advisories/python/CVE-YYYY-NNNN.json), so the content of each file wouldn't need an identifier marking the advisory as one for Python?

Will this structure and omission of affected.package play nicely with the OSV database?

nisamson commented 1 month ago

While https://github.com/ossf/osv-schema/issues/94#issuecomment-1640134859 mentions that it's not a goal to replicate the CPE format, it would be very useful to include some information included in CPEs. For example, certain vulnerabilities may only affect particular platforms (e.g. Windows), and this information is included in the platform field of CPEs that NVD includes with their data.

This is especially relevant to generic packages, where the package may be used on any platform, but is also relevant to ecosystem-specific packages. As an example, this Python package vulnerability (GHSA-rxff-vr5r-8cj5) only affects Windows versions of the package. There is no standard way to include that information in the OSV JSON, and so far as I'm aware there is no standard mechanism for including such information in the PURL.

oliverchang commented 1 month ago

While #94 (comment) mentions that it's not a goal to replicate the CPE format, it would be very useful to include some information included in CPEs. For example, certain vulnerabilities may only affect particular platforms (e.g. Windows), and this information is included in the platform field of CPEs that NVD includes with their data.

This is especially relevant to generic packages, where the package may be used on any platform, but is also relevant to ecosystem-specific packages. As an example, this Python package vulnerability (GHSA-rxff-vr5r-8cj5) only affects Windows versions of the package. There is no standard way to include that information in the OSV JSON, and so far as I'm aware there is no standard mechanism for including such information in the PURL.

Thanks for the feedback. Regarding specific OSes, I've created https://github.com/ossf/osv-schema/issues/281