ossf / osv-schema

Open Source Vulnerability schema.
https://ossf.github.io/osv-schema/
Apache License 2.0

Overloading of ecosystem #83

Closed kurtseifried closed 1 month ago

kurtseifried commented 2 years ago

Where do we put the vendor name? There are lots of packages, sometimes in the same ecosystem, or not really in any ecosystem at all, for which the vendor name is really helpful.

The osv-data seems to do things like:

"ecosystem": "Debian:5.0",
"ecosystem": "Debian:10",

Is this the official way to do it? If so, can we update the documentation? https://ossf.github.io/osv-schema/#affected-fields doesn't mention this explicitly, but it does show examples, and:

Your ecosystem here. | Send us a PR.

If so (for the data I'm currently working with), there are about 50 vendors with 100+ items, 250 with 10-99, and 300 with 5-9. What's the bar for entry here to get listed? Do they need to be listed? (E.g. you have Debian, so can we add all the major Linux vendors? The BSDs?)

oliverchang commented 2 years ago

What's the bar for entry here to get listed? Do they need to be listed?

We just need clearly defined rules for each ecosystem. There must be no ambiguity as to what a "name" means in an ecosystem. This is not always obvious: e.g. for Debian, this must be source packages, not binary packages. For Python, the package name must be normalized. We can't just have e.g. ecosystem: "", name: "human readable text" as these are not very actionable.
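To make the Python rule concrete, here is a minimal sketch of name normalization as described in PEP 503 (runs of `-`, `_` and `.` collapse to a single `-`, and the result is lowercased). The function name `normalize_pypi_name` is hypothetical and is not taken from any OSV tooling.

```python
import re


def normalize_pypi_name(name: str) -> str:
    """Normalize a Python package name following PEP 503."""
    # Collapse any run of '-', '_' or '.' into one '-', then lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


print(normalize_pypi_name("Django_REST.framework"))  # django-rest-framework
```

With a rule like this, two records that spell the same package differently end up with the same name, which is what makes the pair of ecosystem and name actionable.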

Re Debian, the definition states:

The Debian package ecosystem; the name is the name of the source package. The ecosystem string might optionally have a :<RELEASE> suffix to scope the package to a particular Debian release. <RELEASE> is a numeric version specified in the [Debian distro-info-data](https://debian.pages.debian.net/distro-info-data/debian.csv). For example, the ecosystem string “Debian:7” refers to the Debian 7 (wheezy) release.
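As a rough sketch of how a consumer might split such an ecosystem string, the snippet below separates the base name from the optional `:<RELEASE>` suffix. The names `Ecosystem` and `parse_ecosystem` are hypothetical, and the numeric check for Debian is an assumption based on the wording above; it does not look the release up in distro-info-data.

```python
import re
from typing import NamedTuple, Optional


class Ecosystem(NamedTuple):
    name: str
    release: Optional[str]  # None when no ":<RELEASE>" suffix is present


def parse_ecosystem(value: str) -> Ecosystem:
    """Split an ecosystem string such as 'Debian:10' into name and release."""
    name, sep, release = value.partition(":")
    if not name:
        raise ValueError("ecosystem name must not be empty")
    if not sep:
        return Ecosystem(name, None)
    if not release:
        raise ValueError("release suffix must not be empty")
    # Assumption: Debian releases look like '10' or '5.0'.
    if name == "Debian" and not re.fullmatch(r"\d+(\.\d+)?", release):
        raise ValueError(f"non-numeric Debian release: {release!r}")
    return Ecosystem(name, release)


print(parse_ecosystem("Debian:10"))
print(parse_ecosystem("Debian"))
```

Keeping the release separate from the base name lets the per-ecosystem naming rule (source package names, for Debian) apply regardless of which release a record is scoped to.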

kurtseifried commented 2 years ago

Ok, so where do we put software in general? E.g. where does open source software that isn't part of an existing ecosystem go? Do we create something like "opensource" or "software"? What about closed source or vendor-specific software?

There is already a catchall for stuff found by OSS-Fuzz:

OSS-Fuzz | For reports from the OSS-Fuzz project that have no more appropriate ecosystem; the name field is the name assigned by the OSS-Fuzz project, as recorded in the submitted fuzzing configuration.

Do we do something similar for data from other sources, e.g. "Other"?

kurtseifried commented 1 year ago

CVE JSON 5.0 is doing something similar to ecosystem with collectionURL (https://github.com/CVEProject/cve-schema/blob/master/schema/v5.0/CVE_JSON_5.0_schema.json#L123):

            "collectionURL": {
                "description": "URL identifying a package collection (determines the meaning of packageName).",
                "$ref": "#/definitions/uriType",
                "examples": [
                    "https://access.redhat.com/downloads/content/package-browser",
                    "https://addons.mozilla.org",
                    "https://addons.thunderbird.net",

One major advantage of using a URL is that people immediately know where to go and look, and there's no potential for overlap.

kurtseifried commented 1 year ago

I think we should consider using URLs pointing to the package ecosystem space for the ecosystem value. It ensures there are no duplicates, it gives people a hint where to go, and it makes adding new ones trivial: just use the best official URL you can find. There's less need to curate them manually.

oliverchang commented 1 year ago

The issue with that is that we still need clearly defined rules for what a package name means as part of this ecosystem. There are subtleties in a lot of ecosystems. Some examples:

Having these clear sets of rules allows us to perform validation so that consumers can be confident about their ingestion.
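One way such validation could look is sketched below: a small table of per-ecosystem name rules and a checker that reports problems for a package entry. Everything here is an assumption for illustration (`NAME_RULES`, `validate_package`, and the two rules); the PyPI rule uses PEP 503 normalization and the Debian rule uses the character set from Debian Policy, which cannot by itself distinguish source from binary packages.

```python
import re
from typing import Callable, Dict, List


def _is_normalized_pypi(name: str) -> bool:
    # Valid only if the name is already in PEP 503 normalized form.
    return bool(name) and name == re.sub(r"[-_.]+", "-", name).lower()


def _is_debian_style(name: str) -> bool:
    # Debian Policy: lowercase letters, digits, '+', '-', '.';
    # at least two characters, starting with an alphanumeric.
    return re.fullmatch(r"[a-z0-9][a-z0-9+.\-]+", name) is not None


# Hypothetical rule table; a real one would cover every defined ecosystem.
NAME_RULES: Dict[str, Callable[[str], bool]] = {
    "PyPI": _is_normalized_pypi,
    "Debian": _is_debian_style,
}


def validate_package(package: dict) -> List[str]:
    """Return a list of problems found in an {'ecosystem', 'name'} entry."""
    ecosystem = package.get("ecosystem", "")
    name = package.get("name", "")
    base = ecosystem.split(":", 1)[0]  # drop an optional ':<RELEASE>' suffix
    rule = NAME_RULES.get(base)
    if rule is None:
        return [f"unknown ecosystem: {ecosystem!r}"]
    if not rule(name):
        return [f"name {name!r} is not valid for {base}"]
    return []


print(validate_package({"ecosystem": "Debian:10", "name": "openssl"}))
print(validate_package({"ecosystem": "", "name": "human readable text"}))
```

An entry with an empty or unlisted ecosystem is rejected outright, since there is no rule that says what its name means.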

kurtseifried commented 1 year ago

What happens if there is a flaw in the binary package and not the source? (this has happened a handful of times if my memory serves).

Also, regarding Debian vs https://packages.debian.org/: regardless of what it's called, it would be nice to be able to refer to both source and binary packages. Is there some reason for not supporting references to binary packages? I assume the intent is to refer to the smallest component of the composition (e.g. Go modules instead of packages), but why not support both?

andrewpollock commented 1 year ago

What happens if there is a flaw in the binary package and not the source? (this has happened a handful of times if my memory serves).

#202 touches on this need...