ossf / osv-schema

Open Source Vulnerability schema.
https://ossf.github.io/osv-schema/
Apache License 2.0

Tooling for schema validation #90

Open andrewpollock opened 1 year ago

andrewpollock commented 1 year ago

As part of #761, we became aware that the Cloud Security Alliance has a schema validator.

It seems like shipping a canonical, authoritative validator tool and library with the schema would be best, rather than each ecosystem integrator needing to reinvent the wheel (and possibly less comprehensively than desired).

marco-silva0000 commented 1 year ago

I would love a canonical, mypy-friendly schema object that could be imported and used in any Python tooling, and the same for other languages.
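
As a rough illustration of what such an importable, type-checkable object might look like, here is a minimal sketch. The class name `OsvRecord` and its `from_dict` helper are hypothetical and not part of any official OSV package; only `id` and `modified` are modelled as typed fields, and everything else is kept untyped in `extra`.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class OsvRecord:
    """Hypothetical typed view of an OSV entry; not an official class."""

    id: str
    modified: str  # timestamp, kept as a plain string in this sketch
    extra: dict[str, Any] = field(default_factory=dict)  # all other fields, untyped

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "OsvRecord":
        # Reject entries that lack either of the two fields modelled here.
        missing = [key for key in ("id", "modified") if key not in data]
        if missing:
            raise ValueError(f"missing required field(s): {', '.join(missing)}")
        extra = {k: v for k, v in data.items() if k not in ("id", "modified")}
        return cls(id=str(data["id"]), modified=str(data["modified"]), extra=extra)
```

A type checker such as mypy can then flag misuse like `record.idd` at analysis time, which a plain `dict` would not allow.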

kurtseifried commented 1 year ago

Some comments (I'm the tool author):

We can't do this with a "pure" JSON schema because (this is the shortlist):

  1. OSV only requires id and modified, so a vulnerability entry could be essentially blank and still be "correct". I suspect the GSD might end up with a minor fork of OSV that just has an expanded set of required fields.
  2. CVE v4 has three schemas, one each for REJECT, RESERVED, and PUBLIC, so you need to read the state tag to know which schema to apply. CVE v5 fixes this with a single schema. You could use JSON Schema's "oneOf", but it still might be incorrect.
  3. Many projects use OSV but host the data as YAML rather than JSON, so you either convert the YAML to JSON or need a programmatic validator that can read a YAML string.
  4. GSD needs to implement schema validators for each data source we ingest anyway, for one simple reason: every data source we ingest typically has at least one error (malformed CVE IDs, blank entries, etc.). We find these and have had good luck so far getting upstreams (e.g. Mozilla, Mageia) to fix their data.
  5. Long term, the plan is to have our validator also correct the data we serve, e.g. vendor names such as "[Red Hat]" and "RedHat" should all be normalized to the correct "Red Hat".

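
Points 1 and 5 can be sketched as a small programmatic check. This is only an illustration under stated assumptions, not the GSD tool's actual code: the stricter profile `STRICT_REQUIRED`, the alias table `VENDOR_ALIASES`, and both function names are hypothetical. It works on an already-parsed mapping, so it does not matter whether the entry came from JSON or YAML.

```python
# The two fields OSV requires, per point 1 above.
OSV_REQUIRED = ("id", "modified")
# Hypothetical stricter profile that a fork might choose to enforce.
STRICT_REQUIRED = OSV_REQUIRED + ("summary", "affected")

# Hypothetical alias table; the spellings come from the example in point 5.
VENDOR_ALIASES = {
    "[red hat]": "Red Hat",
    "redhat": "Red Hat",
    "red hat": "Red Hat",
}


def missing_fields(entry: dict, required: tuple = OSV_REQUIRED) -> list:
    """Return the required keys that are absent or empty in a parsed entry."""
    return [key for key in required if not entry.get(key)]


def normalize_vendor(name: str) -> str:
    """Map known vendor spellings to one form; leave unknown names unchanged."""
    cleaned = name.strip()
    return VENDOR_ALIASES.get(cleaned.lower(), cleaned)
```

For example, an entry containing only id and modified passes `missing_fields(entry)` but fails `missing_fields(entry, STRICT_REQUIRED)`, which is the gap point 1 describes.
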
kurtseifried commented 1 year ago

Also (obviously) we'll be releasing what we build as open source, in that directory long term, so keep an eye on it; hopefully I'll get some more time to build this soon.

oliverchang commented 1 year ago

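+1 that a pure JSON schema is not sufficient. There are other reasons:

  • We need to validate that package names and versions are valid.
  • We can check for e.g. schema_version being required when fields from a newer schema version are used.
  • And others that may overlap with @andrewpollock's work on detecting invalid entries. Maybe some of that should be usable as a standalone tool.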

marco-silva0000 commented 1 year ago

How about several different packages that all have a single responsibility?

On Wed, Mar 29, 2023, 10:53 PM Oliver Chang wrote:

+1 that a pure JSON schema is not sufficient. There are other reasons:

  • We need to validate that package names and versions are valid.
  • We can check for e.g. schema_version being required when fields from a newer schema version are used.
  • And others that may overlap with @andrewpollock's work on detecting invalid entries. Maybe some of that should be usable as a standalone tool.
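
The schema_version check in the second bullet is the kind of rule a plain JSON schema struggles to express. A minimal sketch, assuming the caller supplies a table of which top-level field was introduced in which schema version; the function names and the `introduced_in` parameter are hypothetical, and no real field-to-version mapping is asserted here.

```python
def parse_version(text: str) -> tuple:
    """Turn a dotted version string such as '1.2.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def check_schema_version(entry: dict, introduced_in: dict) -> list:
    """Return problems where a used field needs a newer schema_version.

    `introduced_in` maps a top-level field name to the version tuple that
    introduced it. It is supplied by the caller (an assumption of this sketch).
    """
    declared = entry.get("schema_version")
    problems = []
    for field_name, minimum in introduced_in.items():
        if field_name not in entry:
            continue  # field unused, so nothing to check
        needed = ".".join(str(part) for part in minimum)
        if declared is None:
            problems.append(f"{field_name}: schema_version is required (>= {needed})")
        elif parse_version(declared) < minimum:
            problems.append(f"{field_name}: needs schema_version >= {needed}, got {declared}")
    return problems
```

An empty list means the entry's declared schema_version is consistent with the fields it uses, as far as the supplied table knows.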


frasertweedale commented 1 year ago

Related: https://github.com/ossf/osv-schema/issues/168 - schema.json "score" pattern too strict in metric ordering, optional metrics not recognised