spdx / spdx-3-model

The model for the information captured in SPDX version 3 standard.
https://spdx.dev/use/specifications/

Consider JSON Schema in RDF #438

Open aamedina opened 1 year ago

aamedina commented 1 year ago

https://www.w3.org/2019/wot/json-schema

This is essentially the same model used by the W3C “Web of Things” specification, so adopting it would naturally extend SBOM support to IoT devices that express their affordances and descriptions using JSON Schema.

The vocabulary can be used to explicitly map the JSON property names to strings and make validation and conformance easier for consumers who could use the authoritative JSON schema derived from the standardized RDF model.

As an experiment, I have expressed a couple of published JSON Schemas as RDF, which you can review at the links below.

Note: I added additionalProperties and uniqueItems terms to the original model.

https://github.com/aamedina/openai/blob/main/resources/openai.ttl

https://github.com/aamedina/qdrant/blob/main/resources/qdrant.ttl

aamedina commented 1 year ago

Note: I think this should be provided as a separate ontology referencing and extending the SPDX ontology with new classes when imported.

“spdx-schema” for the JSON Schema RDF model

“spdx-shapes” for the SHACL shapes

both of which could import “spdx” which contains classes, properties, and individuals

goneall commented 1 year ago

Thanks @aamedina for the references. I was not aware of this effort. Within the SPDX community, we have strong support for JSON schemas and RDF - so this seems very applicable.

@sbarnum and @zvr - any thoughts?

aamedina commented 1 year ago

Related: https://github.com/spdx/spdx-3-model/issues/460. I think SHACL can be used to infer the JSON Schema RDF properties, but it may be simpler to do that in the spec-parser, as you're already doing. I will have to experiment later to think this through with the SPDX 3 model, but I need to learn the rest of the profiles first.

davaya commented 1 year ago

Thanks @aamedina. This is also related to #464. JSON Schema can be used to define datatypes like PositiveIntegerRange and Hash so that they are an integral part of the RDF model, not just serializations.

ABNF, regular expressions, and JSON Schema can all be used to define datatypes. The JSON Schema

{
  "type": "object",
  "properties": {
    "algorithm": {"enum": ["sha1", "sha256"]},
    "value": {"type": "number"}
  },
  "required": ["algorithm", "value"],
  "additionalProperties": false
}

defines a Hash datatype, where an instance is the string '{"algorithm": "sha1", "value": 12345}' that is valid in non-JSON serializations as well as JSON.
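To make the schema above concrete, here is a minimal sketch in Python that checks a lexical string against it. It is hand-rolled and covers only the keywords this schema uses (type, properties, enum, required, additionalProperties); it is not a general JSON Schema validator, and the names HASH_SCHEMA and is_valid_hash are hypothetical, introduced only for illustration.

```python
import json

# Schema copied from the comment above, written as a Python dict.
HASH_SCHEMA = {
    "type": "object",
    "properties": {
        "algorithm": {"enum": ["sha1", "sha256"]},
        "value": {"type": "number"},
    },
    "required": ["algorithm", "value"],
    "additionalProperties": False,
}


def is_valid_hash(lexical: str, schema: dict = HASH_SCHEMA) -> bool:
    """Hypothetical helper: True if the string is a valid Hash instance."""
    try:
        instance = json.loads(lexical)
    except json.JSONDecodeError:
        return False
    if not isinstance(instance, dict):  # "type": "object"
        return False
    props = schema["properties"]
    # "required": every listed property must be present.
    if any(key not in instance for key in schema["required"]):
        return False
    # "additionalProperties": false rejects unknown property names.
    if not schema["additionalProperties"] and any(k not in props for k in instance):
        return False
    for name, rule in props.items():
        if name not in instance:
            continue
        value = instance[name]
        if "enum" in rule and value not in rule["enum"]:
            return False
        if rule.get("type") == "number":
            # JSON booleans are not numbers, although Python's bool subclasses int.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                return False
    return True
```

Calling is_valid_hash('{"algorithm": "sha1", "value": 12345}') accepts the instance, while an unknown algorithm, a missing property, or an extra property is rejected.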

I don't know if JSON Schema has a standard way of defining named arrays (e.g., namedtuple) where the same Hash value would be serialized as ["sha1", 12345], but it's trivial with regex:

^(?<algorithm>(?:sha1|sha256)):\s*(?<value>\d+)$

The logical value of a Hash datatype instance:

  algorithm: sha1
  value: 12345

is serialized as the lexical string sha1: 12345
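As a rough illustration of the logical/lexical distinction, the sketch below round-trips a Hash value through the regex above using Python's re module. Python spells named groups as (?P<name>...) rather than (?<name>...), so the pattern is adjusted accordingly; the function names parse_hash and serialize_hash are hypothetical.

```python
import re

# Same pattern as above, with Python's (?P<name>...) named-group syntax.
HASH_PATTERN = re.compile(r"^(?P<algorithm>(?:sha1|sha256)):\s*(?P<value>\d+)$")


def parse_hash(lexical: str) -> dict:
    """Lexical string -> logical value (hypothetical helper)."""
    match = HASH_PATTERN.match(lexical)
    if match is None:
        raise ValueError(f"not a valid Hash lexical form: {lexical!r}")
    return {"algorithm": match["algorithm"], "value": int(match["value"])}


def serialize_hash(logical: dict) -> str:
    """Logical value -> lexical string (hypothetical helper)."""
    return f'{logical["algorithm"]}: {logical["value"]}'


logical = parse_hash("sha1: 12345")
print(logical)
print(serialize_hash(logical))
```

Because the regex allows any amount of whitespace after the colon, several lexical strings map to the same logical value; serialize_hash picks the single-space form as the canonical one, which is an assumption of this sketch.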

goneall commented 7 months ago

I believe this can be added without any breaking changes - moving to 3.1 milestone

cc: @JPEWdev @zvr