usnistgov / OSCAL

Open Security Controls Assessment Language (OSCAL)
https://pages.nist.gov/OSCAL/

JSON Schema base URI collision #902

Closed davaya closed 3 years ago

davaya commented 3 years ago

Describe the bug

The JSON Schema files for multiple layers all have the same base URI (value of the root $id keyword). This indicates a bug in the schema generation tools, since the URI is intended to (uniquely) identify schema resources.

How do we replicate the issue?

Examine JSON schemas in:

Observe that they begin with:

{ "$schema" : "http://json-schema.org/draft-07/schema#",
  "$id" : "http://csrc.nist.gov/ns/oscal/1.0-schema.json",
  "$comment" : "OSCAL Control Catalog Model: JSON Schema",

{ "$schema" : "http://json-schema.org/draft-07/schema#",
  "$id" : "http://csrc.nist.gov/ns/oscal/1.0-schema.json",
  "$comment" : "OSCAL Profile Model: JSON Schema",

{ "$schema" : "http://json-schema.org/draft-07/schema#",
  "$id" : "http://csrc.nist.gov/ns/oscal/1.0-schema.json",
  "$comment" : "OSCAL Component Definition Model: JSON Schema",
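The collision is easy to check mechanically. The Python sketch below groups schema files by their root `$id` and reports any identifier declared by more than one file. The `$id` and `$comment` values are the ones quoted above; the file names used as keys are hypothetical placeholders, not the published file names.

```python
from collections import defaultdict

# Root keywords as quoted above. The file names are hypothetical placeholders.
SHARED_ID = "http://csrc.nist.gov/ns/oscal/1.0-schema.json"
schemas = {
    "catalog-schema.json": {"$id": SHARED_ID, "$comment": "OSCAL Control Catalog Model: JSON Schema"},
    "profile-schema.json": {"$id": SHARED_ID, "$comment": "OSCAL Profile Model: JSON Schema"},
    "component-schema.json": {"$id": SHARED_ID, "$comment": "OSCAL Component Definition Model: JSON Schema"},
}


def find_id_collisions(schemas):
    """Return {root $id: [file names]} for every $id declared by more than one file."""
    files_by_id = defaultdict(list)
    for filename, schema in schemas.items():
        schema_id = schema.get("$id")
        if schema_id is not None:  # a schema without a root $id cannot collide
            files_by_id[schema_id].append(filename)
    return {sid: files for sid, files in files_by_id.items() if len(files) > 1}


for schema_id, files in find_id_collisions(schemas).items():
    print(f"{schema_id} is declared by {len(files)} files: {', '.join(files)}")
```

In practice the dictionary would be filled by loading each schema file with `json.load`; a non-empty result means that anything keyed by `$id` (a validator's schema store, for example) cannot tell the models apart.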

Expected behavior (i.e. solution)

The base URI of each distinct schema should identify no other schema. For example:

"$id" : "http://csrc.nist.gov/ns/oscal/1.0-schema/catalog.json"

"$id" : "http://csrc.nist.gov/ns/oscal/1.0-schema/profile.json"

"$id" : "http://csrc.nist.gov/ns/oscal/1.0-schema/component.json"
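Distinct base URIs also matter for `$ref`, which is resolved against the base URI established by the `$id` of the schema containing it (RFC 3986 reference resolution). A minimal sketch, assuming Python's `urllib.parse` resolution is adequate for these simple relative references; the definition names `metadata` and `import` are hypothetical.

```python
from urllib.parse import urldefrag, urljoin

# Proposed per-model base URIs from the expected behavior above.
CATALOG_ID = "http://csrc.nist.gov/ns/oscal/1.0-schema/catalog.json"
PROFILE_ID = "http://csrc.nist.gov/ns/oscal/1.0-schema/profile.json"
# The single base URI currently shared by every model schema.
SHARED_ID = "http://csrc.nist.gov/ns/oscal/1.0-schema.json"


def resolve_ref(base_id, ref):
    """Resolve a $ref against its schema's base URI.

    Returns (absolute schema URI, JSON Pointer taken from the fragment)."""
    uri, fragment = urldefrag(urljoin(base_id, ref))
    return uri, fragment


# A same-document reference stays inside the referring model...
print(resolve_ref(CATALOG_ID, "#/definitions/metadata"))
# ...and a relative reference can reach a sibling model schema.
print(resolve_ref(CATALOG_ID, "profile.json#/definitions/import"))
# With the shared $id, "#/definitions/metadata" in the catalog, profile and
# component schemas all resolve to this same absolute URI.
print(resolve_ref(SHARED_ID, "#/definitions/metadata"))
```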

david-waltermire commented 3 years ago

We are working to produce a single JSON and XML schema for all of OSCAL. This will address this issue once deployed.

davaya commented 3 years ago

At first glance, restructuring OSCAL from modular to monolithic seems like a step in the wrong direction. Loose coupling using namespaces would be the natural approach. Is there a rationale, or a list of pros and cons, for using a monolithic JSON and XML schema for all of OSCAL?

GaryGapinski commented 3 years ago

I thought there was an aversion to using more than one namespace (if that is what @davaya means, i.e., one namespace per sub-schema).¹

At the moment, there are multiple OSCAL schemas, each within the same namespace, which makes

thus requiring the use of explicit schema association per instance document using

¹ OVAL made profligate use of namespaces, which IMO markedly decreased its usability by increasing its complexity.

davaya commented 3 years ago

BLUF: Schema namespaces yes. Data namespaces no.

After reading the OSCAL Metaschema paper (https://www.balisage.net/Proceedings/vol23/print/Piez01/BalisageVol23-Piez01.html), I understand the motivation for a single namespace better. But in its discussion of the "OSCALizable subset of XML", the distinction between schema namespaces and data namespaces is, or appears to be, lost.

JSON data has no namespaces, but JSON Schema does: the root $id of each schema file gives that file's namespace. Namespacing enables reuse of definitions. There's no need for OSCAL to reinvent SI units for length, mass and temperature, GPS coordinates, etc.; those types can be created by experts and referenced when needed. JSON Schema facilitates cross-namespace referencing using $ref, but the resulting data carries no trace of namespacing because the JSON data format does not support it.
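As a rough illustration of that kind of reuse, the sketch below keeps schemas in a registry keyed by root `$id` and follows a `$ref` into a definition owned by another schema, using an RFC 6901 JSON Pointer for the fragment. The units schema, the `example.org` URIs, and the `asset` schema are all hypothetical; a real validator library would perform this resolution internally.

```python
from urllib.parse import unquote, urldefrag, urljoin

# Hypothetical externally maintained schema for SI length values.
UNITS_ID = "http://example.org/schemas/si-units.json"
units_schema = {
    "$id": UNITS_ID,
    "definitions": {
        "length": {
            "type": "object",
            "properties": {"value": {"type": "number"}, "unit": {"enum": ["mm", "m", "km"]}},
            "required": ["value", "unit"],
        }
    },
}
# Hypothetical schema that reuses the length definition instead of redefining it.
asset_schema = {
    "$id": "http://example.org/schemas/asset.json",
    "properties": {"height": {"$ref": "si-units.json#/definitions/length"}},
}
registry = {schema["$id"]: schema for schema in (units_schema, asset_schema)}


def resolve_pointer(document, pointer):
    """Follow an RFC 6901 JSON Pointer such as '/definitions/length'."""
    if pointer == "":
        return document
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON Pointer: {pointer!r}")
    node = document
    for token in pointer[1:].split("/"):
        token = token.replace("~1", "/").replace("~0", "~")  # RFC 6901 unescaping
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def deref(registry, base_id, ref):
    """Resolve ref against base_id, look the schema up by $id, then apply the pointer."""
    uri, fragment = urldefrag(urljoin(base_id, ref))
    return resolve_pointer(registry[uri], unquote(fragment))


height_ref = asset_schema["properties"]["height"]["$ref"]
height_schema = deref(registry, asset_schema["$id"], height_ref)
print(height_schema["required"])
```

An instance such as `{"height": {"value": 2.5, "unit": "m"}}` validated against `asset_schema` contains no reference to either `$id`: the namespacing exists only on the schema side.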

I think it would be appropriate for each of the OSCAL schema/model layers to have its own namespace. There is no danger of namespace proliferation: the number of layers might grow from 7 to 8 or 9, but not to thousands. It might also be appropriate for OSCAL XML data to emulate JSON data and be constructed without element prefixes. Data structure provides namespace separation in the same way that filesystem paths prevent collisions between files of the same name in different folders:

<markup>
    <table>
        <head/>
        <body/>
    </table>
</markup>

{
  "markup": {
    "table": {
      "head": [],
      "body": []
}}}

is not confused with:

<furniture>
    <table>
        <material/>
        <weight/>
    </table>
</furniture>

{
  "furniture": {
    "table": {
      "material": "oak",
      "weight": 52
}}}
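The filesystem analogy can be made concrete: if each member is identified by its full key path from the document root, the two `table` members never collide. A minimal Python sketch over the JSON forms above; list positions are deliberately left out of the path because they do not change which name is meant.

```python
def key_paths(node, prefix=()):
    """Yield the key path (a tuple of member names) of every object member."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = prefix + (key,)
            yield path
            yield from key_paths(value, path)
    elif isinstance(node, list):
        for item in node:  # array positions are not part of the path
            yield from key_paths(item, prefix)


def paths_named(document, name):
    """Return every '/'-joined path whose last member is `name`."""
    return {"/".join(path) for path in key_paths(document) if path[-1] == name}


markup = {"markup": {"table": {"head": [], "body": []}}}
furniture = {"furniture": {"table": {"material": "oak", "weight": 52}}}

print(paths_named(markup, "table"), paths_named(furniture, "table"))
```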

david-waltermire commented 3 years ago

@davaya A JSON schema does not have a namespace. It has a unique schema identifier expressed as a canonical URI. This is not the same as a namespace.

FWIW, we made an early decision that all of OSCAL will be in the same XML namespace, and I think at this point we need to keep that for OSCAL v1. This allows us to reuse common information items across the OSCAL models (and schemas). Because all information items in OSCAL XML are in the same namespace, we avoid having to alternate between namespaces, which has been confusing and problematic for users of other efforts that do this (e.g., OVAL).
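To illustrate what a single namespace buys, the sketch below parses two heavily trimmed, hypothetical OSCAL-style XML fragments with Python's `xml.etree.ElementTree`. It assumes the OSCAL XML namespace URI `http://csrc.nist.gov/ns/oscal/1.0`; required attributes and most content are omitted, so the fragments are not valid OSCAL. Because every element is in that one namespace, the `metadata` element shared by the catalog and profile models has the same qualified name in both.

```python
import xml.etree.ElementTree as ET

OSCAL_NS = "http://csrc.nist.gov/ns/oscal/1.0"  # assumed OSCAL XML namespace URI
NS = {"o": OSCAL_NS}

# Trimmed, hypothetical fragments: not schema-valid OSCAL documents.
catalog_xml = f'<catalog xmlns="{OSCAL_NS}"><metadata><title>Example catalog</title></metadata></catalog>'
profile_xml = f'<profile xmlns="{OSCAL_NS}"><metadata><title>Example profile</title></metadata></profile>'


def namespaces_used(xml_text):
    """Collect the namespace URI of every element ('' for un-namespaced elements)."""
    root = ET.fromstring(xml_text)
    return {
        element.tag[1:].split("}")[0] if element.tag.startswith("{") else ""
        for element in root.iter()
    }


def metadata_tag(xml_text):
    """Return the qualified name of the root's metadata child in Clark notation."""
    return ET.fromstring(xml_text).find("o:metadata", NS).tag


print(namespaces_used(catalog_xml) | namespaces_used(profile_xml))
print(metadata_tag(catalog_xml) == metadata_tag(profile_xml))
```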

wendellpiez commented 3 years ago

Additional note: the Metaschema back end gives us a great deal of flexibility in this, for generating schemas (both XML and JSON) with specialized namespaces as well as with unified namespaces when/as appropriate. I am not sure everyone will regard this approach as a solution so much as (again) moving the problem. But it might offer options going forward.

davaya commented 3 years ago

@david-waltermire-nist: "A Package is a namespace for its members, which comprise those elements associated via packagedElement (which are said to be owned or contained), and those imported." -- https://www.omg.org/spec/UML/2.5.1/PDF Section 12.2.3.1

"Namespace is an abstract named element that contains (or owns) a set of named elements that can be identified by name. In other words, namespace is a container for named elements." -- https://www.uml-diagrams.org/namespace.html#:~:text=UML%20Common%20Structure,package

"When writing computer programs of even moderate complexity, it’s commonly accepted that “structuring” the program into reusable functions is better than copying-and-pasting duplicate bits of code everywhere they are used." -- https://json-schema.org/understanding-json-schema/structuring.html

A JSON schema file with a root $id acts like a package with a namespace and is used like a namespace, so if there is some terminological technicality that says it is not, the distinction will have to be articulated with much greater precision.

How to structure OSCAL is a design decision, and using a single namespace is certainly a valid option. It does require close coupling between the layers, and since they were apparently developed assuming loose coupling, any name collisions will need to be resolved before the single namespace can be realized. That's easily doable, but I would have favored loose coupling.

Cheers.

david-waltermire commented 3 years ago

All name collisions within the OSCAL domain are handled by the Metaschema XML and JSON schema processing. The draft JSON and XML schemas produced should not have naming collisions.

FYI, the JSON definition IDs used in the "complete" schema are the same as those used in each "model" schema, and the same applies to the XML types used in the "complete" and "model" schemas. This makes the common information items easy to identify.
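One way to picture this: if every model schema uses the same definition id for a common information item, a "complete" schema can be assembled by merging the models' `definitions`, keeping a shared id once when all copies are identical and flagging it when copies differ. This is a hypothetical Python sketch, not the Metaschema tooling's actual process; the definition ids and the complete schema's `$id` are invented for illustration.

```python
def merge_definitions(model_schemas):
    """Merge the 'definitions' of several model schemas into one mapping.

    Returns (merged, collisions). An id seen again with an identical definition is
    treated as a shared information item; an id seen again with a different
    definition is recorded in collisions under the name of the conflicting model."""
    merged, collisions = {}, {}
    for model_name, schema in model_schemas.items():
        for def_id, definition in schema.get("definitions", {}).items():
            if def_id not in merged:
                merged[def_id] = definition
            elif merged[def_id] != definition:  # deep comparison of parsed JSON
                collisions.setdefault(def_id, []).append(model_name)
    return merged, collisions


# Hypothetical model schemas sharing a "metadata" definition.
models = {
    "catalog": {"definitions": {"metadata": {"type": "object"}, "control": {"type": "object"}}},
    "profile": {"definitions": {"metadata": {"type": "object"}, "import": {"type": "object"}}},
}
merged, collisions = merge_definitions(models)
complete_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "$id": "http://example.org/oscal-complete-schema.json",  # hypothetical $id
    "definitions": merged,
}
print(sorted(complete_schema["definitions"]), collisions)
```

Keeping the definition ids identical between the model and complete schemas is what lets the merge recognize a shared item by name alone.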

david-waltermire commented 3 years ago

The "complete" XML and JSON schemas have been integrated in PR #948. These will be released in OSCAL 1.0.0.