usnistgov / OSCAL

Open Security Controls Assessment Language (OSCAL)
https://pages.nist.gov/OSCAL/

JSON string vs markup-line or markup-multiline #2007

Open swanky-oscal opened 5 months ago

swanky-oscal commented 5 months ago

Question

It appears that all instances of markup-line and markup-multiline in the JSON model are converted to strings in the schema.

For instance, the metadata title element references markup-line in the v1.1.2 reference model.
And the JSON metaschema defines MarkupLineDatatype. However, complete-schema.json in the v1.1.2 release defines #assembly_metadata:title as a String.

So, my question has two parts:

  1. Why does the published JSON schema replace markup-line and markup-multiline with String?
  2. Should this be clearly explained in the guidance? (Apologies if it already is; I just can't find it.)

iMichaela commented 5 months ago

@swanky-oscal - JSON was not designed for human readability, and a string was the best approach our team came up with. One can craft the string in a way that produces the expected output. If you have a proposal for a better representation, we are listening. We did not document this explicitly because we wrongly assumed that all OSCAL developers working with JSON would understand the reason for this choice.

swanky-oscal commented 5 months ago

@iMichaela, thanks for your response! I'm thinking of this from a strongly typed schema perspective. Many of the metaschema types are represented in JSON as string. But being able to type the string as a date-time vs base64 vs markup-line vs string is useful for validation.

I am developing a Rust-based OSCAL schema lib, so strong typing is pretty core. The standard JSON serialization crates (serde, serde_json) are very well suited to managing validation at the schema level. So I will just type the appropriate metaschema elements as MarkupLineDatatype and MarkupMultilineDatatype anyway. But it would be very cool if complete-schema.json helped me with this.

Incidentally, I wrote a simple code generator that generates Rust code from complete-schema.json. Very rudimentary. But it helps to show how I could take advantage of markup-multiline being included in the metaschema data types.
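The strong-typing idea in this comment can be sketched with the standard library alone. The following is only an illustration, not the code of the library mentioned here: the type names `MarkupLine`, `MarkupMultiline`, and `MarkupError` are hypothetical, and the validation rule (a markup-line value may not contain a line break) is an assumption chosen to show where a check would live. A serde-based library would presumably attach the same check to its deserialization step.

```rust
use std::fmt;

/// Hypothetical newtype for a metaschema `markup-line` value.
/// Wrapping the string lets the compiler distinguish it from a plain `String`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MarkupLine(String);

/// Hypothetical newtype for a metaschema `markup-multiline` value.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MarkupMultiline(String);

/// Error returned when a string does not satisfy the assumed rule.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum MarkupError {
    /// A `markup-line` value contained a carriage return or line feed.
    LineBreakInMarkupLine,
}

impl fmt::Display for MarkupError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MarkupError::LineBreakInMarkupLine => {
                write!(f, "markup-line value must not contain a line break")
            }
        }
    }
}

impl std::error::Error for MarkupError {}

impl MarkupLine {
    /// Assumed rule for this sketch: reject any value containing a line break.
    pub fn new(value: &str) -> Result<Self, MarkupError> {
        if value.contains('\n') || value.contains('\r') {
            Err(MarkupError::LineBreakInMarkupLine)
        } else {
            Ok(MarkupLine(value.to_owned()))
        }
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

impl MarkupMultiline {
    /// No lexical check is applied here; the wrapper only records the type.
    pub fn new(value: &str) -> Self {
        MarkupMultiline(value.to_owned())
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}
```

With these types, `MarkupLine::new("A *short* title")` succeeds, while a value containing `\n` is rejected; a field typed as `MarkupLine` can then no longer be confused with an arbitrary string at compile time.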

iMichaela commented 5 months ago

@swanky-oscal - Very exciting to learn of your effort. Please note that the JSON schema also falls short of documenting the current constraints. I hope you are also aware of @gborough's work: ROSCAL :) See https://github.com/usnistgov/OSCAL/discussions/1986

wendellpiez commented 5 months ago

@swanky-oscal this scrutiny is actually timely. Over in another repo we are looking at a different JSON Schema bug. If there is an obvious enhancement to the schema suggested by your observation, we could fold that in with the repair.

Until we look harder I guess we won't know, but suggestions are welcome (the more concrete the better).

Of course the fundamental problem is that the "Markdown datatypes" (markup-line and markup-multiline), or so they are in the JSON, are not exactly easy either to specify or to validate (even if only lexically). Where we are stuck, we essentially try to follow the 'do no harm' rule and pass the problem along. This doesn't mean we can't do better (either by validating the data, or simply by providing hooks), and community suggestions have already led to improvements in this schema. (The same goes for other artifacts, of course.)

If we can take this to https://github.com/usnistgov/metaschema-xslt/issues/105 (or to https://github.com/usnistgov/metaschema-xslt/pulls/108, addressing it) perhaps we could close it here?

wendellpiez commented 5 months ago

Having looked again I am not sure the root of the problem is simply that JSON Schema syntax doesn't give us enough 'play' to express what we want. We want to describe the field being defined, but it also may have a datatype (whether a markup type or other) that reduces to a string (if that's the most the JSON Schema can say) or is otherwise 'bobbled'.

One workaround would be to extend the use of description to include the designated type.

So

"title": {
  "title": "Part Title",
  "description": "An optional name given to the part, which may be used by a tool for display and navigation. [MarkupLineDatatype]",
  "type": "string"
},

Then at least the info would be there in an annotation where it could be found.
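If the datatype were recorded in the description this way, a consumer could recover it with a small string routine. The sketch below assumes the convention shown above (a single bracketed, alphanumeric type name at the very end of the description); the function name `designated_type` is hypothetical, and nothing in the published schema guarantees this layout.

```rust
/// Returns the bracketed type name at the end of a schema description, if any.
/// Assumes the hypothetical convention "... some text. [TypeName]".
pub fn designated_type(description: &str) -> Option<&str> {
    // Ignore trailing whitespace, then require a closing bracket at the end.
    let body = description.trim_end().strip_suffix(']')?;
    // Find the last opening bracket; everything after it is the candidate name.
    let open = body.rfind('[')?;
    let name = &body[open + 1..];
    // Accept only non-empty ASCII alphanumeric names, so ordinary bracketed
    // prose such as "[see note]" is not mistaken for a datatype.
    if !name.is_empty() && name.chars().all(|c| c.is_ascii_alphanumeric()) {
        Some(name)
    } else {
        None
    }
}
```

Applied to the description in the example above, this yields `Some("MarkupLineDatatype")`; a description without a trailing bracketed name yields `None`, so callers can fall back to treating the field as a plain string.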

I am looking at https://json-schema.org/draft-07/draft-handrews-json-schema-validation-01 for JSON Schema vocabulary.

wendellpiez commented 5 months ago

Actually, let me retract that ... there is indeed something more going on here, inasmuch as some data types are handled by the defined types (StringDataType) while others (notably MarkupLineDatatype) appear to be falling through a crack.

Thanks @swanky-oscal for picking this up. It can take some digging to see, and there is almost certainly an improvement to be made to the schema generation.

wendellpiez commented 3 months ago

A correction is online in a working branch and lightly tested, so this bug should go away.