usnistgov / OSCAL

Open Security Controls Assessment Language (OSCAL)
https://pages.nist.gov/OSCAL/

Automated OSCAL specification translations and mappings from XML schema to JSON schema #125

Closed anweiss closed 6 years ago

anweiss commented 6 years ago

User Story:

As an OSCAL specification maintainer, I want to have a development pipeline that allows for seamless modeling interoperability between the XML schema and JSON schema.

Goals:

Dependencies:

Requires a solidified set of XML schemas per Issue #124

Acceptance Criteria

Both the XML schemas and JSON schemas are in-sync and require a minimal number of manual modifications

Engineering Notes:

Needs:

  1. Generate documented XSD for all OSCAL models
  2. Generate JSON Schema for all OSCAL models
  3. Generate pages.nist.gov/OSCAL XML and JSON documentation describing the content models, data typing, cardinalities, and validation constraints

Commandments:

  1. Avoid hand creation and maintenance of schema models (i.e., XML, JSON) at ALL COSTS!
  2. Want to maintain all models from a single point of ground truth.
  3. Published documentation must be constantly in sync with published models

Requirements:

What we will do:

  1. (Andrew) Will provide an initial JSON Schema based on data provided by Wendell.
  2. (Wendell) Create a proof-of-concept metamodel that can generate a rudimentary XSD and JSON Schema.
  3. (Wendell) By the end of sprint 10, we will have:

akarmel commented 6 years ago

3/15/2018 - Sprint 9 Progress Notes

akarmel commented 6 years ago

3/22/2018 - Sprint 9 Progress Notes

akarmel commented 6 years ago

3/29/2018 - Sprint 9 Progress Notes

anweiss commented 6 years ago

btw, JSON schema draft-07 now includes support for validation against values containing a specific string encoding (e.g. "text/html") -> http://json-schema.org/latest/json-schema-validation.html#rfc.section.8. This could be helpful for prose or any other non-JSON data encoded as a JSON string.
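As a minimal sketch of what this annotation looks like (using the third-party Python `jsonschema` package; the field shape is illustrative, not the actual OSCAL model):

```python
from jsonschema import Draft7Validator

# Hypothetical draft-07 schema for a prose field carried as an HTML-encoded string.
prose_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "string",
    "contentMediaType": "text/html",
}

# contentMediaType is an annotation in draft-07: validators are not required
# to assert it, so any string still validates against this schema.
Draft7Validator.check_schema(prose_schema)  # raises SchemaError if malformed
validator = Draft7Validator(prose_schema)
print(validator.is_valid("<p>Some <em>prose</em></p>"))  # True
print(validator.is_valid(42))                            # False: not a string
```

An external tool could read the annotation and hand the string to an HTML validator, but the JSON Schema validator itself only checks that the value is a string.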

wendellpiez commented 6 years ago

That is definitely interesting. At the same time, "prose" in OSCAL is only HTML-like, it's not HTML. Most importantly it includes the OSCAL elements insert and withdrawn. That being said, an HTML validator might nonetheless return "okay" on OSCAL prose, since the HTML rule is to ignore unknown elements. Or there could be a setting to permit it to do so. (Trials are required!)

What would be super interesting would be if you could bind JSON strings for validation against XML document types (however defined and specified). Since the prose in OSCAL JSON amounts to islands of XML (at least until we define some other notation to cast it into).

If a JSON validation engine can use an HTML parser to validate strings-as-HTML, can it do the same with an XML parser/validator?

anweiss commented 6 years ago

The JSON schema by itself doesn't allow for direct validation against XML document types. The only ways to do this are either to identify the most closely-equivalent JSON schema types for the corresponding XML schema document types or to simply set every value as a "string" type and encode the corresponding XML as a string. While there certainly isn't anything preventing one from encoding XML as a JSON string, I don't really think this is done much, if at all, in practice, and it kind of defeats the purpose of JSON schema types.

Furthermore, the contentMediaType keyword defined in JSON schema draft-07 is merely an annotation that gives an external validator a heads up that a particular field contains content of a type defined in IETF RFC 2046.

wendellpiez commented 6 years ago

@anweiss yes, that contentMediaType is effectively what I was asking about. That would be the hook.

I assume we are still talking about validating strings that are embedded in JSON, not validating JSON syntax notation (or, when parsed, structures represented as JSON objects) against constraints declared over XML (which doesn't make sense to me). AFAIK, we are emitting such strings into the JSON with markup now (as noted, it is XML using HTML tag names and, mostly, semantics) not because that's such a good idea but because we have not decided to throw that information away (by stripping the markup or rendering it in some way -- we need to be able to get it back), and we don't have an obviously better way to represent it (because this is mixed content, there is no defined way to "shred" it).

Indeed burying markup into JSON would tend to defeat the purpose of JSON (and of XML as well) -- that the data be well-defined and addressable. (This is true irrespective of whether one validates the strings against the rules of HTML or any favorite flavor of XML.) Maybe we can improve on this strategy. It would be nice to know what control AC-1 would optimally look like in JSON with no embedded markup (or maybe AC-2, which has control enhancements), such that the JSON is functionally equivalent (semantically maps) to either its NVD XML or OSCAL representations.

anweiss commented 6 years ago

Correct, everything is being encoded as-is as a JSON string, at least until we figure out exactly what we want to retain.

IMO, optimal JSON-equivalent prose for a control is really just raw description text and sub-text (without any markup) that includes referenceable IDs that map to the ordered list description items from the original control. In addition, any prose that is defined in the declarations model and that has a meaningful action associated with it rather than markup (e.g. the "decision" class in <p class="decision">develops and documents an access control policy that addresses:</p>) could just be flattened into a separate JSON property (e.g. {"decision": "develops and documents an access control policy that addresses"})

wendellpiez commented 6 years ago

If that p[@class='decision'] were a prop not a p, that would help a lot - but prop elements can't have mixed content (in this case that would be not only inline formatting but possibly parameter insertions). This might work - I have not queried the doc to see whether any mixed content actually appears in there, in sp800-53 decision elements. If not, that could be a solution.

However, this question also raises the other big XML-JSON mapping question, namely ordering of contents within controls. While it is fair to say that properties and parts might be re-ordered without degrading their semantics (an "objectives" part followed by a "guidance" part will mean the same thing if the "guidance" comes first), this isn't the case with paragraphs appearing in running prose ...

akarmel commented 6 years ago

4/5/2018 - Sprint 9 Progress Notes

akarmel commented 6 years ago

4/12/2018 - Sprint 9 Acceptance

david-waltermire commented 6 years ago

@wendellpiez @anweiss It would be good for you two to come up with an abstraction approach that will allow for maintenance of the XML and JSON models using the same ground truth (goal 1). Can you update me towards the end of the week/early next on your thoughts?

anweiss commented 6 years ago

Some high-level notes:

hierarchical commonalities:

structured chunks of prose = ways to model loose stuff such that it "isn't so loose anymore"

Since the declarations model is the beginning of a sort of meta-schema, we'll use this as a starting point for identifying the commonalities and for flattening the JSON model.

Some "low-hanging fruit" for flattening JSON model and for moving up into their own property names:

On the flip side, flattening introduces a more dynamic model that could be a bit more difficult for end users or tools to parse

david-waltermire commented 6 years ago

"controls/subcontrols are essentially the same thing" Their content models are almost the same, but a subcontrol requires a parent control as the major difference. I would expect the JSON model to reflect this.

anweiss commented 6 years ago

Also as a reference, I've been using the Google JSON Style Guide in developing JSON-formatted OSCAL

anweiss commented 6 years ago

A more tangible thought:

  • Prop class attributes can be flattened into JSON property names and the prop element itself becomes the JSON property value. Singletons translate into single values and non-singletons become arrays. For example:

<prop class="name">AC-1</prop>
<prop class="priority">P1</prop>
<prop class="baseline-impact">LOW</prop>
<prop class="baseline-impact">MODERATE</prop>
<prop class="baseline-impact">HIGH</prop>

becomes ...

"props": [
  {
    "name": "AC-1",
    "priority": "P1",
    "baselineImpacts": [ "LOW", "MODERATE", "HIGH" ]
  }
]

This would make it easier for JSON artifact consumers to interpret the fields. It also allows the JSON schemas to be more dynamic and able to be validated against meta-schemas defined by "declarations" models.

The 1:1 mapping we have today is not nearly as concise and isn't as intuitive for consumers to parse:

"props": [
  {
    "class": "name",
    "value": "AC-1"
  },
  {
    "class": "priority",
    "value": "P1"
  },
  {
    "class": "baseline-impact",
    "value": "LOW"
  },
  {
    "class": "baseline-impact",
    "value": "MODERATE"
  },
  {
    "class": "baseline-impact",
    "value": "HIGH"
  }
]
wendellpiez commented 6 years ago

@anweiss, I'd like to think about going further and leaving out the wrapper for 'props':

"name": "AC-1",
"priority": "P1",
"baselineImpacts": [ "LOW", "MODERATE", "HIGH" ]

I believe this is possible iff the property names are known in advance, it is known which ones are singletons, and they have unique names that do not clash with 'parts', 'prose', or any other of our JSON-magic words.

I think something similar might be possible with parameters. (But I am writing about this now in another window).

What do you think? From a schema development POV, the question is, can we validate it? (I think we can.) One downside is that the JSON for SP800-53 will look much different from the JSON for ISO27002, unless they were mapped together up front.


trevor-vaughan commented 6 years ago

This is an interesting concept, but aren't you setting yourself up for a large amount of maintenance in reverse translation?

Each standard will need a set of translation functions and I don't feel like that is sustainable in the long term.

For revision 1, I would probably just do the simplest thing possible and see how users react. I, for one, would have no issue writing a select statement to pull what I needed out of the JSON even if it's a bit irritating. I'd probably just write a library to handle those bits and publish it at some point.

anweiss commented 6 years ago

@wendellpiez yea, we could certainly take it further and do the same with parameters. From a schema POV, the propertyNames validation keyword can be used for this and can cross-reference another schema as needed.

@trevor-vaughan good point indeed. This would potentially make roundtripping a bit trickier. And to your point re. "each standard will need a set of translation functions ...", this is kind of already the case today to a certain extent with the current "declarations" model
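To make the propertyNames idea above concrete, here is a minimal sketch (using the third-party Python `jsonschema` package; the allowed names below are illustrative, not the actual OSCAL vocabulary):

```python
from jsonschema import Draft7Validator

# Sketch: constrain flattened property names to a known vocabulary.
# The enum below is illustrative; a real schema could reference names
# drawn from a declarations-derived schema instead.
flattened_schema = {
    "type": "object",
    "propertyNames": {"enum": ["name", "priority", "baselineImpacts"]},
}

v = Draft7Validator(flattened_schema)
print(v.is_valid({"name": "AC-1", "priority": "P1"}))  # True
print(v.is_valid({"unknownProp": "x"}))                # False: name not allowed
```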

wendellpiez commented 6 years ago

I agree, if we can't effectively automate the translation back again (perhaps by means of an OSCAL-declarations-based mechanism, as @anweiss and I have discussed), this is not worth doing.

It's certainly valuable to know that a developer doesn't find that particular improvement to be compelling! :-)


trevor-vaughan commented 6 years ago

@wendellpiez For me it's a "nice to have" syntactic sugar.

@anweiss Do you have a link to the declarations model docs? (Sorry, I'm still having trouble figuring out what is what)

anweiss commented 6 years ago

@trevor-vaughan no worries! the declarations model docs have yet to be developed ... for now, you can take a look at the current 800-53 OSCAL declarations at examples/SP800-53/SP800-53-oscal-declarations.xml ... the declarations XML schema is at schema/xml/XSD/oscal-declarations.xsd

@wendellpiez @kscarf1 we should make an issue to scaffold the declarations docs out

david-waltermire commented 6 years ago

@trevor-vaughan Can you elaborate on your concerns around reverse translation? Are you concerned about round-trips such as XML -> JSON -> XML? Or something else?

We are working to normalize the format for how all catalog standards are represented. We have a declarations model, which provides for some additional standard-specific constraints (e.g., customization of cardinality, required data elements/properties, etc.) which are looser in the base format.

A key constraint on this effort must be that the XML and JSON information must be round-trip capable without information loss. That being said, it may be possible for some use cases to perform data reduction where information loss is acceptable, but the base formats must support lossless representation for use cases where that will not work.

@anweiss I don't mind making properties unique in both the XML and JSON if they can be multi-valued. I don't like doing things differently in XML vs JSON, as this will affect round-trip capabilities. If we need multiple occurrences of a property that are single-valued, this won't work. We need to better understand the usage in current SP800-53, ISO 27001, and COBIT 5 catalogs before we make this change.

I am also concerned with making parameters single (scalar) valued. I think we need objects to allow for additional property expansion as we add things like data types, valid values, range restrictions, etc.

david-waltermire commented 6 years ago

@wendellpiez @kscarf1 We have an old issue #46 as a proto-user story for documenting declarations. Could one of you flesh this out using the user story template?

trevor-vaughan commented 6 years ago

@david-waltermire-nist Fundamentally, 90% of the people that I've spoken with are super excited about the JSON format and hate the idea of XML at all. I personally like XML for documentation and mappings but agree that it's a bear and a half to figure out and work with.

So, effectively, users want to write their site local material as JSON and not worry about the 'official' XML. But they do want to make sure that whatever they write is going to be "correct" if given to the security powers that be and whatever random tools they may be using that may only support the XML format.

If I were writing a tool, I would probably base it on the XML for the "correctness", signing, validation, etc... I feel like the JSON is a nice user interface to the system but masks the complexity of the underlying system down to something that regular users can understand and deal with.

david-waltermire commented 6 years ago

@trevor-vaughan This makes sense. This is why round-trip capability is so important. I am thinking about an OSCAL catalog/profile that might be written in JSON, exchanged in XML, and then customized in JSON. This means conversion from JSON to XML to JSON. This also means that when writing in JSON, the format needs to be expressive and support the range of OSCAL features that editors will need.

That being said, I think it will be a bad state if users directly have to write XML or JSON. Most of the folks that will need to write this content will not understand either. I imagine tools with more user-friendly UIs would support the editing, with the end-result being produced in JSON or XML.

trevor-vaughan commented 6 years ago

@david-waltermire-nist Sorry, I find this horribly amusing:

That being said, I think it will be a bad state if users directly have to write XML or JSON. Most of the folks that will need to write this content will not understand either. I imagine tools with more user-friendly UIs would support the editing, with the end-result being produced in JSON or XML.

Can I interest you in some SCAP?

david-waltermire commented 6 years ago

@trevor-vaughan Indeed. This is a big lesson I learned from the SCAP project. SCAP started with the assumption that we need to support users editing the XML directly for now, and that tools will eventually emerge. They never really did to a large degree, which I would attribute to the complexity of the formats making it difficult for tools to be created. I'd hate to repeat that problem.

To not repeat this with OSCAL we need to develop a format that is designed with tools in mind to support the authoring. This means we need to keep it simple as much as possible and we cannot assume that tools will eventually emerge. IMHO, the tooling must be worked on in parallel to the development of the formats.

trevor-vaughan commented 6 years ago

@david-waltermire-nist That's encouraging. I feel like the OSCAL project will need to publish a FOSS rudimentary editor if you really want to get decent adoption.

Make it in something that complies with the all the NIST/STIG regs and it could be the shining example of what to do that I would love to be able to reference. It would be a great public dogfooding project for the entire stack.

anweiss commented 6 years ago

@trevor-vaughan to David's point on tools, this is what is being worked on in parallel -> https://github.com/opencontrol/oscalkit ... it's currently being used to do the conversions and generate the JSON samples

trevor-vaughan commented 6 years ago

@anweiss Ah, most excellent! I was hoping that OpenControl would migrate to this. Unfortunately, for most users, OpenControl was still too much.

They need some sort of web-based editor with easy buttons and a visual model of the abstract tree. And, of course, if this were created from the ground up following the entire 800-53 stack, it would be the definitive example for reference and complex enough to work out kinks in the specification.

anweiss commented 6 years ago

https://github.com/usnistgov/OSCAL/issues/102 might be of help here

wendellpiez commented 6 years ago

Sprint 10 Progress

Goal: no 1 / All

Progress: Brief conversations between myself and @anweiss; also I have sent a slide deck / proposal to @david-waltermire-nist for review and consideration

Preliminary discussions suggest we approach developing the common schema model (constraints set) as an abstraction of requirements discovered in refining the two-way XML <-> JSON mapping and conversion. By developing a means to validate JSON against such a constraints set, we also isolate requirements for analogous validations on the XML side. We start with the premise that XPath-based (query-based) validation (such as Schematron) will be sufficient: this enables us to set aside the XSD for now, treating it as "process scaffolding".

If, while doing this, we capture descriptions of the constraints, we can proceed to extend these to provide capable inputs (formalisms) for code generation in support of validation logic for either/both the XML and JSON forms.

% Complete: 2%

Open Issues: Further discussions, specification, validation of the approach? from @david-waltermire-nist and stakeholders with relevant experience

wendellpiez commented 6 years ago

5/2/2018 Sprint 10 Progress

Goal: no 1 / All

Progress: we now have (handmade) mockups of a metaschema with matching XSD schema, along with a sketch of a test instance (a single control) valid to the XSD (as also to the official catalog.xsd). In principle, the metaschema enforces constraints over the schema modeling that reflect specifically the data modeling and mapping requirements we see casting from XML to JSON and back again. The two have been designed together on the assumption that the XSD will be produced from the metaschema with no other inputs in principle, just implementation logic.

https://github.com/wendellpiez/OSCAL/tree/feature-metaschema/schema/metaschema

% Complete - validation of the approach: 50% (no hitches so far) - implementation: 4%

Next steps for me:

Open Issues: All depends on whether our pipeline can produce a JSON schema we can consume and apply to our data. This implies running JSON schema validation over test instances and assessing the results. Suggest early coordination between @anweiss and myself to ensure this happens. Are there command-line tools I can try? Web-based services? How is conformance established when it comes to these tools? Which are we going with? Several?

anweiss commented 6 years ago

Adding to @wendellpiez's progress updates:

I'm working in parallel with Wendell on the JSON and JSON schema side in my feature branch off of his fork -> https://github.com/anweiss/OSCAL/tree/feature-metaschema-json/schema/metaschema. Thus far, I've produced the following artifacts to coincide with Wendell's modeling exercise:

While the more structured and consolidated JSON variant does allow for better readability, parsability, and reduced file size, it does require a more complex JSON schema ... in fact, it requires two schemas: one for the core model, and another that is dynamically generated based on the original JSON document. This is mainly due to the lack of the proposed $data keyword, which could address this complexity but which won't be available until JSON Schema Draft 09 this fall.

I'm currently validating my JSON artifacts against their corresponding JSON schemas using the ajv CLI tool, which is by far the most feature-complete. I'm also validating the JSON schemas themselves against the official Draft 07 core/Validation meta-schema using the same ajv CLI tool.

Once Wendell produces XSpec scenarios to validate the transformation rules, I'll duplicate those tests using an equivalent JavaScript BDD framework and apply it to the transformed JSON. This will allow us to validate that our core meta-model is indeed being applied across both formats.

In this exercise, I've also come up with a couple of options to support roundtripping. One is a misc property, essentially an object that holds original XML elements that aren't applicable to JSON consumers and/or not well-represented by JSON. The other option is an intermediary mapping table, an object that maps embedded prose, property order, etc. to an equivalent JSON property/object that is better represented. Essentially, this mapping table would be embedded into the OSCAL meta-schema.

anweiss commented 6 years ago

Per today's discussion ... to avoid schema and meta-schema complexity and allow for parsability of prose in JSON, we decided on using a subset of liquid-like templating syntax to represent inline elements (e.g. parameter insertions, etc) ... the exact syntax needs further discussion, but something like:

<p class="description">Develops, documents, and disseminates to <insert param-id="ac-1_a"/>:</p>

becomes:

<p class="description">Develops, documents, and disseminates to {{ param.ac-1_a }}:</p>
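As a rough illustration of the conversion step (regex-based for brevity; a real converter would use an XML parser, and the function name here is hypothetical):

```python
import re

def inserts_to_placeholders(text: str) -> str:
    """Rewrite OSCAL <insert param-id="..."/> elements into a
    liquid-like placeholder syntax. Illustration only."""
    return re.sub(r'<insert\s+param-id="([^"]+)"\s*/>', r'{{ param.\1 }}', text)

prose = ('<p class="description">Develops, documents, and disseminates to '
         '<insert param-id="ac-1_a"/>:</p>')
print(inserts_to_placeholders(prose))
# <p class="description">Develops, documents, and disseminates to {{ param.ac-1_a }}:</p>
```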
wendellpiez commented 6 years ago

Related to this, and pursuant to our discussions Tuesday. One idea, with the goal of simplifying and flattening prose, would be to prohibit class and id attributes in HTML (thereby not warranting it to be addressable and making it fully portable to/from markdown). However, this complicates things for examples such as the "description" element above. As a p, this will "reduce" into prose, and the description line itself will not be addressable (for patching or any other reason).

The current metaschema suggests an easy solution: we introduce an element (perhaps para or line) to work as properties do (including permitting id), except they would also support inline markup including insertions. This would permit us to relegate the p element to "true prose". The new para would be groupable and JSONable using the same pattern as other elements directly inside controls, subcontrols or parts (excepting prose).

I have a short list of other suggestions for adjustments to the catalog model (in view of metaschema and @david-waltermire-nist proposals 5/8/2018). Do these require a user story to track?

wendellpiez commented 6 years ago

5/9/2018 Sprint 10 Progress

GOAL: 1/all

PROGRESS TO DATE: A baby Metaschema is producing an XSD via an XSLT pipeline. The XSD successfully constrains an OSCAL testing mini-model and serves as PoC for the overall design.

PERCENTAGE COMPLETE: 30% (phase one: mini-model)

OUTSTANDING ISSUES:

The next effort will deliver analogous functionality on the JSON side, producing a JSON schema that serves to validate JSON data.

As we do this, it might be good to capture notes on the differences between the two validation models, since either/both may be supplemented by other logic (such as Schematron on the XML side) in a full design.

On the plus side, it appears the draft metaschema format is simple enough that scaling it to "full complexity" (not just a mini OSCAL), will be fairly straightforward. So fleshing out the metaschema to support declarations and profiles as well as catalogs is on the task list.

david-waltermire commented 6 years ago

Why not support addressing the p relative to its parent context, which will have an ID? Something like "the second p of the part with the ID X".

wendellpiez commented 6 years ago

Would require extending the patching model so that positional addressing would be supported.

Not just <remove id-ref="control.desc.para.2"/> but <remove path="p[2]"/> supporting some syntax ... same for <add> ...

Ordinarily I would consider it to be highly strange to have two "paragraphs" except (IMO) neither para nor p, as tag names, have anything to do with "paragraphs" any more (long separate conversation). I am also open to calling it something else. Point is, it's not a title or a link, gets its own attributes (no linking attributes), but otherwise it is like them (lives in a control or control structure; is not "prose" and can't do lists; otherwise supports inline markup including, importantly, insertions).

anweiss commented 6 years ago

5/10/2018 - Sprint 10 Progress Report

Goal: Update JSON control sample and JSON schema based on Friday's discussion

Progress: Will publish by EOD tomorrow so @wendellpiez can proceed with modeling activities

Open Issues: None

anweiss commented 6 years ago

@wendellpiez closing the loop on our validation of props, params, and parts in our structured JSON approach ... there is technically nothing in the JSON Data Interchange Format (ECMA-404) that discourages duplicate key names ... however, JSON IETF RFC 7159 states that key names "should be unique", as in "recommended". 7159 also goes into detail as far as why the use of unique key names is recommended. Nice StackOverflow post here too.

So technically, duplicate key names are still considered to be valid JSON objects. And since JSON schema works with valid JSON objects, there's no way to enforce unique object keys. However, most JSON processors out there will prohibit one from having duplicated keys in a JSON object. And per the JSON schema core devs in the community discussion:

... In part to sidestep this sort of ambiguity, JSON Schema is defined on a data model based on JSON, not on the JSON text itself. The data model is just based on the most common parser behavior, and the most common behavior is to parse JSON objects into data structures with unique keys. So by the time JSON Schema sees the data, the parser has already decided whether to raise an error, or just silently drop one or the other property definition.

As such, it seems we may need to assume that OSCAL-formatted JSON data has also been processed in this fashion by end users.
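The "silently drop one or the other property definition" behavior is easy to demonstrate; for example, Python's stdlib json parser keeps the last occurrence of a duplicated key:

```python
import json

# Duplicate keys are syntactically valid JSON text (ECMA-404), but most
# parsers collapse them. Python's json module silently keeps the last value.
doc = '{"class": "baseline-impact", "class": "priority"}'
parsed = json.loads(doc)
print(parsed)  # {'class': 'priority'} -- the first value is dropped
```

So by the time JSON Schema validation runs over such data, the duplicate is already gone, exactly as the quoted discussion describes.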

wendellpiez commented 6 years ago

@anweiss fantastic background, thanks, this is really helpful.

I agree for all kinds of reasons we should keep those keys distinct, at least within their containment (parent) scope if not always in document scope (as with parameters). Even if the validators do not enforce this constraint, it is likely as you say to cause problems elsewhere.

We can see to it that any schema produced on either side enables this constraint (implicitly in how it is produced) and manages cardinality appropriately (on the JSON side, through judicious use of arrays amidst the maps) such that reusing a key is never "necessary" for fidelity with the XML. So even if it isn't or can't always be checked by JSON schema, the rule will always be followed in JSON produced from valid OSCAL XML. Indeed, since the XPath-object analog to JSON maps that I am using is an XPath 3.1 map object ... my tooling gives me no choice (since in this model, keys must be distinct). A good thing! While at the same time, it means that JSON produced by other means, not-yet-known-to-be-valid, could break this rule undetected by JSON schema. Whether that poses a problem, I don't know...

So wouldn't repeating a key break a dumb "eval()" based implementation?

wendellpiez commented 6 years ago

@anweiss as long as we're on the topic, can JSON schema enforce a constraint such that "map M must have property A or property B, but not both"? Or a choice of one from several?

anweiss commented 6 years ago

@wendellpiez I would say in most cases that yea, key repetition will break an implementation ...

To answer your second question, yes, you can enforce such a constraint using the oneOf keyword.
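A minimal sketch of the oneOf pattern (using the third-party Python `jsonschema` package; the property names "a" and "b" are illustrative):

```python
from jsonschema import Draft7Validator

# "Must have property a or property b, but not both": the instance has to
# satisfy exactly one branch of oneOf.
either_schema = {
    "type": "object",
    "oneOf": [{"required": ["a"]}, {"required": ["b"]}],
}

v = Draft7Validator(either_schema)
print(v.is_valid({"a": 1}))          # True: matches exactly one branch
print(v.is_valid({"a": 1, "b": 2}))  # False: matches both branches
print(v.is_valid({}))                # False: matches neither branch
```

A choice of one from several works the same way: list one branch per alternative under oneOf.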

wendellpiez commented 6 years ago

5/17/2018 Sprint 10 Progress

GOAL: 1/all

PROGRESS TO DATE: An XSD pathway is now working in sketched form, so I am looking at the JSON side. The metaschema design is changing (and simplifying) in response to requirements. @david-waltermire-nist is advising re: features plus also timeliness of integration of schema documentation.

PERCENTAGE COMPLETE: 40% (phase one: mini-model)

OUTSTANDING ISSUES: Tools configuration is proving to be an impediment if not always a blocker; due to ANW absence into next week (to advise and help with this issue) we may have to be tactical regarding which metaschema features are priorities (XML support, JSON support, docs support).

david-waltermire commented 6 years ago

I created some basic HTML and Markdown mapping documentation. This captures some of the mapping approach we have been discussing.

wendellpiez commented 6 years ago

5/24/2018 - Sprint 10 Progress Report

Goals w/ Progress to Date:

Find WiP here: https://github.com/wendellpiez/OSCAL/tree/feature-metaschema/schema/metaschema

Percentage Complete: For a "mini" PoC version, 70%. For a fully functional version including documentation support (not a Goal of this Issue): 40%.

The pipeline is now producing useful and correct XSD from a Metaschema format, "correct" in the sense that it describes OSCAL (same tag set, same rules as our present schema). In addition, the Metaschema embeds documentation (of the same tag set) for plugging into our documentation framework (as well as the XSDs). As a special feature: the Metaschema can validate (its own) examples of OSCAL XML against the schema OSCAL XSD that it has defined.

Next comes the production of JSON Schema (that meets the requirements, namely expressing the same or an analogous constraint set over OSCAL JSON that the XSD imposes over OSCAL XML). This is provisionally mapped and about 50% built out (more progress tomorrow 5/24?), but not yet shaken out or tested.

Finally, we have to come back to the XML<->JSON conversion utilities and align them with the metaschema. It is likely that conversions for OSCAL will want to be able to read or otherwise capture info from the metaschema, at runtime or in a "compile" phase, in order to distinguish and complete their mappings. (Such a generalization of our conversion tool may require a User Story if it is not considered covered by this one.)

There are now unit tests for the XSD production and unit testing is set up for the JSON Schema production.

Open Issues: @anweiss and @david-waltermire-nist have both provided help with tooling. I suggest we continue our efforts to stand up a portable tool chain (providing validation services among others) as this has many side benefits.

Another open issue: what to name the tool. I have called the data (document) format a "metaschema" but that (while not inaccurate as a descriptor) is not fixed. That is also a term used to identify other technologies not this one.

anweiss commented 6 years ago

June 21 Status: The metaschema and its associated tooling are now being used to generate JSON and JSON schema based on the existing "1:1" mapping model, which is not currently the best-looking JSON/JSON schema. Work to refine the metaschema so that more structured and condensed JSON/JSON schema is produced will be completed in the next Sprint.

david-waltermire commented 6 years ago

This was addressed by PR #201. Follow on work will be addressed by #186, #202, and #224.