usnistgov / OSCAL

Open Security Controls Assessment Language (OSCAL)
https://pages.nist.gov/OSCAL/

Better defined linking semantics regarding resources #567

Closed wendellpiez closed 1 year ago

wendellpiez commented 4 years ago

User Story:

So far in the OSCAL specifications we have relied on web standards for defining links (URIs and URI fragments). As long as a link is simply a pointer to a location in the file or to an external resource, this has the advantage of familiarity, and it works okay even if not fully specified with respect to all conceivable targets. (How does a URI link to the middle of a Word document?)

However, we have also stipulated <resource> elements in metadata that can serve as proxies for resources that are better controlled than arbitrary documents on the web. Specifically, they enable either authentication of a target referenced out of line (via a hash), or inline encoding of a target as base64 (that is, a binary as an inline attachment).

The way linking semantics should work with respect to resources, especially when link targets are internal to those resources, needs to be specified.

Goals:

To address this problem:

  1. Acquire or make illustrative samples or mockups
  2. Test and iterate a design that supports designation of linking into contents of out-of-line resources
  3. Document this usage

One approach could work something like this. It works by extending the semantics of href when it targets a <resource> element specifically, such that a fragment identifier can be provided as a runtime query value:

<oscal>
  <link rel="..." href="https://foo.org/bar#fragment"/><!-- direct, external reference -->
  <link rel="..." href="#internal-fragment1"/><!-- internal, indirect reference -->
  <link rel="..." href="#internal-fragment2"/><!-- internal, attachment reference, but no way to identify a resolved target fragment -->
  <link rel="..." href="#internal-fragment2?fragment=target"/><!-- internal, attachment reference, pointing to "#target" within the attached resource -->
  <link rel="..." href="#internal-fragment1b?fragment=target"/><!-- internal, indirect reference, pointing to "#target" within the indirect resource. Resolves to: https://foo.org/bar#target -->
...
  <back-matter>
    <resource id="internal-fragment1">
      <rlink href="https://foo.org/bar#fragment"/><!-- direct, external reference -->
    </resource>
    <resource id="internal-fragment1b">
      <rlink href="https://foo.org/bar#"/><!-- direct, external reference -->
    </resource>
    <resource id="internal-fragment2">
      <base64>...</base64>
    </resource>
  </back-matter>
</oscal>
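A minimal Python sketch of how a consumer might resolve the hrefs in the mockup above. This is only an illustration of the proposed `#id?fragment=target` convention, not an OSCAL-defined algorithm: the `Resolved` tuple, the `resolve_link` function, and the trimmed `SAMPLE` document are hypothetical names introduced here.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional
from urllib.parse import parse_qs, urldefrag

# Trimmed copy of the mockup, closed so that it parses as XML.
SAMPLE = """<oscal>
  <link rel="..." href="https://foo.org/bar#fragment"/>
  <link rel="..." href="#internal-fragment1"/>
  <link rel="..." href="#internal-fragment2?fragment=target"/>
  <link rel="..." href="#internal-fragment1b?fragment=target"/>
  <back-matter>
    <resource id="internal-fragment1"><rlink href="https://foo.org/bar#fragment"/></resource>
    <resource id="internal-fragment1b"><rlink href="https://foo.org/bar#"/></resource>
    <resource id="internal-fragment2"><base64>...</base64></resource>
  </back-matter>
</oscal>"""


class Resolved(NamedTuple):
    kind: str                 # "external", "indirect", or "attachment"
    location: str             # URL, or "#id" of the attached resource
    fragment: Optional[str]   # target inside the resolved resource, if any


def resolve_link(href: str, root: ET.Element) -> Resolved:
    """Resolve an href under the proposed '#id?fragment=target' convention."""
    if not href.startswith("#"):
        # Direct external reference: split off any ordinary URI fragment.
        url, frag = urldefrag(href)
        return Resolved("external", url, frag or None)

    # Split by hand: a generic URI parser treats everything after '#'
    # (including '?fragment=...') as one opaque fragment.
    ident, _, query = href[1:].partition("?")
    override = parse_qs(query).get("fragment", [None])[0]

    resource = next((r for r in root.iter("resource") if r.get("id") == ident), None)
    if resource is None:
        raise KeyError(f"no <resource> with id {ident!r}")

    rlink = resource.find("rlink")
    if rlink is not None:
        # Indirect reference: the query value replaces the rlink's own fragment.
        url, frag = urldefrag(rlink.get("href"))
        return Resolved("indirect", url, override or frag or None)

    # Assumption: a resource without an rlink carries an inline base64 attachment.
    return Resolved("attachment", "#" + ident, override)


root = ET.fromstring(SAMPLE)
for link in root.findall("link"):
    print(link.get("href"), "->", resolve_link(link.get("href"), root))
```

The manual split on `?` is the main design point: under generic URI syntax the query precedes the fragment, so `#internal-fragment2?fragment=target` parses as a single fragment, and the proposed semantics would need to be implemented by OSCAL-aware code rather than by a stock URI library.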

Dependencies:

Determine where this documentation should live: in the schema docs and/or other guidance?

Acceptance Criteria

wendellpiez commented 4 years ago

Possibly related: whether the expectation of profile resolution in linking should be made explicit (special rlink/@rel?), or does simply pointing to a profile suffice?

smichelotti commented 4 years ago

We'd like to better understand how linking semantics are going to work. If we look at line 39 of FedRAMP_HIGH-baseline_profile.json, we see this JSON doc is linking to an XML doc representing the NIST catalog stored on GitHub. Similarly, line 1309 shows the JSON doc linking to an XML doc for the FedRAMP catalog (a JSON doc linking to XML docs is odd, but I assume it is an oversight).

Additionally, line 33 of NIST_SP-800-53_rev4_HIGH-baseline_profile.json links to the inline "href": "#catalog" which resolves to line 2621 in the "back-matter" section at the end of the doc.

This makes it difficult for consuming code to "follow the links": many IF statements are required to check whether an href is a local #identifier or an absolute URI. And if it is an absolute URI, it's not really useful, because it links to an XML doc in GitHub that we're not using.
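To make the branching concrete, here is a rough Python sketch of the check a consumer currently has to perform on each import href. The function name `classify_href` and the example URL are hypothetical; only the `#catalog` value comes from the profile discussed above.

```python
from urllib.parse import urlsplit


def classify_href(href: str) -> str:
    """Classify an href as 'absolute', 'local-fragment', or 'relative'."""
    parts = urlsplit(href)
    if parts.scheme:
        # e.g. an https:// link to a catalog hosted elsewhere
        return "absolute"
    if parts.fragment and not (parts.netloc or parts.path or parts.query):
        # e.g. "#catalog", which points at a back-matter resource in the same doc
        return "local-fragment"
    # anything else must be resolved against the document's own location
    return "relative"


for href in ("#catalog", "https://example.com/catalog.xml", "catalogs/catalog.json"):
    print(href, "->", classify_href(href))
```

Each of the three outcomes needs its own follow-up step (look up a resource, fetch a URL, or resolve against a base), which is where the extra conditionals come from.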

Our preference would be for linking semantics to look something like this, where the IDs are the canonical identifiers of the NIST and FedRAMP catalogs respectively. And if not canonical, then at least a URI that could have been established with <link rel="self" href="..." /> in the originating catalog:

{
  "profile": {
    "imports": [
      {
        "href": "/catalogs/uuid-47fdefdb-dc1a-4040-9f27-b517a16b06d2", 
        "include": {
          "id-selectors": []
        }
      },
      {
        "href": "/catalogs/uuid-ed364452-47f8-4e70-b3a4-ef54de5f46e2",
        "include": {
          "id-selectors": []
        }
      }
    ]
  }
}

or perhaps this:

{
  "profile": {
    "imports": [
      {
        "href": "/catalogs/NIST_SP-800-53_rev4_catalog.json",
        "include": {
          "id-selectors": []
        }
      },
      {
        "href": "/catalogs/FedRAMP_catalog.json",
        "include": {
          "id-selectors": []
        }
      }
    ]
  }
}
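As a sketch of how a consumer could follow hrefs of this shape, the Python below resolves each import against the location the profile was loaded from. The base URL `https://example.org/profiles/high-baseline.json` and the helper name `import_urls` are assumptions made for illustration; nothing here is defined by OSCAL.

```python
import json
from urllib.parse import urljoin

PROFILE = json.loads("""
{
  "profile": {
    "imports": [
      {"href": "/catalogs/NIST_SP-800-53_rev4_catalog.json", "include": {"id-selectors": []}},
      {"href": "/catalogs/FedRAMP_catalog.json", "include": {"id-selectors": []}}
    ]
  }
}
""")


def import_urls(profile_doc: dict, base_url: str) -> list:
    """Resolve every import href against the URL the profile came from."""
    return [urljoin(base_url, imp["href"]) for imp in profile_doc["profile"]["imports"]]


# Hypothetical location of the profile document.
for url in import_urls(PROFILE, "https://example.org/profiles/high-baseline.json"):
    print(url)
```

With a single href form, the consumer needs one resolution rule instead of a branch per link style.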

Any concrete examples that could be provided would be much appreciated.

brian-ruf commented 4 years ago

@smichelotti, on the topic of the FedRAMP profile in JSON pointing to an XML catalog, you've uncovered a bit of a blind spot (at least for me) in our conversion process.

The FedRAMP baselines are managed in XML and converted to JSON. We need to add a step to that process that updates any import links so they point to the JSON version of the upstream catalog rather than the XML version.
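One possible shape for that extra step, sketched in Python. It assumes, and this is only an assumption, that the JSON catalog is published next to the XML one under the same base name, so that swapping the file extension is enough. The function name `retarget_imports` and the sample hrefs are hypothetical.

```python
import copy


def retarget_imports(profile_doc: dict, old_ext: str = ".xml", new_ext: str = ".json") -> dict:
    """Return a copy of a JSON profile whose import hrefs point at JSON catalogs.

    Assumes the JSON and XML catalogs share a base name and location.
    """
    out = copy.deepcopy(profile_doc)
    for imp in out["profile"]["imports"]:
        # Keep any URI fragment intact while swapping the extension.
        path, sep, frag = imp["href"].partition("#")
        if path.endswith(old_ext):
            imp["href"] = path[: -len(old_ext)] + new_ext + sep + frag
    return out


converted = retarget_imports({
    "profile": {"imports": [
        {"href": "https://example.com/catalogs/catalog.xml"},  # hypothetical URL
        {"href": "#catalog"},  # local reference, left untouched
    ]}
})
print([imp["href"] for imp in converted["profile"]["imports"]])
```

Local references such as `#catalog` pass through unchanged here; the rlink inside the referenced back-matter resource would need the same treatment in a fuller version.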

You are also bumping into a challenge I cite periodically, which is that OSCAL's approach assumes the use of 4GL capabilities (like XPath and XSLT). With those tools, resolving URI fragments that point to resources is apparently fairly easy, but trying to manage those links programmatically is very challenging, especially with all the possible variations for achieving the same result.

This is why the _Guide to OSCAL-based FedRAMP SSPs_ describes FedRAMP's preferred use of OSCAL: to limit the variation that tool developers have to handle.

I'll defer to others on the team regarding your feedback and recommendations.

wendellpiez commented 4 years ago

Indeed. Concurring with @brianrufgsa, there is definitely some work to be done to nail this down, document and demonstrate it. Updating links is very much part of the problem.

wendellpiez commented 4 years ago

Update March 12

We need to start by assessing the current state of documentation and what needs to be done to improve it. This will include both Metaschema documentation and examples.

brian-ruf commented 4 years ago

Per conversation with @david-waltermire-nist and @wendellpiez, we need to be clear about intentions when multiple rlink entries are present in a resource, such as to specify both an XML and JSON version of a file.
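One way the intention could be made explicit is for each rlink to declare its format and for consumers to pick by preference. The Python sketch below assumes a JSON resource with an `rlinks` array whose entries carry `href` and `media-type`; the media type strings, file names, and the `pick_rlink` helper are illustrative assumptions, not a settled OSCAL rule.

```python
from typing import Optional


def pick_rlink(resource: dict,
               preferred: tuple = ("application/json", "application/xml")) -> Optional[str]:
    """Return the href of the first rlink matching the preferred media types.

    Falls back to the first rlink when no media type matches, and to None
    when the resource has no rlinks at all.
    """
    rlinks = resource.get("rlinks", [])
    for media_type in preferred:
        for rlink in rlinks:
            if rlink.get("media-type") == media_type:
                return rlink["href"]
    return rlinks[0]["href"] if rlinks else None


resource = {
    "rlinks": [
        {"href": "catalog.xml", "media-type": "application/xml"},    # hypothetical
        {"href": "catalog.json", "media-type": "application/json"},  # hypothetical
    ]
}
print(pick_rlink(resource))
```

A JSON-based tool would call this with its own preference order, so the XML and JSON renditions can sit in one resource without ambiguity about which one to follow.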

wendellpiez commented 4 years ago

@david-waltermire-nist and @brianrufgsa I suggest we sketch a resource or two (mockup or actual example), on this Issue or another for the purpose. It could show how to reference a two- or three-way combination of XML, JSON and YAML, plus a functional distinction, e.g. for profiles, whether resolved-and-serialized (cached) or unresolved.

Not only are there docs and metaschema docs (#657) to consider; we also have the data.

david-waltermire commented 1 year ago

Closing this in favor of issue #756 which is addressing the same need. Solutions proposed here will be considered in that issue.