Open iMichaela opened 7 months ago
The current proposed mapping model incorporates several new features relative to the previous draft (which IBM currently employs via the corresponding open source compliance-trestle branch).
ref: https://pages.nist.gov/OSCAL-Reference/models/prototype-mapping-model/mapping/json-outline/
Provenance is a new required sub-structure that "Describes requirements, incompatibilities and gaps that are identified between a target and source in a mapping item." The knee-jerk reaction was to ask for this to be not required, since it was not present in the previous draft. However, upon further consideration this is a good idea. There should be some information about the mapping collection beyond the boilerplate Metadata.
- Method: adding human+automation as a 3rd allowed value seems appropriate, and might be the most common mapping methodology employed?
- Matching: The link to set theory provides explanation of the relationship mapping rationale choices. When using AI for mappings will one be able to determine what rationale the AI employed?
- Confidence-score: as expressed in this issue, this seems best represented as a value plus a description of what that value represents.
Some further questions:
Another comment around provenance - I would like to propose that the `links` and `props` elements be added to the `provenance` assembly. A possible use case for this would be to provide additional explanation about elements such as `confidence score` and how they are calculated or determined.
I would also +1 question one on @degenaro's list about the use of `matching-rationale` in `provenance` and `maps`.
@degenaro - Thank you for your comments. They are very useful.
You are asking:
1. Matching: The link to set theory provides explanation of the relationship mapping rationale choices. When using AI for mappings will one be able to determine what rationale the AI employed?
Great question for the AI-model developers/engineers. I would like to think the matching models provide consistency by operating on one type of matching, per the training of the model. If that is not possible, an AI-driven approach is even worse than a human-based one, since a human can identify what thinking process was applied, despite the biases that might exist. We should call on all OSCAL implementors using AI methods in their tools or processes today, and find out.
Confidence-score: as expressed in this issue, this seems best represented as a value plus a description of what that value represents. We agreed that having a numeric value and a descriptor is better than a string, and I totally agree. You also asked, reasonably so, for a more prescriptive way of "computing" the numeric value for consistency and usability in automation. Here are some thoughts, and I value everyone's feedback:
- `matching-score` - a numeric value complementing the `qualifier`
- `confidence-score` (under the proposed numeric value), the standard deviation of the `mappings`
- for reciprocity or automation, one might choose to discard (not use) the `map`s that have a `matching-score` outside the (mean - standard_deviation, mean + standard_deviation) interval, depending on the use case.
The other questions you asked:
Can provenance matching rationale be syntactic yet mapping.maps rationale be something different? Seems confusing.
- The initial thought of the team was "yes", aiming to allow for flexibility. BUT IF `mappings/map/matching-rationale` allows for a local overwrite of the `provenance/matching-rationale`, I am thinking today, after you challenged the model, that from the usability perspective of the mapped information when used to automate reciprocity determination, allowing the global `provenance/matching-rationale` to be overwritten locally with `mappings/map/matching-rationale` prevents easy use of the information for automation and reciprocity purposes. Thoughts? I think that instead of expecting a tool to be selective and consistent when using mapped source and target data, retaining only the `relationship`s that were created with the same `matching-rationale`, maybe we should create multiple mapping collections, but keep the matching-rationale consistent across one set. Thoughts?
Should each map have its own individual confidence score/description?
- Please see my comment above regarding individual `matching-score`. I agree with you, it would be very helpful.
Should each `map/mapping` have its own `method`? For example, the Provenance may be method automation, but the mapping for one particular mapping might be human?
- During our research we discussed the possibility of having a `mappings/map/method` for each `map`, where `method` would be optional and can be used to identify where a human overwrites an AI-determined `relationship`. If you think it would be important to have, I think it would be reasonable to add.
Same for status?
- The `status` description today is confusing to me, and I am sorry I missed it when reviewing the prototype model. The description reads: "The focus of the qualifier", but the acceptable values (complete; not-complete; draft; deprecated; superseded) refer to the overall status of the mapping collection. I suggest updating the description, but not including a `status` for each `map` unless you have a use case I cannot picture today.
Since meta-data already has responsible-parties and remarks, why does provenance need the same fields?
- I agree, I do not think it is necessary to have another `provenance/responsible-party`, but I will review the research spirals to make sure I do not miss any vision when agreeing to this statement.
I agree with @jpower432. Thank you for the `props` and `links` recommendations. Please also see the replies above.
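The `matching-score` filtering suggested earlier in this thread (discard the maps whose score falls outside one standard deviation of the mean) could be sketched roughly as below. This is only an illustration under assumptions: the dictionary layout, the `uuid` values, the 0-100 scale, and the choice of the population standard deviation are hypothetical and not part of the prototype model.

```python
from statistics import mean, pstdev


def filter_by_matching_score(maps, key="matching-score"):
    """Keep only maps whose score lies strictly inside (mean - sd, mean + sd)."""
    if not maps:
        return []
    scores = [m[key] for m in maps]
    mu = mean(scores)
    sd = pstdev(scores)  # assumption: population standard deviation
    low, high = mu - sd, mu + sd
    return [m for m in maps if low < m[key] < high]


# Hypothetical data: identifiers and scores are made up for the example.
sample_maps = [
    {"uuid": "map-1", "matching-score": 10},
    {"uuid": "map-2", "matching-score": 50},
    {"uuid": "map-3", "matching-score": 50},
    {"uuid": "map-4", "matching-score": 50},
    {"uuid": "map-5", "matching-score": 90},
]

kept = filter_by_matching_score(sample_maps)
print([m["uuid"] for m in kept])
```

Note that when all scores are identical the open interval is empty and nothing is kept, so a real tool would need to decide how to treat that case.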
ISSUE: namespace needed under provenance. Thanks
As discussed today, the following changes can be made -
- Make confidence-score optional under provenance.
- Status can also be made optional
- Have another option for matching-rationale - "others" if none of the existing 3 rationales was used.
- matching-rationale should be removed from "maps" as the rationale will be same for all control mappings and not change from one mapping to the other.
@vikas-agarwal76 and @ancatri - thank you for the recommendations. I think we also discussed including a `method` under `maps` to allow a human reviewer to identify, for example, which `relationships` were corrected when the overall initial `method` was automated.
QUESTION: Do we need, in scenarios like the one above, to include a `responsible-party`? In the `metadata`, `roles` can be identified (reviewers included), `parties` can be documented, and here under `maps` we can document which party reviewed and changed the `relationship` by including a `responsible-party` [0,1].
PLEASE NOTE: there is a `responsible-party` under `provenance`, which might be sufficient when only one `party` responsible for reviewing the content exists.
Also, a namespace `ns` is necessary for the `qualifier` as well, based on @ancatri's concerns expressed verbally during the meeting. I would greatly appreciate having the concerns documented here as well so the community can weigh in.
@iMichaela We had a detailed discussion on this and here are our suggestions -
@vikas-agarwal76 - I agree with all recommendations, with a caveat for the following one:
- keep 'matching-rationale' at both provenance and map level but make them optional. Applications will always check the map level and if missing will use the provenance as default
IF provenance has NO matching-rationale, then every map MUST have one. Otherwise, a map/`relationship` is useless since no one can guess what approach the mapper (AI or human) used. Please recall that the `matching-rationale` values are:
- `syntactic`: How similar is the wording that expresses the two concepts? This is a word-for-word analysis of the relationship, not an interpretation of the language.
- `semantic`: How similar are the meanings of the two concepts? This involves some interpretation of each concept’s language.
- `functional`: How similar are the results of executing the two concepts? This involves understanding what will happen if the two concepts are implemented, performed, or otherwise executed.
More information about those values, along with definitions and small examples, is available in NIST IR 8477.
Without knowing or understanding how the mapping was done, a syntactic `subset` relationship can be a functional `superset`.
Example we used:
control-1 C1: "Implement TLS"
control-2 C2: "Implement TLS v1.2 or above"
(matching-rationale: syntactic) => (C1 subset-of C2) ; (remarks: C2 has additional words in the requirement)
(matching-rationale: semantic) => (C1 subset-of C2) ; (remarks: C2 has additional requirements enforcing the accepted versions)
(matching-rationale: functional) => (C1 superset-of C2) ; (remarks: C1 can be satisfied with all possible versions of TLS, while C2 has additional requirements limiting the versions of TLS to only TLS 1.2 and above, fewer possibilities than all versions)
PLEASE NOTE the `relationships` above. IF a mapping collection is created without specifying the approach, since remarks and other fields are free form and can be skipped, the data set will be not just useless, but confusing.
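To make the functional reading of the TLS example concrete, here is a minimal sketch that models each control as the set of TLS versions that would satisfy it and then compares the sets. The version list, the helper names, and the labels other than `subset-of` and `superset-of` are assumptions made for illustration only.

```python
# Hypothetical universe of TLS versions, used only for this illustration.
TLS_VERSIONS = {"1.0", "1.1", "1.2", "1.3"}


def version_key(version):
    """Turn '1.2' into (1, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


# C1: "Implement TLS" is satisfied by any version.
c1_satisfied_by = set(TLS_VERSIONS)
# C2: "Implement TLS v1.2 or above" is satisfied only by 1.2 and later.
c2_satisfied_by = {v for v in TLS_VERSIONS if version_key(v) >= (1, 2)}


def functional_relationship(source, target):
    """Compare the sets of implementations that satisfy each control."""
    if source == target:
        return "equal-to"
    if source > target:
        return "superset-of"
    if source < target:
        return "subset-of"
    if source & target:
        return "intersects-with"
    return "no-relationship"


print(functional_relationship(c1_satisfied_by, c2_satisfied_by))
```

Under this functional view C1 comes out as a superset of C2, which is the opposite direction from the syntactic and semantic readings listed above.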
I suggest that your application note `semantic` as the default in the provenance.
I need to research the enforcement of the constraint mentioned above, BUT my suggestion would be to use a default value and keep `provenance/matching-rationale` with cardinality [1].
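The fallback rule discussed in this comment (check the map level first, fall back to the provenance value, and reject a map when neither level has a rationale) could be checked by an application along these lines. This is a sketch only: the plain-dictionary layout with `provenance` and `maps` keys and the function names are hypothetical, and the allowed values are the three listed above.

```python
ALLOWED_RATIONALES = {"syntactic", "semantic", "functional"}


def effective_rationale(map_entry, provenance):
    """Return the map-level rationale, falling back to the provenance default."""
    rationale = map_entry.get("matching-rationale") or provenance.get("matching-rationale")
    if rationale is None:
        raise ValueError("no matching-rationale at map or provenance level")
    if rationale not in ALLOWED_RATIONALES:
        raise ValueError(f"unknown matching-rationale: {rationale!r}")
    return rationale


def resolve_collection(collection):
    """Resolve the effective rationale of every map in a mapping collection."""
    provenance = collection.get("provenance", {})
    return [effective_rationale(m, provenance) for m in collection.get("maps", [])]


# Hypothetical collection: one map inherits the default, one overrides it.
example = {
    "provenance": {"matching-rationale": "semantic"},
    "maps": [
        {"relationship": "subset-of"},
        {"relationship": "superset-of", "matching-rationale": "functional"},
    ],
}
print(resolve_collection(example))
```

If `provenance/matching-rationale` stays at cardinality [1], the first error branch can never be reached for a valid document, which is one argument for keeping the default mandatory.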
@iMichaela Here is a sample OSCAL mapping model that we created based on the discussions that we had. nist-mapping-sample.json
Here is the summary -
User Story
In the Control Mapping Model, the `provenance/confidence-score` had a type `string` but could serve better if it allowed for a numeric score and a description of how it was calculated. The reference needs clarification since it indicates the score should be included if the mapping was done automatically.
The values listed for the `provenance` (human and automation) do not cover the case of automatic generation with human review.
A namespace was requested by IBM and RedHat for the `confidence-score` to allow for calculations of the score using other methods (when needed).
Goals
1) Replace the `string` type for the `confidence-score` with something better. An integer and a description would be more appropriate. Being more prescriptive around the calculation of the confidence score is also of interest. IMPORTANT TO NOTE: A stricter, mathematical approach would provide consistency as requested by the community members, while preserving flexibility for other methods under a distinct `ns`.
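As one possible shape for this goal, the sketch below models a confidence score as an integer value, a description of how it was computed, and a namespace that identifies the calculation method. The field names `value`, `description`, and `ns`, the default namespace URI, and the example values are assumptions for illustration; the actual model may define them differently.

```python
from dataclasses import dataclass, asdict

# Assumption: a default namespace URI, used here only for illustration.
DEFAULT_NS = "http://csrc.nist.gov/ns/oscal"


@dataclass(frozen=True)
class ConfidenceScore:
    value: int            # numeric score instead of a free-form string
    description: str      # how the value was calculated or what it means
    ns: str = DEFAULT_NS  # distinguishes alternative calculation methods

    def __post_init__(self):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(self.value, bool) or not isinstance(self.value, int):
            raise TypeError("value must be an integer")
        if not self.description.strip():
            raise ValueError("description must explain how the value was computed")


# Hypothetical score computed with a method defined under another namespace.
score = ConfidenceScore(
    value=80,
    description="Hypothetical: percentage of maps confirmed by a human reviewer.",
    ns="https://example.org/ns/scoring",
)
print(asdict(score))
```

Keeping the namespace on the score itself lets a consumer ignore scores computed with a method it does not understand.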
Dependencies
No response
Acceptance Criteria
(For reviewers: The wiki has guidance on code review and overall issue review for completeness.)
Revisions
No response