usnistgov / oscal-tools

Tools for the OSCAL project
https://pages.nist.gov/oscal-tools/

Provide diff tool for flexibility in handling whitespace. #8

Closed wendellpiez closed 2 years ago

wendellpiez commented 4 years ago

White space in string contents can complicate the functional requirements for diffing.

In XML, white space is defined as any sequence of one or more of the Unicode characters 9 (horizontal tab), 10 (LF), 13 (CR), and 32 (space).
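As a minimal sketch of that definition (the helper names `is_xml_whitespace` and `xml_strip` are made up for illustration and are not part of any OSCAL tool), the four characters can be matched explicitly. This matters because general-purpose string functions often use a broader notion of white space; Python's `str.strip()`, for instance, also removes a no-break space, which XML does not count as white space.

```python
import re

# Exactly the four XML white space characters: tab, LF, CR, space.
XML_WS_RUN = re.compile(r"[\t\n\r ]+")


def is_xml_whitespace(text: str) -> bool:
    """True if text is a non-empty run of XML white space characters only."""
    return XML_WS_RUN.fullmatch(text) is not None


def xml_strip(text: str) -> str:
    """Trim only XML white space from both ends (hypothetical helper)."""
    return text.strip("\t\n\r ")


sample = "\u00a0AC-2\n"          # starts with a no-break space (U+00A0)
xml_trimmed = xml_strip(sample)   # keeps the no-break space
py_trimmed = sample.strip()       # Python's default strip removes it
```

A diff tool that trims with the language default rather than the XML character set could therefore hide a difference that an XML processor would preserve.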

For diffing, the question is where differences in white space reflect actual (reportable) differences, and where they are merely cosmetic or side effects of processing (for example, attribute value normalization or line feed handling in XML).

Example: which of these are the same?

A

<prop name="label">AC-2 (1)</prop>

B

<prop name="label">AC-2(1)</prop>

C

<prop name="label">
   AC-2 (1)
</prop>

In some circumstances, these should all be different. In other circumstances, A and C might be the same, while B is different.
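To make the two outcomes concrete, here is a rough sketch using Python's standard `xml.etree.ElementTree`. The policy names (`exact`, `normalize`, `remove`) and the `groups` helper are assumptions made for this example, not an existing API. The third policy, which deletes all white space, is an extra case not mentioned above; it is included only to show that a more aggressive policy would treat all three as equal.

```python
import re
import xml.etree.ElementTree as ET

SAMPLES = {
    "A": '<prop name="label">AC-2 (1)</prop>',
    "B": '<prop name="label">AC-2(1)</prop>',
    "C": '<prop name="label">\n   AC-2 (1)\n</prop>',
}

XML_WS = r"[\t\n\r ]+"  # the four XML white space characters


def exact(text):
    """Compare the text exactly as parsed."""
    return text


def normalize(text):
    """Collapse runs of white space to one space and trim both ends."""
    return re.sub(XML_WS, " ", text).strip(" ")


def remove(text):
    """Delete all white space (assumed extra policy, for contrast)."""
    return re.sub(XML_WS, "", text)


def groups(policy):
    """Group sample names whose text is equal under the given policy."""
    by_value = {}
    for name, xml in SAMPLES.items():
        text = ET.fromstring(xml).text or ""
        by_value.setdefault(policy(text), []).append(name)
    return sorted(by_value.values())


print(groups(exact))      # [['A'], ['B'], ['C']]
print(groups(normalize))  # [['A', 'C'], ['B']]
print(groups(remove))     # [['A', 'B', 'C']]
```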

To make this worse, the circumstances vary not only with the functional requirements in operation, but also with the semantics of the data point (what it is representing). Ignoring white space for comparison in one place might be a bug at the same time as it is a feature elsewhere.

One potential approach to this problem would be to make white space handling in node comparison configurable.
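A sketch of what such configuration could look like, assuming a simple mapping from element name to a named policy. Everything here (`POLICIES`, `CONFIG`, `text_equal`, and the choice of `prop` and `pre` as keys) is hypothetical and only illustrates the idea; it does not describe how any existing diff tool is configured.

```python
import re
import xml.etree.ElementTree as ET

XML_WS = r"[\t\n\r ]+"  # tab, LF, CR, space

# Named policies: each maps raw text to the form used for comparison.
POLICIES = {
    "preserve": lambda s: s,
    "normalize": lambda s: re.sub(XML_WS, " ", s).strip(" "),
    "ignore": lambda s: re.sub(XML_WS, "", s),
}

# Hypothetical configuration: element name -> policy name.
CONFIG = {"prop": "normalize", "pre": "preserve"}
DEFAULT_POLICY = "preserve"


def text_equal(left, right, config=CONFIG):
    """Compare the text of two elements under the policy configured for their tag."""
    if left.tag != right.tag:
        return False
    policy = POLICIES[config.get(left.tag, DEFAULT_POLICY)]
    return policy(left.text or "") == policy(right.text or "")


a = ET.fromstring('<prop name="label">AC-2 (1)</prop>')
b = ET.fromstring('<prop name="label">AC-2(1)</prop>')
c = ET.fromstring('<prop name="label">\n   AC-2 (1)\n</prop>')

print(text_equal(a, c))                        # True under "normalize"
print(text_equal(a, b))                        # False under "normalize"
print(text_equal(a, c, {"prop": "preserve"}))  # False when white space is significant
```

Keying the policy on the element name is the simplest option; a fuller design would probably need to key on a path or on attribute values as well, since the same element can carry data with different semantics.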

aj-stein-nist commented 2 years ago

Per discussion with @wendellpiez, since we have the oscal-deep-diff tool, we can close this issue.