usnistgov / liboscal-java

A Java library to support processing OSCAL content

Incorrect serialization of list items in XML #13

Closed david-waltermire closed 1 year ago

david-waltermire commented 2 years ago

Describe the bug

An oscal-catalog:listItemType node is incorrectly serialized into XML.

Who is the bug affecting?

liboscal-java users who serialize to XML when the input OSCAL instance contains an oscal-catalog:listItemType node.

What is affected by this bug?

The serialized XML output is invalid, as it does not conform to the released OSCAL XSD.

When does this occur?

When serializing an oscal-catalog:listItemType node into XML.

How do we replicate the issue?

  1. Modify OscalBindingContextTest.testLoadCatalogXml(@TempDir Path tempDir) to:
    • use the input given below as the source Catalog; and
    • no longer delete the output.
  2. Execute the test.
  3. Validate the output using the XSD.
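
For the XSD validation step, a minimal sketch using the JDK's built-in javax.xml.validation API is given here. The class name XsdValidationSketch and the inline toy schema are assumptions made for illustration only. The toy schema is not the OSCAL XSD; it only mimics the relevant rule, namely that a list item may hold inline content but not a <p>.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class XsdValidationSketch {

    // Hypothetical toy schema, NOT the OSCAL XSD:
    // <li> allows text and <em>, but no <p> child.
    static final String TOY_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='ul'><xs:complexType><xs:sequence>"
      + "<xs:element name='li' maxOccurs='unbounded'>"
      + "<xs:complexType mixed='true'>"
      + "<xs:choice minOccurs='0' maxOccurs='unbounded'>"
      + "<xs:element name='em' type='xs:string'/>"
      + "</xs:choice></xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Validates xml against xsd and returns every reported error as "line:column message".
    // For files on disk, new StreamSource(file) can be used instead of the StringReader sources.
    static List<String> validate(String xsd, String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Validator validator =
            factory.newSchema(new StreamSource(new StringReader(xsd))).newValidator();

        final List<String> errors = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) {
                // warnings are ignored in this sketch
            }
            @Override public void error(SAXParseException e) {
                // collect the error and keep validating
                errors.add(e.getLineNumber() + ":" + e.getColumnNumber() + " " + e.getMessage());
            }
            @Override public void fatalError(SAXParseException e) throws SAXException {
                throw e; // not well-formed: stop
            }
        });
        validator.validate(new StreamSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate(TOY_XSD, "<ul><li>Item 1</li></ul>"));
        System.out.println(validate(TOY_XSD, "<ul><li><p>Item 1</p></li></ul>"));
    }
}
```

With the toy schema, the first document should validate cleanly, while the second should be rejected with a cvc-complex-type.2.4.a error on the <p> element. To check real output, the released OSCAL XSD and the serialized catalog file would be passed in instead.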

Given

Input

<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="http://csrc.nist.gov/ns/oscal/1.0"
         uuid="6c5149e8-f3a6-437c-8035-025e9b5fc0bc">
   <metadata>
      <title>Example Catalog with UL</title>
      <last-modified>2022-05-30T14:51:41.185-04:00</last-modified>
      <version>1.0</version>
      <oscal-version>1.0.0</oscal-version>
   </metadata>
   <group>
      <title>Group A of C</title>
      <control id="a1">
         <title>Control A1</title>
         <param id="a1_prm1">
            <label>A1 Parameter 1</label>
         </param>
         <prop name="label" value="first"/>
         <part name="statement" id="a1-stmt">
            <p>A1 aaaaa aaaaaaaaaa</p>
            <ul>
               <li>Item 1</li>
               <li>Item 2</li>
               <li>Item 3</li>
            </ul>
         </part>
      </control>
   </group>
</catalog>

XSD validation of input

Engine name: Xerces
Validation successful

Actual

Output

<?xml version='1.0' encoding='UTF-8'?>
<catalog xmlns="http://csrc.nist.gov/ns/oscal/1.0" uuid="6c5149e8-f3a6-437c-8035-025e9b5fc0bc">
  <metadata>
    <title>Example Catalog with UL</title>
    <last-modified>2022-05-30T14:51:41.185-04:00</last-modified>
    <version>1.0</version>
    <oscal-version>1.0.0</oscal-version>
    <revisions/>
  </metadata>
  <group>
    <title>Group A of C</title>
    <control id="a1">
      <title>Control A1</title>
      <param id="a1_prm1">
        <label>A1 Parameter 1</label>
      </param>
      <prop name="label" value="first"/>
      <part id="a1-stmt" name="statement">
        <p>A1 aaaaa aaaaaaaaaa</p>
        <ul>
          <li>
            <p>Item 1</p>
          </li>
          <li>
            <p>Item 2</p>
          </li>
          <li>
            <p>Item 3</p>
          </li>
        </ul>
      </part>
    </control>
  </group>
</catalog>

Validation output

Engine name: Xerces
Severity: error
Problem ID: cvc-complex-type.2.4.a
Description: Invalid content was found starting with element '{"http://csrc.nist.gov/ns/oscal/1.0":p}'. One of '{"http://csrc.nist.gov/ns/oscal/1.0":code, "http://csrc.nist.gov/ns/oscal/1.0":em, "http://csrc.nist.gov/ns/oscal/1.0":i, "http://csrc.nist.gov/ns/oscal/1.0":b, "http://csrc.nist.gov/ns/oscal/1.0":strong, "http://csrc.nist.gov/ns/oscal/1.0":sub, "http://csrc.nist.gov/ns/oscal/1.0":sup, "http://csrc.nist.gov/ns/oscal/1.0":q, "http://csrc.nist.gov/ns/oscal/1.0":insert, "http://csrc.nist.gov/ns/oscal/1.0":a, "http://csrc.nist.gov/ns/oscal/1.0":img, "http://csrc.nist.gov/ns/oscal/1.0":ul, "http://csrc.nist.gov/ns/oscal/1.0":ol}' is expected.
Start location: 22:26
End location: 22:27
URL: http://www.w3.org/TR/xmlschema-1/#cvc-complex-type

Engine name: Xerces
Severity: error
Problem ID: cvc-complex-type.2.4.a
Description: Invalid content was found starting with element '{"http://csrc.nist.gov/ns/oscal/1.0":p}'. One of '{"http://csrc.nist.gov/ns/oscal/1.0":code, "http://csrc.nist.gov/ns/oscal/1.0":em, "http://csrc.nist.gov/ns/oscal/1.0":i, "http://csrc.nist.gov/ns/oscal/1.0":b, "http://csrc.nist.gov/ns/oscal/1.0":strong, "http://csrc.nist.gov/ns/oscal/1.0":sub, "http://csrc.nist.gov/ns/oscal/1.0":sup, "http://csrc.nist.gov/ns/oscal/1.0":q, "http://csrc.nist.gov/ns/oscal/1.0":insert, "http://csrc.nist.gov/ns/oscal/1.0":a, "http://csrc.nist.gov/ns/oscal/1.0":img, "http://csrc.nist.gov/ns/oscal/1.0":ul, "http://csrc.nist.gov/ns/oscal/1.0":ol}' is expected.
Start location: 25:26
End location: 25:27
URL: http://www.w3.org/TR/xmlschema-1/#cvc-complex-type

Engine name: Xerces
Severity: error
Problem ID: cvc-complex-type.2.4.a
Description: Invalid content was found starting with element '{"http://csrc.nist.gov/ns/oscal/1.0":p}'. One of '{"http://csrc.nist.gov/ns/oscal/1.0":code, "http://csrc.nist.gov/ns/oscal/1.0":em, "http://csrc.nist.gov/ns/oscal/1.0":i, "http://csrc.nist.gov/ns/oscal/1.0":b, "http://csrc.nist.gov/ns/oscal/1.0":strong, "http://csrc.nist.gov/ns/oscal/1.0":sub, "http://csrc.nist.gov/ns/oscal/1.0":sup, "http://csrc.nist.gov/ns/oscal/1.0":q, "http://csrc.nist.gov/ns/oscal/1.0":insert, "http://csrc.nist.gov/ns/oscal/1.0":a, "http://csrc.nist.gov/ns/oscal/1.0":img, "http://csrc.nist.gov/ns/oscal/1.0":ul, "http://csrc.nist.gov/ns/oscal/1.0":ol}' is expected.
Start location: 28:26
End location: 28:27
URL: http://www.w3.org/TR/xmlschema-1/#cvc-complex-type

Expected behavior (i.e. solution)

Output that aligns with the markup-multiline documentation and validates against the released OSCAL XML Schema.

Other Comments

None

david-waltermire commented 2 years ago

After investigating this, it looks like flexmark is generating the extra paragraphs.

Here is what the resulting node tree looks like:

Document[0, 48]
  Paragraph[0, 20] isTrailingBlankLine
    Text[0, 19] chars:[0, 19, "A1 aa … aaaaa"]
  BulletList[21, 48] isTight
    BulletListItem[21, 30] open:[21, 22, "*"] isTight
      Paragraph[23, 30]
        Text[23, 29] chars:[23, 29, "Item 1"]
    BulletListItem[30, 39] open:[30, 31, "*"] isTight
      Paragraph[32, 39]
        Text[32, 38] chars:[32, 38, "Item 2"]
    BulletListItem[39, 48] open:[39, 40, "*"] isTight
      Paragraph[41, 48]
        Text[41, 47] chars:[41, 47, "Item 3"]

I need to figure out how to suppress or filter this to fix it.
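
One way to picture the filtering idea is a rough sketch that post-processes the serialized XML with the JDK DOM API: any <li> whose only content is a single <p> gets that <p> unwrapped. This is not how liboscal-java or flexmark handle it, and the class and method names (TightListUnwrapSketch, unwrapTightItems, soleParagraphChild) are made up for illustration. It also assumes that "a single <p> child" is an acceptable stand-in for the isTight flag shown in the node tree.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TightListUnwrapSketch {

    // Returns the <p> child if li contains exactly one <p> and otherwise only whitespace; else null.
    static Element soleParagraphChild(Element li) {
        Element found = null;
        for (Node n = li.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.TEXT_NODE && n.getTextContent().trim().isEmpty()) {
                continue; // indentation between tags
            }
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && "p".equals(n.getLocalName()) && found == null) {
                found = (Element) n;
                continue;
            }
            return null; // second paragraph, other element, or real text: leave the item alone
        }
        return found;
    }

    // Rewrites <li><p>text</p></li> as <li>text</li>; items with several paragraphs are kept as-is.
    static String unwrapTightItems(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        NodeList items = doc.getElementsByTagNameNS("*", "li");
        for (int i = 0; i < items.getLength(); i++) {
            Element li = (Element) items.item(i);
            Element p = soleParagraphChild(li);
            if (p == null) {
                continue;
            }
            // Drop the <p> and surrounding whitespace, then move the <p> content up into <li>.
            while (li.getFirstChild() != null) {
                li.removeChild(li.getFirstChild());
            }
            while (p.getFirstChild() != null) {
                li.appendChild(p.getFirstChild()); // appendChild moves the node
            }
        }

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(unwrapTightItems(
            "<ul><li><p>Item 1</p></li><li><p>A</p><p>B</p></li></ul>"));
    }
}
```

Items with more than one paragraph are deliberately left untouched, so this sketch would not by itself make multi-paragraph list items valid against the schema.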

david-waltermire commented 2 years ago

Thinking more about this, it might be better to allow the paragraphs, since in Markdown you can have many paragraphs in the list item.

For example:

So, alternatively, I could update the HTML binding to allow paragraphs inside list items.

david-waltermire commented 1 year ago

I ended up adding support for tight list items in usnistgov/metaschema-java#159. This will work as described in this bug report once #150 is merged.