moov-io / signedxml

pure go library for processing signed XML documents
MIT License
52 stars 44 forks source link

ExclusiveCanonicalization: neighbouring comments are not all stripped when `WithComments: false` #50

Closed laurenkt closed 5 months ago

laurenkt commented 5 months ago

When there are runs of multiple comments like <!-- comment0 --><!-- comment1 --> then the second one is not stripped. This doesn't happen if there is whitespace between the comments.

It seems like all comments should be stripped, but I checked just to be sure in case it's some weird quirk of the Canonical XML spec:

The second parameter of input to the XML canonicalization method is a boolean flag indicating whether or not comments should be included in the canonical form output by the XML canonicalization method. If a canonical form contains comments corresponding to the comment nodes in the input node-set, the result is called canonical XML with comments. Note that the XPath data model does not create comment nodes for comments appearing within the document type declaration (DTD). Implementations are REQUIRED to be capable of producing canonical XML excluding all comments that may have appeared in the input document or document subset. Support for canonical XML with comments is RECOMMENDED.

Here's a minimal project test case which reproduces the issue:

package main

import (
    "github.com/moov-io/signedxml"
    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/require"
    "testing"
)

func TestExclusiveCanonicalization_WithNeighbouringComments(t *testing.T) {
    str := `
<a>
  <!-- comment0 --><!-- comment1 -->
</a>
`
    ec := signedxml.ExclusiveCanonicalization{WithComments: false}
    str, err := ec.Process(str, "")
    require.NoError(t, err)
    assert.Equal(t, `
<a>
</a>`, str)
}

Output:

=== RUN   TestExclusiveCanonicalization_WithNeighbouringComments
    main_test.go:19: 
            Error Trace:    main_test.go:19
            Error:          Not equal: 
                            expected: "\n<a>\n</a>"
                            actual  : "\n<a>\n  <!-- comment1 -->\n</a>"

                            Diff:
                            --- Expected
                            +++ Actual
                            @@ -2,2 +2,3 @@
                             <a>
                            +  <!-- comment1 -->
                             </a>
            Test:           TestExclusiveCanonicalization_WithNeighbouringComments
--- FAIL: TestExclusiveCanonicalization_WithNeighbouringComments (0.00s)

With a comment run like:

So it seems like a comment immediately following another comment with no whitespace between them will reduce in comments being missed for stripping.

adamdecaf commented 5 months ago

Thank you for fixing this and including a test case! I'll publish a bugfix release after merging this. I haven't found a complete non-CGO implementation of xml canonicalization yet so we're doing the best we can. If you have issues in the future don't hesitate to reach out for help.