Need proper handling for invalid character for CDATA field #138

Open firodj opened 2 years ago

When output tests contains colored output log that may include escape character (0x1b) on test output report, the escape character is not treated well by encoding/xml package for CDATA section, so it will generate invalid XML document (as spec), that may result the test report being unreadable by the CI (in my case GitLab CI).

Related issue: https://github.com/golang/go/issues/53728

Thanks. It sounds like we should at least strip the invalid characters from the CDATA section. However, since ANSI escape sequences also contain regular characters we'd probably want to strip the entire ANSI sequence rather than just the (invalid) control characters.

I just ran into the same issue with Jenkins junit parser:

org.dom4j.DocumentException: Error on line 5 of document  : An invalid XML character (Unicode: 0x1b) was found in the CDATA section.

Same here, I've tried to update to v2 https://github.com/stackrox/stackrox/pull/2586 and prow does not parse XML as it's not valid.

error on line 2291 at column 25: Input is not proper UTF-8, indicate encoding !
Bytes: 0x00 0x63 0x32 0x68

@janisz Where do these invalid bytes come from, this doesn't look like an ansi escape character. Any concerns if they are just dropped completely from the output?

This bytes came from key that is logged. Key is generated as prefix + separator + id where separator is []byte("\x00") which is a control character. I found a workaround of this and print the key with %q instead of %s. From my perspective it's save to ignore them or escape. https://github.com/stackrox/stackrox/blob/1c51bfebc23b2748bc103a583b5460611dfcad53/pkg/dbhelper/db.go#L10

Minimal example:

package main

import "testing"

func TestEscape(_ *testing.T) {
    println("Test '\x00'")
}

 $ go test ./... -v  2>&1 | go-junit-report -set-exit-code | file -bi -
+ Actual  : application/octet-stream; charset=binary
- Expected: text/xml; charset=us-ascii

Created a PR to drop control characters from output https://github.com/jstemmer/go-junit-report/pull/140

Leaving this issue open for now, because it would be nice if we could also detect ANSI escape sequences and remove them from the output.

@jstemmer is #162 sufficient?

I have run into a similar problem when trying out the newer v2 tool with my GitLab CI pipelines. Before this update the tool was working very well, attempting to run with it today caused GitLab to show: With the error text listed as: JUnit XML parsing failed: 4106:21: FATAL: CData section not finished 2023/07/20 10:27:5 Looking in the junit-report.xml file that was generated and that is being complained about I find the following lines:

    <testsuite name="gitlab.company.com/some/paths/gomodule" tests="72" failures="0" errors="0" id="82" hostname="runner-bvfrfxhz-project-185-concurrent-0rd9jq" time="0.550" timestamp="2023-07-20T10:29:44Z">
        <testcase name="Test_Buffer_Is_Full" classname="gitlab.company.com/some/paths/gomodule" time="0.000"></testcase>
        <testcase name="Test_Buffer_Processing" classname="gitlab.company.com/some/paths/gomodule" time="0.000"></testcase>
        <testcase name="Test_VariablesCanBeRead" classname="gitlab.company.com/some/paths/gomodule" time="0.020">
            <system-out><![CDATA[Calibration [ERROR] precondition not met
Calibration [ERROR] precondition not met
Calibration [ERROR] precondition not met
Calibration [ERROR] precondition not met
properties [WARN] Warning no property file located]]></system-out>
        </testcase>
        <testcase name="Test_ChangeDbPath" classname="gitlab.company.com/some/paths/gomodule" time="0.020"></testcase>
        <testcase name="Test_InvalidPathCannotBeApplied" classname="gitlab.company.com/some/paths/gomodule" time="0.000">
            <system-out><![CDATA[
2023/07/20 10:27:56 [35m/builds/gitlab.company.com/some/paths/gomodule/persistence.go:53
[0m[31m[error] [0mfailed to initialize database, got error unable to open database file: is a directory]]></system-out>
        </testcase>

(company identifying path names altered) It is the second set of CDATA being complained about. As others have said, likely due to the presence of an escape or UNICODE charter.

A cheap work around would be to loose the "system-out" data with a parameter or configuration setting. Stripping or replacing invalid characters would obviously be nicer if possible.

Let me know if I can be of any help in diagnosing or fixing the problem.

@TafThorne what version are you using? I thought special characters will be replaced with #140

I would assume #140 was not sufficient. Had the same problem and needed to use the fork / the PR #162 to fix it.

@TafThorne what version are you using? I thought special characters will be replaced with #140

I ran with the latest v2 tagged code I think:

go install github.com/jstemmer/go-junit-report/v2@latest && go install github.com/t-yuki/gocover-cobertura@latest[0;m
go: downloading github.com/jstemmer/go-junit-report/v2 v2.0.0
go: downloading github.com/jstemmer/go-junit-report v1.0.0
go: downloading github.com/t-yuki/gocover-cobertura v0.0.0-20180217150009-aaee18c8195c

So that would be v2.0.0 https://github.com/jstemmer/go-junit-report/commit/7fde4641acef5b92f397a8baf8309d1a45d608cc which is from Jul 1, 2022. I can try to repeat with the latest code on master and see if it fixes it... one second....

It was more than a second but I can confirm that by using the latest changes on main things worked correctly for me.

go install github.com/jstemmer/go-junit-report/v2@7933520f1e28509e5ba6e95a6c3157d53c223561 && go install github.com/t-yuki/gocover-cobertura@latest[0;m
go: downloading github.com/jstemmer/go-junit-report/v2 v2.0.1-0.20230321231245-7933520f1e28
go: downloading github.com/t-yuki/gocover-cobertura v0.0.0-20180217150009-aaee18c8195c

The above are the details of the changeset that worked for me.

Is there a plan to release some kind of 2.0.1 set of bug fixes or similar in the near future or should I wait for the 2.1.0 release I can see pencilled in?

jstemmer / go-junit-report

Need proper handling for invalid character for CDATA field #138