google / go-cmp

Package for comparing Go values in tests
BSD 3-Clause "New" or "Revised" License

Help with tests using cmp.Diff and chinese characters? #334

Closed bashbunni closed 10 months ago

bashbunni commented 10 months ago

Hey there,

I'm trying to reproduce an issue in one of the projects I maintain and am using cmp.Diff to show what went wrong when a test fails. The issue I'm facing now is that I can't read the output, so I'm not able to do much with the information given.

My question is: what can I do with the byte output shown below, and is the // +|.[0m.[38;5;252m| comment the string value of what changed?

    glamour_test.go:279: got != want
        -want +got:
        diff:
          string{
            ... // 78204 identical bytes
            0x6d, 0x1b, 0x5b, 0x33, 0x38, 0x3b, 0x35, 0x3b, 0x32, 0x35, 0x32, 0x6d, 0x1b, 0x5b, 0x30, 0x6d, //  |m.[38;5;252m.[0m|
            0x20, 0x20, 0x1b, 0x5b, 0x33, 0x38, 0x3b, 0x35, 0x3b, 0x32, 0x35, 0x32, 0x6d, 0x31, 0x3a, 0x34, //  |  .[38;5;252m1:4|
        +   0x1b, 0x5b, 0x30, 0x6d, 0x1b, 0x5b, 0x33, 0x38, 0x3b, 0x35, 0x3b, 0x32, 0x35, 0x32, 0x6d,       // +|.[0m.[38;5;252m|
            0x3a, 0x39, 0x1b, 0x5b, 0x33, 0x38, 0x3b, 0x35, 0x3b, 0x32, 0x35, 0x32, 0x6d, 0x20, 0x1b, 0x5b, //  |:9.[38;5;252m .[|
            0x30, 0x6d, 0x1b, 0x5b, 0x33, 0x38, 0x3b, 0x35, 0x3b, 0x32, 0x35, 0x32, 0x6d, 0x20, 0x1b, 0x5b, //  |0m.[38;5;252m .[|
            ... // 340063 identical bytes
          }
--- FAIL: TestWrapping (0.33s)
    --- PASS: TestWrapping/english_short (0.00s)
    --- PASS: TestWrapping/chinese_short (0.00s)
    --- FAIL: TestWrapping/chinese_long (0.33s)
FAIL

cmp version: github.com/google/go-cmp v0.5.9
go version: go 1.17

Here's a link to the pull request and an example output we're comparing:

https://github.com/charmbracelet/glamour/pull/249
testdata/issues/long-chinese-text.test

Thank you very much for your great project and I appreciate any guidance you're able to give :)

dsnet commented 10 months ago

This output seems to be working as intended. You're comparing two strings that cmp.Diff has detected contain non-printable characters. For that reason, it switched to a mode where it diffs the raw byte values.
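To illustrate what "non-printable" means here, the following is a minimal sketch of that kind of check. The helper looksLikeText is hypothetical and is not go-cmp's actual implementation; it only assumes that "text" means valid UTF-8 made up of printable runes and whitespace.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// looksLikeText is a hypothetical helper, not go-cmp's real code.
// It reports whether s is valid UTF-8 consisting only of printable
// runes and whitespace.
func looksLikeText(s string) bool {
	if !utf8.ValidString(s) {
		return false
	}
	for _, r := range s {
		if !unicode.IsPrint(r) && !unicode.IsSpace(r) {
			return false
		}
	}
	return true
}

func main() {
	// Chinese characters are printable, so this string counts as text.
	fmt.Println(looksLikeText("1:4:9 中文"))
	// The ESC byte (0x1b) is a control character, so this one does not.
	fmt.Println(looksLikeText("\x1b[38;5;252m1:4\x1b[0m"))
}
```

Under this assumption, the Chinese text by itself would not trigger the byte-level view; the ESC bytes in the rendered output are what make the strings in the question look binary.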

This particular output is saying that got has an additional 15-byte string inserted shortly after offset 78204 (right after the two unchanged 16-byte lines shown as context):

0x1b, 0x5b, 0x30, 0x6d, 0x1b, 0x5b, 0x33, 0x38, 0x3b, 0x35, 0x3b, 0x32, 0x35, 0x32, 0x6d,

The best ASCII representation of this string is:

.[0m.[38;5;252m

(BTW, the output you are seeing is inspired by the hexdump utility, which prints the raw hex values on the left, and the best ASCII representation on the right.)
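If it helps, the added bytes can be turned back into a string and inspected directly. This is a small sketch using only the Go standard library; the names added, quoteBytes, and asciiColumn are made up for illustration. strconv.Quote shows the escape characters explicitly, and encoding/hex's Dump produces a hexdump-style layout similar to the one in the diff.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

// added holds the bytes from the "+" line of the diff in the question.
var added = []byte{
	0x1b, 0x5b, 0x30, 0x6d,
	0x1b, 0x5b, 0x33, 0x38, 0x3b, 0x35, 0x3b, 0x32, 0x35, 0x32, 0x6d,
}

// quoteBytes renders raw bytes as a Go string literal so that control
// characters such as ESC appear as visible escapes.
func quoteBytes(b []byte) string {
	return strconv.Quote(string(b))
}

// asciiColumn returns the right-hand column (between the '|' marks) of
// the first line of hex.Dump's output for b.
func asciiColumn(b []byte) string {
	line := hex.Dump(b)
	if i := strings.IndexByte(line, '\n'); i >= 0 {
		line = line[:i]
	}
	start := strings.IndexByte(line, '|')
	end := strings.LastIndexByte(line, '|')
	if start < 0 || end <= start {
		return ""
	}
	return line[start+1 : end]
}

func main() {
	fmt.Println(quoteBytes(added))  // "\x1b[0m\x1b[38;5;252m"
	fmt.Print(hex.Dump(added))      // hex values on the left, ASCII on the right
	fmt.Println(asciiColumn(added)) // .[0m.[38;5;252m
}
```

The quoted form makes it clear that each "." in the ASCII column stands for the ESC byte 0x1b.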

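This happens to be a pair of ANSI escape sequences, which are common in terminal output: ESC[0m resets all text attributes, and ESC[38;5;252m sets the foreground to color 252 of the 256-color palette.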
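If a test only cares about the visible text and not the styling, one option is to strip these sequences before comparing. The sketch below is an assumption about what such a test might want, not something go-cmp provides; stripSGR and sgrPattern are hypothetical names, and the pattern only covers color/style sequences of the form ESC[...m.

```go
package main

import (
	"fmt"
	"regexp"
)

// sgrPattern matches SGR (Select Graphic Rendition) sequences such as
// ESC[0m or ESC[38;5;252m. Other kinds of escape sequences are not covered.
var sgrPattern = regexp.MustCompile(`\x1b\[[0-9;]*m`)

// stripSGR is a hypothetical helper that removes color/style sequences
// and leaves the visible text untouched.
func stripSGR(s string) string {
	return sgrPattern.ReplaceAllString(s, "")
}

func main() {
	want := "\x1b[38;5;252m1:4\x1b[0m"
	got := "\x1b[38;5;252m1:4\x1b[0m\x1b[38;5;252m"

	fmt.Println(want == got)                     // false: got has an extra sequence
	fmt.Println(stripSGR(want) == stripSGR(got)) // true: the visible text matches
	fmt.Printf("%q\n", stripSGR(got))            // "1:4"
}
```

Note that stripping hides exactly the kind of difference reported in this issue, so it only fits tests where styling is irrelevant. When the escape sequences themselves matter, comparing quoted strings (for example via strconv.Quote) may give a more readable failure message.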

bashbunni commented 10 months ago

@dsnet ah amazing, thank you for your help!