ugorji / go

idiomatic codec and rpc lib for msgpack, cbor, json, etc. msgpack.org[Go]

UTF-8 validation isn't applied to CBOR text strings nested inside an indefinite-length text string #404

Closed benluddy closed 8 months ago

benluddy commented 9 months ago

From https://www.rfc-editor.org/rfc/rfc8949.html#section-3.2.3:

If any definite-length text string inside an indefinite-length text string is invalid, the indefinite-length text string is invalid. Note that this implies that the UTF-8 bytes of a single Unicode code point (scalar value) cannot be spread between chunks: a new chunk of a text string can only be started at a code point boundary.

Currently, when ValidateUnicode is set, the indefinite-length string is validated as UTF-8 only after all chunks have been concatenated. I have a test that spreads one code point across two chunks here: https://github.com/benluddy/ugorji-go/commit/c38a86cde35370b6b00be0e15406e12593c95ee4, which fails with:

--- FAIL: TestCborIndefiniteLengthTextStringChunksAreUTF8 (0.00s)
    cbor_test.go:126: expected error but decoded to: "£"
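
To make the difference concrete, here is a minimal sketch using only the standard library's `unicode/utf8` package. It is not the library's decoder and not the code from the linked test or the eventual fix; `chunksValid` and `concatValid` are hypothetical helper names introduced for illustration. It compares validating each chunk on its own (what RFC 8949 section 3.2.3 requires) with validating only the concatenated result (the behavior described above), using "£" (U+00A3, UTF-8 bytes 0xC2 0xA3) split across two chunks:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// chunksValid (hypothetical helper) reports whether every chunk is
// valid UTF-8 on its own, as RFC 8949 section 3.2.3 requires.
func chunksValid(chunks [][]byte) bool {
	for _, c := range chunks {
		if !utf8.Valid(c) {
			return false
		}
	}
	return true
}

// concatValid (hypothetical helper) validates only the concatenation
// of all chunks, so it cannot see where the chunk boundaries fall.
func concatValid(chunks [][]byte) bool {
	var all []byte
	for _, c := range chunks {
		all = append(all, c...)
	}
	return utf8.Valid(all)
}

func main() {
	// One code point spread across two chunks.
	split := [][]byte{{0xC2}, {0xA3}}
	fmt.Println(concatValid(split)) // true: the joined bytes spell "£"
	fmt.Println(chunksValid(split)) // false: neither chunk is valid alone
}
```

On the wire, the same input would be the indefinite-length text string `7f 61 c2 61 a3 ff`, which a conforming decoder with UTF-8 validation enabled should reject.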

ugorji commented 8 months ago

Fixed with f7f63a0a821cb85bc908002b89754aa954ed76ea
