nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org

TextDecoder: ERR_ENCODING_INVALID_ENCODED_DATA on very long array buffer #47645

Open martian17 opened 1 year ago

martian17 commented 1 year ago

Version

v18.14.1

Platform

Linux 5.19.0-38-generic #39~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Mar 17 21:16:15 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Subsystem

No response

What steps will reproduce the bug?

When I decode a long UTF-16LE-encoded buffer, ERR_ENCODING_INVALID_ENCODED_DATA is thrown even though the data is valid. For inputs that really are too long, ERR_STRING_TOO_LONG would be the appropriate error.

new TextDecoder("utf-16le").decode(new Uint16Array(2**27).fill(48))
// Uncaught TypeError: The encoded data was not valid for encoding utf-16le
//     at TextDecoder.decode (node:internal/encoding:448:14) {
//   code: 'ERR_ENCODING_INVALID_ENCODED_DATA'
// }
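To compare decoders without reading through REPL stack traces, a small helper along these lines can report which error code (if any) a decode throws. `decodeErrorCode` is a hypothetical name used only for this sketch; it relies only on the global `TextDecoder` and the string `code` property that Node attaches to its internal errors.

```javascript
// Hypothetical helper: decode `view` with the given encoding label and
// return the Node error code on failure, or null if decoding succeeded.
function decodeErrorCode(label, view) {
  try {
    // Constructor errors (e.g. an unsupported label) are caught here too.
    new TextDecoder(label).decode(view);
    return null;
  } catch (err) {
    // Node's internal errors carry a string `code`, such as
    // 'ERR_ENCODING_INVALID_ENCODED_DATA' or 'ERR_STRING_TOO_LONG'.
    // Fall back to the error name for errors without one.
    return err.code ?? err.name;
  }
}

// A small input of the same shape decodes fine (48 is "0", U+0030).
console.log(decodeErrorCode("utf-16le", new Uint16Array(4).fill(48))); // null
```

Calling `decodeErrorCode("utf-16le", new Uint16Array(2 ** 27).fill(48))` allocates 256 MiB and, on v18.14.1, should return the `'ERR_ENCODING_INVALID_ENCODED_DATA'` code shown in the output above.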

With the default encoding (utf-8), the decoder behaves correctly and throws the appropriate error:

new TextDecoder().decode(new Uint8Array(2**29).fill(48))
// Uncaught Error: Cannot create a string longer than 0x1fffffe8 characters
//     at TextDecoder.decode (node:internal/encoding:433:16) {
//   code: 'ERR_STRING_TOO_LONG'
// }

I also noticed that the default TextDecoder() can consume an ArrayBuffer twice as long as TextDecoder("utf-16le") can without throwing, and produces a string four times as long.
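One way to put numbers on this observation is to search for the largest input length each decoder accepts. The sketch below is a plain binary search over a predicate; `largestPassing`, `utf16lePasses`, and `utf8Passes` are hypothetical names, and the search assumes failures are monotone in length (once a size fails, every larger size also fails). Each probe allocates a buffer of the size being tested, so probing near the limits needs several hundred MiB to a few GiB of memory.

```javascript
// Hypothetical helper: largest n in [lo, hi] for which passes(n) is true,
// assuming passes is monotone (true up to some threshold, false after it).
// Returns lo - 1 if even passes(lo) is false.
function largestPassing(passes, lo, hi) {
  let best = lo - 1;
  while (lo <= hi) {
    const mid = lo + Math.floor((hi - lo) / 2);
    if (passes(mid)) {
      best = mid; // mid works; look for something larger
      lo = mid + 1;
    } else {
      hi = mid - 1; // mid fails; look lower
    }
  }
  return best;
}

// Probe: does decoding n UTF-16LE code units (2 bytes each) succeed?
// Note: allocation failures are also counted as failures.
function utf16lePasses(n) {
  try {
    new TextDecoder("utf-16le").decode(new Uint16Array(n).fill(48));
    return true;
  } catch {
    return false;
  }
}

// Probe: does decoding n bytes of ASCII as UTF-8 succeed?
function utf8Passes(n) {
  try {
    new TextDecoder().decode(new Uint8Array(n).fill(48));
    return true;
  } catch {
    return false;
  }
}
```

For example, `largestPassing(utf16lePasses, 1, 2 ** 29)` and `largestPassing(utf8Passes, 1, 2 ** 30)` give the element counts; multiply the UTF-16LE result by 2 to compare byte lengths.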

How often does it reproduce? Is there a required condition?

It reproduces consistently, both when running a script file and in the Node.js REPL.

What is the expected behavior? Why is that the expected behavior?

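new TextDecoder("utf-16le") should be able to produce strings of up to 0x1fffffe8 characters (the limit quoted in the ERR_STRING_TOO_LONG message above) and should throw ERR_STRING_TOO_LONG only when that length is exceeded. A Uint16Array of length 2**27 decodes to a string of 2**27 (0x8000000) characters, well below that limit.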
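The expectation can be checked with simple arithmetic: UTF-16LE uses two bytes per code unit, and a JavaScript string's length counts UTF-16 code units, so N bytes of BOM-free UTF-16LE decode to a string of about N / 2 characters. The constant below is the limit quoted in the ERR_STRING_TOO_LONG message; `MAX_STRING_LENGTH` and `expectedUtf16leOutcome` are hypothetical names for this sketch, and it assumes the input does not start with a byte order mark (which TextDecoder would strip by default).

```javascript
// Limit taken from the ERR_STRING_TOO_LONG message:
// "Cannot create a string longer than 0x1fffffe8 characters".
const MAX_STRING_LENGTH = 0x1fffffe8;

// Expected outcome of decoding `byteLength` bytes of BOM-free UTF-16LE.
// Each 2-byte code unit becomes one string code unit; an odd trailing
// byte becomes one U+FFFD in the default (non-fatal) mode, hence ceil.
function expectedUtf16leOutcome(byteLength) {
  const resultLength = Math.ceil(byteLength / 2);
  return resultLength > MAX_STRING_LENGTH ? "ERR_STRING_TOO_LONG" : "ok";
}

// The reported input: new Uint16Array(2 ** 27) occupies 2 ** 28 bytes.
console.log(expectedUtf16leOutcome(2 ** 28)); // ok (2 ** 27 = 0x8000000 < 0x1fffffe8)
```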

What do you see instead?

ERR_ENCODING_INVALID_ENCODED_DATA is thrown when the input Uint16Array has length 2**27 (2**28 bytes), even though every element is the valid code unit 48 ("0"):

Uncaught TypeError: The encoded data was not valid for encoding utf-16le
    at TextDecoder.decode (node:internal/encoding:448:14) {
  code: 'ERR_ENCODING_INVALID_ENCODED_DATA'
}

Additional information

No response

martian17 commented 1 year ago

In Google Chrome, TextDecoder with the "utf-16le" encoding appears to handle a Uint16Array of up to roughly 2**29 - 100 elements. Node should be able to do the same.