Version
v18.14.1
Platform
Linux 5.19.0-38-generic #39~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Mar 17 21:16:15 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
Subsystem
No response
What steps will reproduce the bug?
When I try to decode a long utf-16le-encoded buffer, ERR_ENCODING_INVALID_ENCODED_DATA is thrown instead of ERR_STRING_TOO_LONG:
new TextDecoder("utf-16le").decode(new Uint16Array(2**27).fill(48))
// Uncaught TypeError: The encoded data was not valid for encoding utf-16le
// at TextDecoder.decode (node:internal/encoding:448:14) {
// code: 'ERR_ENCODING_INVALID_ENCODED_DATA'
// }
The default encoding (utf-8) works correctly and throws the appropriate error:
new TextDecoder().decode(new Uint8Array(2**29).fill(48))
// Uncaught Error: Cannot create a string longer than 0x1fffffe8 characters
// at TextDecoder.decode (node:internal/encoding:433:16) {
// code: 'ERR_STRING_TOO_LONG'
// }
Another thing I realized is that TextDecoder() seems to be capable of consuming an array buffer twice as long as TextDecoder("utf-16le") without throwing an error, producing a string that is four times as long.
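For reference, a minimal sketch of the size arithmetic behind that observation (all figures are derived from the outputs above; nothing is actually decoded here):

const failing16 = new Uint16Array(2**27); // throws in the utf-16le decode above
const failing8 = new Uint8Array(2**29);   // throws in the default decode above

console.log(failing16.byteLength); // 268435456 (2**28 bytes)
console.log(failing8.byteLength);  // 536870912 (2**29 bytes, twice as many)
// The resulting strings would be 2**27 vs 2**29 characters (one per element
// vs one per byte for ASCII input), i.e. the default decoder gets four times
// further, and its limit is the expected ERR_STRING_TOO_LONG rather than a
// spurious decode error.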
How often does it reproduce? Is there a required condition?
Confirmed this bug in both normal file execution and the Node.js REPL.
What is the expected behavior? Why is that the expected behavior?
new TextDecoder("utf-16le") should be able to create a string of up to 0x1fffffe8 characters.
It should throw ERR_STRING_TOO_LONG when this length is exceeded.
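A sketch of the boundary I would expect (hypothetical outcomes; this is not what v18.14.1 currently does):

const decoder = new TextDecoder("utf-16le");

// Expected to succeed: exactly the maximum string length
decoder.decode(new Uint16Array(0x1fffffe8).fill(48));

// Expected to throw ERR_STRING_TOO_LONG: one character over the limit
decoder.decode(new Uint16Array(0x1fffffe8 + 1).fill(48));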
What do you see instead?
ERR_ENCODING_INVALID_ENCODED_DATA is thrown when the input Uint16Array length is 2**27:
Uncaught TypeError: The encoded data was not valid for encoding utf-16le
at TextDecoder.decode (node:internal/encoding:448:14) {
code: 'ERR_ENCODING_INVALID_ENCODED_DATA'
}
In Google Chrome, a TextDecoder with encoding "utf-16le" is able to decode a Uint16Array of length up to around 2**29-100. Node.js should be capable of this as well.
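For comparison, a probe that can be pasted into Chrome's DevTools console (2**29-128 is an arbitrary size within the limit I observed; the exact threshold may vary):

const text = new TextDecoder("utf-16le").decode(new Uint16Array(2**29 - 128).fill(48));
console.log(text.length); // 536870784 in Chrome; Node v18.14.1 throws before this point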
Additional information
No response