The classifier attempts to decode an input buffer as both UTF-32 and UTF-16. It takes a count of "characters" to read; in practice this is treated as a count of bytes, and callers tend to pass in the buffer length. This works fine when the byte[] length is a multiple of four (the size of a UTF-32 code unit). But when the length is not so aligned, one encoding can return a false binary detection while the other partially reads off the end of the buffer in a way that fails the decoding.
The proposed fix is to have the classifier truncate the byte count it decodes to a multiple of four, as sketched below.
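A minimal sketch of the alignment fix, assuming a .NET-style classifier that treats a failed strict decode as a binary detection. The type and method names (`EncodingClassifier`, `LooksLikeUtf32`) are hypothetical; only the truncation to a multiple of four bytes reflects the proposed fix:

```csharp
using System;
using System.Text;

static class EncodingClassifier
{
    // Strict decoder that throws DecoderFallbackException on invalid input,
    // mirroring a classifier that treats a failed decode as "binary".
    static readonly Encoding StrictUtf32 = Encoding.GetEncoding(
        "utf-32", EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);

    // Hypothetical entry point: 'count' is in bytes, as callers pass it today.
    public static bool LooksLikeUtf32(byte[] buffer, int count)
    {
        // Proposed fix: align the decoded span to whole UTF-32 code units
        // (4 bytes each) so a trailing partial unit cannot fail the decode.
        int aligned = count - (count % 4);
        if (aligned == 0)
            return false;

        try
        {
            StrictUtf32.GetString(buffer, 0, aligned);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }
}
```

Truncating to a multiple of four also keeps the UTF-16 attempt aligned, since four is a multiple of the two-byte UTF-16 code unit, so a single truncation serves both decodings.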