Closed amysutedja closed 4 years ago
Hah, I've been working since yesterday on a change that is nearly identical to this. Nice fix, if I say so myself =D
I'll retire my fork, this is really good.
FYI we are waiting on this change to upgrade the Python SDK in all our apps ahead of conf. We'd rather not have to monkey patch. Please advise when you can merge and release an official build.
Fixes https://github.com/splunk/splunk-sdk-python/issues/288
`SearchCommand` supports multibyte characters in Python 3

Previously, `SearchCommand` in Python 3 would read directly from the incoming `ifile` stream, typically `sys.stdin`. In Python 2, `sys.stdin` is a file-like byte stream, whereas in Python 3 it is an `io.TextIOWrapper` containing an underlying buffer. Because this object reads by character rather than by byte, multibyte characters would cause the command to read past the data's boundary. This could lead to corrupt data reads (if early in the stream) or infinite hangs (if at the end of the stream).

We now retrieve the underlying buffer and read from it when in Python 3. The bytes read are then decoded to strings for parsing purposes.
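
To illustrate the overread, here is a minimal sketch rather than the SDK's actual code. The helper name `read_exact`, the sample payload, and the `NEXT` trailer are all made up for the example; the only real API relied on is the `buffer` attribute of `io.TextIOWrapper`.

```python
import io

def read_exact(stream, size):
    """Read `size` bytes from the binary layer of `stream` and decode them.

    Hypothetical helper, not part of the SDK.
    """
    # Text streams such as Python 3's sys.stdin expose their raw bytes as
    # `.buffer`; plain byte streams lack the attribute and are used directly.
    binary = getattr(stream, "buffer", stream)
    return binary.read(size).decode("utf-8")

payload = "é=1".encode("utf-8")   # 3 characters, but 4 bytes
trailer = b"NEXT"                 # stands in for whatever follows in the stream

# Reading through the text layer counts characters, so asking for
# len(payload) "bytes" consumes one character of the trailer as well.
naive = io.TextIOWrapper(io.BytesIO(payload + trailer), encoding="utf-8").read(len(payload))

# Reading through the underlying buffer counts bytes and stops at the boundary.
fixed = read_exact(io.TextIOWrapper(io.BytesIO(payload + trailer), encoding="utf-8"), len(payload))
```

Here `naive` ends up holding one character that belongs to the next piece of data, while `fixed` holds exactly the payload. If the overread happened at the very end of the stream, the text-layer read would instead block waiting for characters that never arrive.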
Tests ensure underlying byte stream
Previously, the tests defined a metadata stream containing a non-ASCII character that resembles `A` (not to be confused with ASCII `A`). In Python 2, this character caused its containing string to become `unicode`, which caused `StringIO` to gain that encoding. As a result, the size of the metadata stream was always incorrectly measured in Unicode characters rather than bytes, but the mismatch never surfaced because under test the read logic was always handed a Unicode character stream rather than a byte stream.

This has been fixed.
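
The difference between the two kinds of fixture can be sketched as follows. The metadata string and the `BODY` suffix are invented for illustration and are not the fixture used in the SDK's tests.

```python
import io

metadata = '{"name": "é"}'            # hypothetical metadata with one 2-byte character
encoded = metadata.encode("utf-8")

char_len = len(metadata)              # length in Unicode characters
byte_len = len(encoded)               # length in bytes, one larger here

# A character-based fixture paired with a character count is self-consistent,
# so a test built this way passes even though real input arrives as bytes.
text_fixture = io.StringIO(metadata + "BODY")
masked = text_fixture.read(char_len)

# A byte-based fixture exposes the mismatch: the character count is one byte short.
byte_fixture = io.BytesIO(encoded + b"BODY")
short = byte_fixture.read(char_len)

# Measuring in bytes and reading from bytes lands exactly on the boundary.
byte_fixture.seek(0)
exact = byte_fixture.read(byte_len)
```

With the byte fixture, `short` is missing the final byte of the metadata, which is the kind of failure the original character-based fixture could not reveal.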
Multibyte test fixture
We now have a new test, `test_multibyte_chunked`, which contains a multibyte character test fixture.
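
As a rough idea of what a test of this kind can check, the sketch below frames two chunks back to back and verifies that a multibyte body in the first does not disturb the second. It is not the SDK's `test_multibyte_chunked`: the helpers `build_chunk` and `read_chunk`, the test name, and the simplified `chunked 1.0,<metadata bytes>,<body bytes>` header are assumptions made for the example.

```python
import io
import json

def build_chunk(metadata, body):
    """Frame one chunk: ASCII header line, then metadata bytes, then body bytes."""
    meta = json.dumps(metadata, ensure_ascii=False).encode("utf-8")
    data = body.encode("utf-8")
    # Lengths in the header are byte counts, not character counts.
    header = "chunked 1.0,{},{}\n".format(len(meta), len(data)).encode("ascii")
    return header + meta + data

def read_chunk(stream):
    """Parse one chunk, always reading from the binary layer of the stream."""
    binary = getattr(stream, "buffer", stream)
    header = binary.readline().decode("ascii")
    _, meta_len, body_len = header.strip().split(",")
    metadata = json.loads(binary.read(int(meta_len)).decode("utf-8"))
    body = binary.read(int(body_len)).decode("utf-8")
    return metadata, body

def test_multibyte_roundtrip_sketch():
    raw = (build_chunk({"action": "execute"}, "name\n日本語\n")
           + build_chunk({"finished": True}, ""))
    # Wrap the bytes the way Python 3 wraps stdin, as a text stream over a buffer.
    stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8")
    first = read_chunk(stream)
    second = read_chunk(stream)
    assert first == ({"action": "execute"}, "name\n日本語\n")
    assert second == ({"finished": True}, "")
    return first, second
```

Placing a second chunk after the multibyte one matters: a reader that counts characters instead of bytes would consume part of the second header and fail to parse it.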