cui134 opened this issue (closed 4 years ago)
When the previous output contains non-ASCII characters, this StreamingCommand appears to block on ifile.read(body_length) in the following code:
try:
    if body_length > 0:
        body = ifile.read(body_length)
except Exception as error:
    raise RuntimeError('Failed to read body of length {}: {}'.format(body_length, error))
This code is in the function "_read_chunk" in class SearchCommand. When I use search command protocol version 1, the issue does not reproduce. (Protocol v2, the "chunked" protocol, is selected with chunked = true in commands.conf.) My commands.conf stanza:
[testcommand]
filename=testcommand.py
enableheader = true
outputheader = true
requires_srinfo = true
stderr_dest = message
supports_getinfo = true
supports_rawargs = true
supports_multivalues = true
I actually JUST encountered this issue! It took me forever to narrow down what was happening. I'm not sure if it's a problem with the sdk so much as it is with splunkd.
I have a command I'm doing a POC with, and I tested it with:
| makeresults | eval string="hey guys" | formatasjson string
which comes through stdin as:
chunked 1.0,36,66
{"action":"execute","finished":true}
"_time",string,"__mv__time","__mv_string"\n1590694186,"hey guys",,
But when I run the same search with a non-ASCII character substituted:
| makeresults | eval string="héy guys" | formatasjson string
I get a body length in the header that is one greater than the actual character count:
chunked 1.0,36,67
{"action":"execute","finished":true}
"_time",string,"__mv__time","__mv_string"\n1590694057,"héy guys",,
The sdk (quite naturally) hangs at:
splunklib/searchcommands/search_command.py:
888. try:
889.     if body_length > 0:
890.         body = ifile.read(body_length)
Because it's waiting for that 67th character that will never come.
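The mismatch is easy to demonstrate outside Splunk, since a TextIOWrapper counts characters while the header counts bytes. A minimal sketch, with io.BytesIO standing in for stdin:

import io

payload = u'"héy guys"'.encode('utf-8')  # 10 characters, but 11 bytes (é is 2 bytes in UTF-8)
stream = io.TextIOWrapper(io.BytesIO(payload), encoding='utf-8')

data = stream.read(11)  # ask for 11 "units": the header counted BYTES, read() counts CHARACTERS
print(len(data))        # 10 -- BytesIO hits EOF, so the read returns short
# On the real stdin the pipe stays open, so instead of returning short,
# read(11) blocks waiting for an 11th character that never arrives.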
Right now I'm just trying to build a band-aid workaround that monitors for a "deceptive header." I'm not sure yet if it's off by one for every non-ASCII character, or just off by one, period. This is just an uninformed guess, but I wonder if it has to do with Splunk using Python 2 (which requires the "u" prefix before a unicode string literal). That's the only single-character difference between ASCII and unicode strings I can think of at the moment, but I don't know why Splunk would be running a repr operation on the string.
--EDIT-- Unfortunately there does seem to be +1 on the body length for each unicode character...
0 Unicode Characters:
| makeresults | eval string="hey guys" | formatasjson string
Returns an accurate 66 character body length:
chunked 1.0,36,66
{"action":"execute","finished":true}
"_time",string,"__mv__time","__mv_string"\n1590694186,"hey guys",,
1 Unicode Character:
| makeresults | eval string="héy guys" | formatasjson string
Returns a +1 character (67) body length:
chunked 1.0,36,67
{"action":"execute","finished":true}
"_time",string,"__mv__time","__mv_string"\n1590694057,"héy guys",,
2 Unicode Characters:
| makeresults | eval string="héy güys" | formatasjson string
Returns a +2 character (68) body length:
chunked 1.0,36,68
{"action":"execute","finished":true}
"_time",string,"__mv__time","__mv_string"\n1590695753,"héy güys",,
> I actually JUST encountered this issue! ... I'm just trying to make a band-aid approach ...
This worked for me:
splunklib/searchcommands/search_command.py:
  888. try:
- 889.     if body_length > 0:
- 890.         body = ifile.read(body_length)
+ 889.     body_saved = body_length
+ 890.     while body_saved > 0:
+ 891.         char = ifile.read(1)
+ 892.         body_saved -= 2 if char.encode('ascii', errors='ignore') == b'' else 1
+ 893.         body += char
Sorry that's not very copy/paste friendly. Basically I just changed the ifile.read(body_length) line (and the preceding line) to:
<EDITED>
body_saved = body_length
while body_saved > 0:
    char = ifile.read(1)
    body_saved -= len(bytes(char, encoding='utf-8'))
    body += char
</EDITED>
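For anyone who wants to try that loop outside Splunk, here is a hypothetical self-contained version (io.BytesIO stands in for stdin; body must already be initialized at this point in the real file, since the original code only assigns it when body_length > 0):

import io

payload = u'"héy güys"'.encode('utf-8')  # 10 characters, 12 bytes
ifile = io.TextIOWrapper(io.BytesIO(payload), encoding='utf-8')

body_length = len(payload)  # what splunkd reports in the header: BYTES
body = u''                  # initialized earlier in the real _read_chunk
body_saved = body_length
while body_saved > 0:
    char = ifile.read(1)
    body_saved -= len(bytes(char, encoding='utf-8'))
    body += char

print(body)  # "héy güys" -- terminates instead of hanging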
After some testing, I was able to determine that Splunk will consistently add +1 to the body_length value in the header for every non-ASCII character (ascii char = 1, unicode char = 2). When I made this fix it became stable and functional.
I can't think of an approach that doesn't involve reading the body in character by character. I played around with a timeout, but even that needed to read one character at a time; there's no one-size-fits-all timeout that ends up more performant than just reading it that way. (And a read(length) with a timeout that fails would have to fall back on character-by-character reading at that point regardless.)
My personal next step is to open a question with Splunk itself about fixing their header bug, but with this workaround I can at least move on with developing my app.
EDITED (NOTE)
More research and more tests, and I realized that the length discrepancy comes from the number of bytes required per character in the body. Simple ASCII characters are obviously a single byte, whereas non-ASCII unicode characters are typically 2 bytes; there are, however, characters like 基 that are three bytes. I changed the code to evaluate the byte length of each character coming through in the body. This will (hopefully) be an unnecessary step when Splunk moves to Python 3, but this particular tweak makes me feel better because it attacks the true cause, so it should be the most stable strategy for now.
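A quick illustration of those byte widths under UTF-8:

for ch in (u'a', u'é', u'基'):
    print(ch, len(ch.encode('utf-8')))
# a 1
# é 2
# 基 3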
Just spent way too long debugging this myself.
The issue is that the Splunk platform sends the body length in the header in BYTES, but splunklib uses it to read from a TextIOWrapper, which counts CHARACTERS.
So any character wider than 1 byte (2, 3, or more bytes) makes the body_length sent to splunklib larger in BYTES than the body is in CHARACTERS. When splunklib then calls .read(), it ought to hit EOF and return, but stdin stays open, so it just hangs forever.
Sample string that triggers the bug: "Hi 接". It's 4 characters but 6 bytes, so Splunk sends a body_length of 6. Splunklib tries to read 6 CHARACTERS from stdin (a TextIOWrapper) and hangs, because only 4 characters ever arrive. The SPL pipeline hangs and holds onto its memory and disk space forever.
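For what it's worth, a fix that attacks that mismatch directly would honor the header's unit and read BYTES, then decode. A minimal sketch of that idea (my own, not splunklib's actual patch; it assumes ifile is a TextIOWrapper whose underlying binary stream is reachable as ifile.buffer, and that nothing has been read through the text layer first, since mixing text-layer reads with buffer reads would desynchronize the stream):

import io

def read_body(ifile, body_length):
    # body_length is in BYTES (the unit splunkd uses), so read bytes and decode.
    raw = ifile.buffer.read(body_length)
    return raw.decode('utf-8')

# Demo with "Hi 接": 4 characters, 6 bytes, so the header would say 6.
payload = u'Hi \u63a5'.encode('utf-8')
ifile = io.TextIOWrapper(io.BytesIO(payload), encoding='utf-8')
print(read_body(ifile, 6))  # 'Hi 接' -- returns instead of hanging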
Environment:
When I create the StreamingCommand "testcommand" with the following code:
testcommand hangs when I use the following SPL: sourcetype=XXX | search url = "http://例子.卷筒纸" | testcommand
The following error log appears in splunkd.log: