Closed yaleman closed 2 years ago
Hmm. Just for clarification:
If I'm just issuing regular (non-export) queries via the Splunk SDK and getting a job back:
```python
job = self.service.jobs.create(search_query, **search_params)
current_results = job.results(output_mode='json', count=50000, offset=0)
reader = results.JSONResultsReader(current_results)
```
then the API returns a dict containing `messages = [...]`, `results = [...]`, `is_preview`, and so on.
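For illustration, a minimal sketch of that response shape (the top-level keys match the description above; the field values and `init_offset` key are invented for this example):

```python
import json

# Illustrative shape of one JSON body from the results endpoint.
# Top-level keys match the thread's description; the values are made up.
sample_body = {
    "preview": False,
    "init_offset": 0,
    "messages": [{"type": "INFO", "text": "Search complete."}],
    "results": [
        {"_time": "2022-01-01T00:00:00.000+00:00", "host": "web01"},
        {"_time": "2022-01-01T00:00:01.000+00:00", "host": "web02"},
    ],
}

# The whole page arrives as one JSON document, so a single json.loads()
# call yields every result at once rather than one event per line.
parsed = json.loads(json.dumps(sample_body))
```

This is why line-by-line parsing of the body sees one giant document, not one event per line.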
Your implementation of
```python
def _parse_results(self, stream: BufferedReader):
    """Parse results and messages out of *stream*."""
    for line in stream.readlines():
        event = json_loads(line)
        if "preview" in event:
            self.is_preview = event["preview"]
        if "msg" in event:
            msg_type = event.get("type", "Unknown Message Type")
            text = event.get("text")
            yield Message(msg_type, text)
        yield event
```
would not return single events: the one "event" it yields would be the whole response dict, whose arrays (`results`) contain all 50,000 events. This is rather different from
```python
current_results = job.results(output_mode='xml', count=50000, offset=0)
reader = results.ResultsReader(io.BufferedReader(ResponseReaderWrapper(current_results)))
for result in reader:
    if isinstance(result, dict):
        log.info("Result: %s" % result)
    elif isinstance(result, results.Message):
        log.info("Message: %s" % result)
log.info("is_preview = %s " % reader.is_preview)
```
which would print 50,000 log lines. The solution above would log only once, with a giant dict whose `results` key contains the 50,000 events.
This is more of what I would expect:
```python
def _parse_results(self, stream: BufferedReader):
    """Parse results and messages out of *stream*."""
    for line in stream.readlines():
        bulk = json_loads(line)
        if "preview" in bulk:
            self.is_preview = bulk["preview"]
        for message in bulk.get('messages', []):
            msg_type = message.get("type", "Unknown Message Type")
            text = message.get("text")
            yield Message(msg_type, text)
        for event in bulk.get('results', []):
            yield event
```
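Fed a fake response body, that logic yields individual messages and events rather than one giant dict. The sketch below inlines stand-ins for the SDK's `Message` type and `json_loads` helper (both hypothetical here, replaced with a namedtuple and the stdlib `json`), since only the parsing flow matters:

```python
import io
import json
from collections import namedtuple

# Stand-in for the SDK's Message type; the real class lives in splunklib.results.
Message = namedtuple("Message", ["type", "text"])

def parse_results(stream):
    """Parse results and messages out of *stream* (one JSON document per line)."""
    for line in stream.readlines():
        bulk = json.loads(line)
        for message in bulk.get("messages", []):
            yield Message(message.get("type", "Unknown Message Type"),
                          message.get("text"))
        for event in bulk.get("results", []):
            yield event

# A fake one-page response body shaped like the JSON results endpoint's output.
body = json.dumps({
    "preview": False,
    "messages": [{"type": "INFO", "text": "all good"}],
    "results": [{"host": "web01"}, {"host": "web02"}],
}).encode()

items = list(parse_results(io.BytesIO(body)))
messages = [i for i in items if isinstance(i, Message)]
events = [i for i in items if isinstance(i, dict)]
# messages holds the single INFO Message; events holds both result dicts.
```

The caller can then distinguish results from messages with `isinstance`, the same pattern the XML `ResultsReader` loop above uses.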
I think 78079ae updates the code to something that should work.
I've updated my test code over in https://github.com/yaleman/splunk-sdk-games so that seems to work too.
```
Running test_file_jsonreader.py
RESULT_COUNT=113624
MESSAGE_COUNT=0
PREVIEW_COUNT=12500

real    0m1.034s
user    0m0.873s
sys     0m0.089s

Running test_file_jsonreader_create.py
RESULT_COUNT=113624
MESSAGE_COUNT=12
PREVIEW_COUNT=0

real    0m1.035s
user    0m0.837s
sys     0m0.140s

Running test_file_resultsreader.py
RESULT_COUNT=113624
MESSAGE_COUNT=1
PREVIEW_COUNT=25000

real    2m17.480s
user    2m16.443s
sys     0m0.706s
```
If someone can provide a way of testing it in Docker or something, that'd be cool, but I think it works.
I can't run the built-in tests because Python 3.7 on an M1 MacBook doesn't have `_ctypes`, and I really don't want to spend more hours building VMs to test a library I don't even use.
@yaleman thanks for the PR. We are considering the suggested JSONResultsReader and exploring it further.
@yaleman we have incorporated your JSONResultsReader changes, with some modifications of our own, into the latest Python SDK release, 1.6.19.
`results.ResultsReader` is slow because it iterates byte-by-byte through the stream to parse the XML in a way the chosen parser will accept. I've added `JSONResultsReader` to provide a much more performant option.

Benefits:

Other changes in this commit

Test Data

Running the tests from https://github.com/yaleman/splunk-sdk-games/ (you can just clone the repo, configure it, and run `./run_tests.sh`).

Generating the local files, so that we're not testing the response of my Splunk instance:

Using `results.ResultsReader` and the job in XML output format:

Using `results.JSONResultsReader` and the job in JSON output format:

The 1 "missing" result is the `Message` that the JSON export endpoint doesn't return.