splunk / splunk-sdk-python

Splunk Software Development Kit for Python
http://dev.splunk.com
Apache License 2.0

Splunk SDK "output_mode : json" - decode('utf-8', 'xmlcharrefreplace'), match) #285

Closed: sidsinhad closed this issue 2 years ago

sidsinhad commented 5 years ago

I am trying to export Splunk search results in JSON format using the Splunk SDK. Below is the code I am using. It works when output_mode is csv, but when I use json it fails with the error shown below.

    def export_results(service, searchquery, default_timeline):
        # Works with "output_mode": "csv", fails with "json" (see the error below)
        job = service.jobs.create(searchquery, **{"exec_mode": "blocking",
                                                  "earliest_time": default_timeline,
                                                  "latest_time": "now",
                                                  "output_mode": "json",
                                                  "maxEvents": 10000000})
        offset = 0
        count = 10000
        rescount = int(job["resultCount"])

        if rescount == 0:
            print("No Results Found for the above searchquery")
            return False

        while offset < rescount:
            kwargs_paginate = {"count": count, "offset": offset, "output_mode": "json"}
            rs = job.results(**kwargs_paginate)
            output = rs.read()  # read() consumes the stream, so print what was read
            print(output)
            offset += count     # advance to the next page

This is the error:

    "maxEvents": 10000000})
  File "/Library/Python/2.7/site-packages/splunklib/client.py", line 2944, in create
    sid = _load_sid(response)
  File "/Library/Python/2.7/site-packages/splunklib/client.py", line 228, in _load_sid
    return _load_atom(response).response.sid
  File "/Library/Python/2.7/site-packages/splunklib/client.py", line 203, in _load_atom
    .decode('utf-8', 'xmlcharrefreplace'), match)
  File "/Library/Python/2.7/site-packages/splunklib/data.py", line 85, in load
    root = XML(text)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1311, in XML
    parser.feed(text)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1659, in feed
    self._raiseerror(v)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1523, in _raiseerror
    raise err
ParseError: not well-formed (invalid token): line 1, column 0

I updated the SDK to version 1.6.11 and still get the same error.

kashi333 commented 5 years ago

ok.

ghost commented 4 years ago

_load_sid(response) simply calls _load_atom(response), which assumes the response is always XML. However, job creation honors output_mode, so in this case the response body is actually JSON, e.g.: b'{"sid":"1574422392.8448_AAB56AC3-E7E0-4CF6-A072-3CB90850813D"}'
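The root cause can be reproduced with the standard library alone: handing the JSON job-creation body to an XML parser fails on its very first character, which lines up with the "line 1, column 0" ParseError in the traceback. This is a minimal sketch; xml_parse_error is a hypothetical helper that only approximates splunklib's load path (which, per the traceback, ends up in ElementTree's XML(text)).

```python
import xml.etree.ElementTree as ET

# Job-creation body returned when output_mode=json (sid taken from the comment above)
json_body = b'{"sid":"1574422392.8448_AAB56AC3-E7E0-4CF6-A072-3CB90850813D"}'


def xml_parse_error(body):
    """Parse body as XML, roughly as the SDK does; return the ParseError, or None if it parses."""
    try:
        ET.fromstring(body)
    except ET.ParseError as err:
        return err
    return None


# Reports a ParseError on line 1, like the traceback in the original report
print(xml_parse_error(json_body))
```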

For now I've worked around it by patching _load_sid like this:

    import json  # add at the top of splunklib/client.py if it is not already imported

    def _load_sid(response, output_mode="xml"):
        if output_mode.lower().startswith('json'):
            response_json = response.body.read().decode("utf-8")
            return json.loads(response_json)["sid"]
        return _load_atom(response).response.sid

and changing the two call sites to:

    sid = _load_sid(response, output_mode=kwargs.get('output_mode', 'xml'))
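A variation on this workaround inspects the body itself instead of trusting the caller's output_mode. This is a sketch under the assumption that the only two body shapes are the JSON object shown above and the default XML document of the form <response><sid>...</sid></response>; parse_sid is a hypothetical standalone helper, not a splunklib function.

```python
import json
import xml.etree.ElementTree as ET


def parse_sid(body):
    """Extract the search ID from a job-creation response body (bytes).

    Hypothetical helper: sniffs the payload rather than relying on output_mode.
    """
    stripped = body.lstrip()  # an XML declaration must be the very first thing
    if stripped.startswith(b"{"):
        # JSON form, e.g. {"sid": "..."}
        return json.loads(stripped.decode("utf-8"))["sid"]
    # XML form: <response><sid>...</sid></response>
    root = ET.fromstring(stripped)
    sid = root.find("sid")
    if sid is None or sid.text is None:
        raise ValueError("no <sid> element in response body")
    return sid.text.strip()
```

Inside the patched _load_sid this would be called as parse_sid(response.body.read()); keep in mind that reading the body consumes it, so it can only be parsed once.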

ghost commented 4 years ago

Then I realized that the results() call returns XML by default anyway. So my recommendation is: leave the search job creation as it is (no output_mode on jobs.create), and instead pass output_mode='json' when fetching the results, i.e. job.results(output_mode='json').
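That recommendation can be sketched as a small pagination helper: the job is created with its default (XML) response, and JSON is requested only page by page from the results. This is a minimal sketch, assuming the job's resultCount is already known and that each JSON page lists its rows under a "results" key; iter_json_results and fetch_page are hypothetical names, not splunklib APIs.

```python
import json


def iter_json_results(fetch_page, total, page_size=10000):
    """Yield result rows one page at a time.

    fetch_page(offset, count) must return the raw bytes of one page of results
    requested with output_mode="json".
    """
    offset = 0
    while offset < total:
        page = json.loads(fetch_page(offset, page_size).decode("utf-8"))
        # Assumption: the JSON results payload keeps its rows under "results"
        rows = page.get("results", [])
        if not rows:
            break  # stop instead of looping forever if a page comes back empty
        for row in rows:
            yield row
        offset += len(rows)  # advance by what was actually returned
```

With splunklib, total would come from int(job["resultCount"]) and fetch_page could be lambda offset, count: job.results(output_mode="json", offset=offset, count=count).read(). Advancing by len(rows) rather than by page_size avoids skipping rows if the server returns a shorter page than requested.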

ashah-splunk commented 2 years ago

@sidsinhad We have addressed this issue, and the fix will be available in the next release. Please let us know if you are still facing the issue. PR for reference: https://github.com/splunk/splunk-sdk-python/pull/447

ashah-splunk commented 2 years ago

@sidsinhad Please use the latest SDK release: the fix has been implemented and is available there. Let us know if you still face the issue.