splunk / splunk-sdk-python

Splunk Software Development Kit for Python
http://dev.splunk.com
Apache License 2.0

Issues with importing Splunk JSON into Pandas Dataframe #509

Closed jeremynsl closed 1 year ago

jeremynsl commented 1 year ago

Describe the bug

Hi. I'm getting started with the SDK and trying to run a oneshot search and convert the returned JSON into a Pandas DataFrame. I think this must be a fairly common use case, but I'm having issues.

I searched the docs and the examples for Pandas usage and found it only once, where the "JSON" output is iterated over and each row is loaded into a list. That list can be imported into Pandas, but all the column information is lost.

To Reproduce

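import pandas as pd
import splunklib.results as results

# Assumes `jobs` is `service.jobs` on a connected client and
# `searchquery_oneshot` is the search string.
kwargs_oneshot = {"earliest_time": "-60m",
                  "latest_time": "now",
                  "output_mode": "json", "count": 0}
oneshotsearch_results = jobs.oneshot(
    searchquery_oneshot, **kwargs_oneshot)
reader = results.JSONResultsReader(oneshotsearch_results)
df = pd.read_json(reader)  # raises the ValueError shown below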

Also tried df = pd.json_normalize(reader)

Results in either: ValueError: Invalid file path or buffer object type: <class 'splunklib.results.JSONResultsReader'>

or for json_normalize: Empty DataFrame Columns: [] Index: []

Also tried importing json module and doing json.loads(reader)

Result: TypeError: the JSON object must be str, bytes or bytearray, not JSONResultsReader

Expected behavior

The JSON format Splunk is using doesn't seem to be compliant with what other modules expect. I expected either Pandas or the json module to be able to import this data.

Splunk (please complete the following information):
Splunk Enterprise Version: 9.0.2

SDK (please complete the following information):
Latest SDK
Python 3.11
Windows 10

ashah-splunk commented 1 year ago

Hi @jeremynsl, I guess the name JSONResultsReader is creating some confusion, and we are sorry for that. It is a ResultsReader that works on the JSON stream and yields results one at a time, not a reader that returns a JSON blob. We will look into renaming it to avoid such confusion. Currently we don't have any plans to support Pandas or similar JSON modules.
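
A minimal workaround sketch for the original question, under stated assumptions: because the reader is iterable and yields one dict per result row (plus results.Message objects for diagnostics), the rows can be collected into a list and passed to pd.DataFrame, which uses the dict keys as column names. The helper names collect_rows and oneshot_to_dataframe are hypothetical, service is assumed to be an already-connected splunklib.client.Service, and the sample items are made up so the filtering step can be tried without a Splunk instance.

```python
from typing import Any, Dict, Iterable, List


def collect_rows(reader: Iterable[Any]) -> List[Dict[str, Any]]:
    """Keep only the dict items (result rows) yielded by a results reader.

    JSONResultsReader yields a dict per result row and results.Message
    objects for diagnostics; anything that is not a dict is skipped here.
    """
    return [item for item in reader if isinstance(item, dict)]


def oneshot_to_dataframe(service, query: str, **kwargs):
    """Hypothetical helper: run a oneshot search and return a DataFrame.

    Assumes `service` is an already-connected splunklib.client.Service.
    Imports are local so the filtering above can be used without them.
    """
    import pandas as pd
    import splunklib.results as results

    # JSONResultsReader expects the search to be run with JSON output.
    kwargs.setdefault("output_mode", "json")
    stream = service.jobs.oneshot(query, **kwargs)
    rows = collect_rows(results.JSONResultsReader(stream))
    # One row per dict; the dict keys become the column names.
    return pd.DataFrame(rows)


# Made-up stand-in for what the reader yields, for an offline check.
sample_items = [
    {"host": "web01", "count": "3"},
    "not a result row",  # placeholder for a results.Message
    {"host": "web02", "count": "5"},
]
rows = collect_rows(sample_items)
print(rows)
```

With a real connection this would be called as oneshot_to_dataframe(service, searchquery_oneshot, earliest_time="-60m", latest_time="now", count=0). Field values may arrive as strings, in which case numeric columns would need a conversion such as pd.to_numeric afterwards.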