microsoft / jupyter-Kqlmagic

Extension (Magic) to Jupyter notebook and Jupyter lab that enables a notebook experience for working with Kusto, ApplicationInsights, and LogAnalytics data.

`raw_json` is a misnomer given it's a special formatted object, not actual raw JSON #78

Open JPvRiel opened 2 years ago

JPvRiel commented 2 years ago

If a user wants to inspect and work with the raw JSON response while debugging, _kql_raw_result_.raw_json does not provide a raw JSON string.

Instead, it's a FormattedJsonDict object.

Perhaps _kql_raw_result_.raw_json should be the actual raw JSON? And a new attribute _kql_raw_result_.formatted_json would then be a better name for what is actually given.

E.g. I was trying to work out whether Azure Log Analytics reports its response limits somewhere in a JSON response of up to ~61MB. It's not practical to view _kql_raw_result_.raw_json at this size, and the object type rules out tools such as jmespath.

This can't be done:

import json

test = json.loads(_kql_raw_result_.raw_json)

Nor can this:

import jmespath

jmespath.search(
    expression='error',
    data=_kql_raw_result_.raw_json
)

As per:

print([a for a in dir(_kql_raw_result_) if 'json' in a.lower()])

There are two JSON-related attributes, but neither is proper JSON that can be debugged as JSON using standard Python JSON libs. Both are already preprocessed.

['_json_response', 'raw_json']

Even _json_response is poorly named given it's actually a Python dict, not a JSON string. If it were named _response, it would not invite the assumption that the type is a JSON string.
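To make the type mismatch concrete, here is a minimal sketch using only the standard library. The `json_response` dict is a made-up stand-in for what `_json_response` reportedly holds (its keys and values are assumptions, not the real API payload). It shows why `json.loads()` rejects an already-parsed object, and how a dict can be serialized into a real JSON string when one is needed.

```python
import json

# Hypothetical stand-in for the dict that `_json_response` reportedly holds;
# the keys and values here are invented for illustration only.
json_response = {
    "tables": [{"name": "PrimaryResult", "rows": [[1, "a"], [2, "b"]]}],
    "error": {"code": "PartialError", "message": "illustrative only"},
}

# json.loads() accepts only str, bytes, or bytearray, so passing an
# already-parsed object raises TypeError.
try:
    json.loads(json_response)
    loads_failed = False
except TypeError:
    loads_failed = True

# Serializing the dict yields an actual JSON string that standard tools accept.
raw_json_text = json.dumps(json_response)
round_tripped = json.loads(raw_json_text)

# A plain dict lookup covers a simple top-level query such as 'error'.
error_detail = json_response.get("error")
```

Note that the string produced by `json.dumps()` is a re-serialization of the parsed dict, so it is not guaranteed to match the bytes the service originally sent.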

mbnshtck commented 2 years ago

You are trying to use Kqlmagic objects in a way they were not meant to be used. Kqlmagic's main purpose is to query data stored in Kusto/ADX, Application Insights, and Log Analytics using the KQL query language within a notebook. The integration with Python is limited to the result tables as dictionaries or dataframes, plus some extras. raw_json was not designed to be used for programming. Its purpose is only to show the raw result.

I will consider refactoring those properties in the next version so that they can be used programmatically by Python code.

JPvRiel commented 2 years ago

@mbnshtck thanks for the reply.

Fully appreciate that it's unusual to need to dig into raw_json and that typical usage will be a pandas dataframe, etc., so this won't affect many users.

The pretty formatting object and lexer feature overlaid on the JSON is nice when someone wants to view a small result/API response. Just noting that the name threw me off and that it's impractical for debugging exactly what happened with a large JSON API response.

_json_response in the end was good enough for me to debug what was going wrong (e.g. how I found the detail of why my dataframe was not fully populated due to API limits as per https://github.com/microsoft/jupyter-Kqlmagic/issues/77).

Your call if you want to risk maybe changing var names to be a bit more intuitive (could be breaking change for some notebooks), although people probably don't commonly use these for much other than debugging...

Since it would be for advanced debug use, it could also maybe be somewhat hidden as a private var I suppose? E.g.:

_raw_json or _raw_json_response can be the unformatted pure JSON. Then raw_json and _json_response remain as they are (no breaking change).
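A rough sketch of how that layout could look, purely as an illustration of the proposal: `ResultSketch` is a hypothetical class, not Kqlmagic's actual result object, and the indented string returned by `raw_json` here only stands in for the existing formatted view.

```python
import json
from functools import cached_property


class ResultSketch:
    """Hypothetical container illustrating the proposed attribute layout."""

    def __init__(self, response_text: str):
        # Proposed private handle: the unformatted JSON text as received.
        self._raw_json = response_text

    @cached_property
    def _json_response(self) -> dict:
        # Existing name kept: the parsed Python dict, built once on first access.
        return json.loads(self._raw_json)

    @property
    def raw_json(self) -> str:
        # Existing name kept: a display-oriented view (here simply an
        # indented string standing in for the formatted object).
        return json.dumps(self._json_response, indent=2)


result = ResultSketch('{"tables": [], "error": null}')
parsed_again = json.loads(result._raw_json)  # works, since it is a real JSON string
```

Keeping the raw text behind a new underscore-prefixed name leaves the two existing attributes untouched, so notebooks that already rely on them would not break.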

Once you settle on which bits make good handles to debug with, I'd be happy to contribute a PR on docs on how to debug a bit given the raw response versus what ends up in the dataframe.