microsoft / jupyter-Kqlmagic

Extension (Magic) to Jupyter Notebook and JupyterLab that enables a notebook experience for working with Kusto, ApplicationInsights, and LogAnalytics data.

KQLMagic silently truncates records returned without warning #77

Open JPvRiel opened 3 years ago

JPvRiel commented 3 years ago

While attempting a large query, KQLMagic silently truncates the records returned, and it is unclear which limit (e.g. data size or timeout) might be causing the truncation.

Given the following adjustment to the defaults, with enable_suppress_result and timeout left at their default values:

%env KQLMAGIC_CONFIGURATION='auto_limit=1000000;display_limit=100'

And then a parameterized limit:

kql_parms = {
    'p_limit': 1000000
}

Followed by a Kusto query that should be able to return 1,000,000 records, but not display them (the trailing semicolon suppresses result output):

%%kql -params_dict kql_parms
let _limit_ = p_limit;
ThreatIntelligenceIndicator
| limit _limit_
;

Only a little over 100K records were returned, even though the ThreatIntelligenceIndicator table has more than 1 million records.

Screenshot: KQLMagic query with large limit has records truncated

_kql_raw_result_.feedback_warning was an empty list. This is the main part of the issue: it may be reasonable to enforce resource limits, but at least provide feedback/warnings when doing so.

I attempted to inspect _kql_raw_result_.completion_query_info, but it was an empty JSON object; that emptiness is the only hint that something went wrong and the query did not actually complete.

{}

I also inspected _kql_raw_result_.completion_query_resource_consumption:

{
    "ExecutionTime": 9.1875081,
    "dataset_statistics": [
        {
            "table_row_count": 114138,
            "table_size": 66346649
        }
    ],
    "input_dataset_statistics": {
        "extents": {
            "scanned": 135,
            "scanned_max_datetime": "2021-09-27T10:34:33.0395942Z",
            "scanned_min_datetime": "2021-07-12T06:48:28.3176520Z",
            "total": 135
        },
        "rows": {
            "scanned": 142882578,
            "total": 142882578
        },
        "rowstores": {
            "scanned_rows": 0,
            "scanned_values_size": 0
        },
        "shards": {
            "queries_generic": 0,
            "queries_specialized": 0
        }
    },
    "resource_usage": {
        "cache": {
            "disk": {
                "hits": 0,
                "misses": 0,
                "total": 0
            },
            "memory": {
                "hits": 0,
                "misses": 0,
                "total": 0
            },
            "shards": {
                "bypassbytes": 0,
                "cold": {
                    "hitbytes": 0,
                    "missbytes": 0,
                    "retrievebytes": 0
                },
                "hot": {
                    "hitbytes": 0,
                    "missbytes": 0,
                    "retrievebytes": 0
                }
            }
        },
        "cpu": {
            "kernel": "00:00:00.2343750",
            "total cpu": "00:00:00.2500000",
            "user": "00:00:00.0156250"
        },
        "memory": {
            "peak_per_node": 3565520
        }
    }
}

From the above, it's unclear whether memory or other such resource limits were hit and caused the problem.

When I ran the query with a smaller limit, e.g. just 10K instead of 1M:

%%kql
ThreatIntelligenceIndicator
| limit 10000
;

Then _kql_raw_result_.completion_query_info was not an empty JSON object:

{
    "Count": 4,
    "StatusCode": 0,
    "StatusDescription": "Query completed successfully"
}

Perhaps the connection/paging of data has some limit, or the back-end resources do, but no warning or feedback is provided in the notebook output to make it clear that records have been truncated, or why.
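
Until such a notification exists, a crude guard can be added in the notebook itself. A minimal sketch, assuming completion_query_info is exposed as a dict (as the JSON dumps above suggest) and is empty exactly when the result set was cut off:

# Heuristic guard against silent truncation, based on the behaviour
# observed above: completion_query_info was an empty object when records
# were truncated, and populated on success.
info = _kql_raw_result_.completion_query_info
if not info:
    print('WARNING: completion_query_info is empty; '
          'the result set may have been truncated')
elif info.get('StatusDescription') != 'Query completed successfully':
    print('Query status:', info.get('StatusDescription'))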

JPvRiel commented 3 years ago

To add context, I suspect the Azure Log Analytics API limits might be the cause. As per https://dev.loganalytics.io/documentation/Using-the-API/Limits:

As well as call rate limits and daily quota caps, there are also limits on queries themselves:

  • Queries cannot return more than 500,000 rows
  • Queries cannot return more than 64,000,000 bytes (~61 MiB total data)
  • Queries cannot run longer than 10 minutes by default. See this for details.

https://docs.microsoft.com/en-us/azure/data-explorer/kusto/concepts/querylimits#limit-on-result-set-size-result-truncation also explains it well.
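
Incidentally, the resource-consumption statistics from the failed run above are consistent with the 64,000,000-byte cap rather than the 500,000-row cap. A back-of-the-envelope check, using the dataset_statistics numbers shown earlier:

# The truncated result reported table_size 66,346,649 bytes across
# 114,138 rows, i.e. roughly 580 bytes per row.
bytes_per_row = 66_346_649 / 114_138        # ~581 bytes per row
rows_at_byte_cap = 64_000_000 / bytes_per_row
print(round(rows_at_byte_cap))              # ~110,000 rows, in line with
                                            # the ~114K rows returned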

KQLMagic should be enhanced to detect when one of the above limits has impacted the query response and warn the user.

Ideally, an option to retrieve larger result sets by chunking/paging across multiple requests would be nice; a rough sketch of the idea follows.
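
For illustration only, a manual paging workaround. This assumes -params_dict works with the line magic as it does with the cell magic, that the line magic returns the same iterable result object, and that TimeGenerated provides a stable sort order for deterministic pages:

# Hypothetical manual paging: fetch a large result set in chunks that
# stay under the API row/byte limits (row_number() needs serialized
# input, which the preceding order by provides).
page_size = 100_000
offset = 0
all_rows = []
while True:
    kql_parms = {'p_offset': offset, 'p_size': page_size}
    result = %kql -params_dict kql_parms ThreatIntelligenceIndicator | order by TimeGenerated asc | extend rn = row_number() | where rn > p_offset and rn <= p_offset + p_size | project-away rn
    rows = list(result)
    all_rows.extend(rows)
    if len(rows) < page_size:
        break        # last (possibly partial) page
    offset += page_size

Note this re-sorts the table on every request, so it only illustrates the idea; a built-in paging option could do this far more efficiently.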

JPvRiel commented 3 years ago

Tested and checked: it seems the 3rd table returned by the API contains information about the issue, e.g. _kql_raw_result_._json_response['Tables'][2] has an E_QUERY_RESULT_SET_TOO_LARGE value:

{'TableName': 'Table_2',
 'Columns': [{'ColumnName': 'Timestamp',
   'DataType': 'DateTime',
   'ColumnType': 'datetime'},
  {'ColumnName': 'Severity', 'DataType': 'Int32', 'ColumnType': 'int'},
  {'ColumnName': 'SeverityName', 'DataType': 'String', 'ColumnType': 'string'},
  {'ColumnName': 'StatusCode', 'DataType': 'Int32', 'ColumnType': 'int'},
  {'ColumnName': 'StatusDescription',
   'DataType': 'String',
   'ColumnType': 'string'},
  {'ColumnName': 'Count', 'DataType': 'Int32', 'ColumnType': 'int'},
  {'ColumnName': 'RequestId', 'DataType': 'Guid', 'ColumnType': 'guid'},
  {'ColumnName': 'ActivityId', 'DataType': 'Guid', 'ColumnType': 'guid'},
  {'ColumnName': 'SubActivityId', 'DataType': 'Guid', 'ColumnType': 'guid'},
  {'ColumnName': 'ClientActivityId',
   'DataType': 'String',
   'ColumnType': 'string'}],
 'Rows': [['2021-09-28T11:35:51.1229415Z',
   2,
   'Error',
   -2133196797,
   'Query result set has exceeded the internal data size limit 64000000 (E_QUERY_RESULT_SET_TOO_LARGE; see https://aka.ms/kustoquerylimits)',
   1,
   'ee8842dc-a93b-4b17-a39f-815a984b4592',
   'ee8842dc-a93b-4b17-a39f-815a984b4592',
   '048f578f-4c35-4c58-a14e-75bb0b19ca55',
   '65351ce2-c3b4-4e72-97de-e02152a5af26'],
  ['2021-09-28T11:36:00.7791658Z',
   6,
   'Stats',
   0,
   '{"ExecutionTime":12.4531257,"resource_usage":{"cache":{"memory":{"hits":0,"misses":0,"total":0},"disk":{"hits":0,"misses":0,"total":0},"shards":{"hot":{"hitbytes":0,"missbytes":0,"retrievebytes":0},"cold":{"hitbytes":0,"missbytes":0,"retrievebytes":0},"bypassbytes":0}},"cpu":{"user":"00:00:00","kernel":"00:00:00.0312500","total cpu":"00:00:00.0312500"},"memory":{"peak_per_node":19669488}},"input_dataset_statistics":{"extents":{"total":526,"scanned":526,"scanned_min_datetime":"2021-07-12T06:54:40.8842562Z","scanned_max_datetime":"2021-09-28T11:35:38.7088527Z"},"rows":{"total":107210439,"scanned":107210439},"rowstores":{"scanned_rows":0,"scanned_values_size":0},"shards":{"queries_generic":0,"queries_specialized":0}},"dataset_statistics":[{"table_row_count":3672,"table_size":64179555}]}'
]]}

The Log Analytics API doesn't seem to expose an easy, obvious error attribute; instead, the error has to be fished out of the 3rd data table, which contains information about the query execution.
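
A sketch of fishing it out programmatically, using the Table_2 schema shown above (column positions are looked up by name rather than assumed):

# Scan the status table (3rd table of the raw Log Analytics response)
# for error rows such as E_QUERY_RESULT_SET_TOO_LARGE.
status_table = _kql_raw_result_._json_response['Tables'][2]
col_names = [c['ColumnName'] for c in status_table['Columns']]
sev = col_names.index('SeverityName')
desc = col_names.index('StatusDescription')
for row in status_table['Rows']:
    if row[sev] == 'Error':
        print('Query error:', row[desc])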

mbnshtck commented 3 years ago

Thank you for the report. Will add in the next version a notification/flag if the response is truncated.

JPvRiel commented 3 years ago

@mbnshtck thanks for the ACK.

Maybe, just as there are _kql_raw_result_.feedback_info and _kql_raw_result_.feedback_warning, an _kql_raw_result_.feedback_error could hold extended error details?
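
For illustration (feedback_error is hypothetical; only feedback_info and feedback_warning exist today), consuming it might then look like:

# Hypothetical: mirror the existing feedback_info/feedback_warning lists.
for err in getattr(_kql_raw_result_, 'feedback_error', []):
    print('KQL error feedback:', err)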