microsoft / Kusto-Query-Language

Kusto Query Language is a simple and productive language for querying Big Data.
Apache License 2.0
562 stars 104 forks source link

Strange interaction where the "where" clause does not filter out a null value. #148

Closed tbethe closed 2 months ago

tbethe commented 2 months ago

I have an example where an event is not filtered out, when I argue it should be. Any explanation of why this is normal behaviour would also be amazing. I am confused.

Use-case: Check if there was a new row in the table in the last 2 hours.

To demonstrate, I create an empty table, summarize it based on max TimeGenerated. However, if the table is empty, summarize still generates an event/row, but it is filled with only NULLs.

However, when I then filter for only events from max 2 hours ago, the event with all columns filled with NULLs is still returned!

The documentation of the where clause is very clear in that when the column is NULL, these are not returned. I quote from the docs:

Notes Null values: all filtering functions return false when compared with null values. You can use special null-aware functions to write queries that handle null values.

In this example, we get an event, with empty (NULL) columns. This is strange to me.

let EmptyTable = datatable(TimeGenerated: datetime, SomeField: guid)[];
EmptyTable
| summarize arg_max(TimeGenerated, *)
| where TimeGenerated >= ago(2h)

To double check, I did the following: If I first extend the boolean condition and then put it in the where clause, it is filtered. This feels arbitrary now.

let EmptyTable = datatable(TimeGenerated: datetime, SomeField: guid)[];
EmptyTable
| summarize arg_max(TimeGenerated, *)
| extend time_gen_ago2h = TimeGenerated >= ago(2h)
| where time_gen_ago2h
mattwar commented 2 months ago

Moving this issue to discussion as it is not about this library but the behavior of the query engine.