microsoft / Kusto-Query-Language

Kusto Query Language is a simple and productive language for querying Big Data.
Apache License 2.0
510 stars 97 forks source link

Finding all the database table column names skips returning the columns names. #87

Closed DeepakRautela02 closed 1 year ago

DeepakRautela02 commented 1 year ago

The code snippet in the Readme.md for finding all the database table column names doesn't report some of the column names present in the query and in the schema. For eg.

DeviceInfo
| summarize arg_max(Timestamp, DeviceName) by DeviceId
| extend DeviceMachineName = split(DeviceName, '.')[0]

Here on line 2, the logic is able to detect Timestamp, DeviceName, DeviceId columns, but fails to report the DeviceName column in the 3rd line. We found out that the Kusto.Globals.GetTable(c), returned null in that case.

We also observed that this behavior mainly happens after the summarize is used.

Will you be able to take this up? Thank you

mattwar commented 1 year ago

Yes, I'll try to find a solution for this.

mattwar commented 1 year ago

I've made changes so that the second reference to DeviceName should now be considered the same as the original column from the database.

Note: This will not always be true with operators that transform the data/columns. For example, if you also included arg_min(Timestamp, DeviceName) in the summarize expressions, then the output from summarize would have both DeviceName and DeviceName1. Technically, both are data from the original database column, but only the first one will reuse the original column symbol.

DeepakRautela02 commented 1 year ago

Hi Matt, We are still seeing the same issue. Can you please check once.

DeepakRautela02 commented 1 year ago

Also, with the new update in the parser, we are seeing some more errors where now queries like

union DeviceProcessEvents , DeviceNetworkEvents , DeviceFileEvents , DeviceImageLoadEvents 
| where Timestamp > ago(30d)

the parser isn't able to identify the Timestamp column too. PS. We are using the latest 11.0.0 version, is that the latest stable version.

evgeni-nabokov commented 1 year ago

Hi,

For the same column name I get different SyntaxNode instances with different hash codes, and for one node kustoCode.Globals.GetTable(node) returns a table instance while for the other node it returns null.

I tested using 11.2.0 version against a query without summarize, but with join kind=inner and the "buggy" column goes after on keyword.

mattwar commented 1 year ago

There are columns that get introduced during analysis that represent one or more other columns. You will find these when one column in the source actually refers to more than one actual column. For example, columns referenced in queries after a union operator, or when a join operator's on-clause references a common column name from both sides. This also happens when a column in a query is renamed. It is represented by a new column instance that refers back to the original column. You can find these original columns in the ColumnSymbol's OriginalColumns property.

I'm currently in the process of updating the readme to fix the examples and explain this all.

In the mean time, this functionality is available on nuget so you don't have to rely on the examples: https://www.nuget.org/packages/Kusto.Toolkit

mattwar commented 1 year ago

The examples in the readme have been updated.