microsoft / Kusto-Query-Language

Kusto Query Language is a simple and productive language for querying Big Data.
Apache License 2.0
510 stars 97 forks source link

Not able to get all tables when query is long #71

Closed poppy0708 closed 2 years ago

poppy0708 commented 2 years ago

Hi, I follow this guide find all the database tables referenced in a query to extract tables.

While I found an interesting thing that it seems the query length has an impact on the result. I try for a query with length over 200 and 4 reference table. The parsing results only return 2 tables, the rest 2 tables location are behind the 100th characters .

Is there a limitation with the query length or query complexity? Thank you!

mattwar commented 2 years ago

There are some limits such as nested query depth that will get the analyzer to abort and not provide any information. There is no such limit that would cut off analysis after 200 characters. There is probably something else that is going wrong. Are you sure you are parsing the expected query? Can you provide the query that is misbehaving?

poppy0708 commented 2 years ago

Hello Matt, Thank you for your quick reply. What is the nested query depth limitation? Do you mean the query call a function and the function include function and tables? I don't think my issue is due to nested query, it is a very long query then directly join with tables. The GetDatabaseTables feature works well when query length is short. Having problem when it is long. I testes for many queries working on our Kusto cluster. Is there any way that I can show you my query confidentially? I don't think it is appropriate to post my working code to public. Thank you!

mattwar commented 2 years ago

Can you reduce the problem to a sample query & code that reproduces it without showing your real work?

poppy0708 commented 2 years ago

Hello Matt,

Here is how my code looks like:

let sliceStartTS = datetime('2022-04-01T00:00:00');
let timeWindowInMin = 1440;
let sliceEndTS=datetime_add('minute', timeWindowInMin, sliceStartTS);
// query_ActiveCIds is a function, it is querying tableA.
let activeCIds = database('db1').query_ActiveCIds(sliceStartTS, sliceEndTS);
let cSnapshot = 
      cluster('cluster2').database("db2").TableB
      | where ingestion_time() < sliceEndTS
      | project SnapshotTime = todatetime(SnapshotTime), LXXXXX, SXXXX, CXXXX, PXXXXXXXXX, RXXXXX, RXXXXXXXX, SXXXXXX, DXXXXXXXXXXXXXXXX,TXXXXXXXXX, SXXXXXXXXXXXX, AXXXXXXXXX, AXXXXXXXXXX
     | join kind = inner (
        cluster('cluster2').database("db2").TableB  
        | where ingestion_time() < sliceEndTS
        | project LXXXXXXXXX, SXXXXXXXXXXXX
        | extend SnapshotTS = todatetime(SnapshotTime)
        | summarize SnapshotTime=max(SnapshotTS) by LXXXXXXXXX
        ) on SnapshotTime, LXXXXXXXXX
    | extend RXXXXX=tolower(replace_string(LXXXXXXXXX, " ", ""))
    | extend CXXXXXXX = DXXXXXXX
    | project-away DXXXXXXXXX, LXXXXXXXXX, LXXXXXXXXXX1, SnapshotTime1
    | extend SD = extract_all('Name=(\\w+),\\s*SV=(\\d+),\\s*VXXXXXXX=(\\d+)', SXX)[0]
    | extend SXXX=tostring(SD [0]), SXXXX=toint([1]), SV=tostring(SD[2])
    | project SnapshotId, SnapshotTime,SXXXXXXXXXXXX,TXXXXXXX, AXXXXXXXXXXXX, AXXXXXXXX, RXXXXXXX, CXXXXXXAt=todatetime(CXXXXXXX), PXXXXXXXXXXXX, RXXXXXXXXX, SXXXXXXXXX, SXXXXXXXX, SXXXXXXX, CXXXXXXXXX, RXXXXXXXXX
    | summarize arg_max(SnapshotTime, *) by CXXXXXXX;
cSnapshot 
    | join kind = leftouter ( TableC) on $left.SXXXXXXXXId == $right.CSXXXXXXId
    | join kind= leftouter (cluster('cluster2').database("db2").function1) on $left.RXXXXXX== $right.SRXXXXX    
    | extend RXXXXX = HRXXXXXXXXXX
    | join kind= leftouter (cluster('cluster2').database("db2").TableD| distinct SXXXXXXXXXXX | project SXXXXXXXXXXX=tostring(SXXXXXXXXXId), IsXXXXXXX = 1) on SXXXXXXXXXXXId
    | join kind = leftouter (activeCIds|extend IsActive=1) on CId

TableA and TableB is able to be returned. TableC and TableD not.

mattwar commented 2 years ago

There is a syntax error on this line of the query:

| extend SXXX=tostring(SD [0]), SXXXX=toint([1]), SV=tostring(SD[2])

The parser fails on the plain bracketed number: [1]
Was this supposed to be SD[1] ? The rest of the rest of the query after this error is not recognized so the table finder function does not visit it.

You can check for syntax errors using the GetSyntaxDiagnostics() API on the KustoCode instance after it is parsed.

With the syntax error in the query fixed, the function finds all the tables.

poppy0708 commented 2 years ago

Hello Matt, Thanks for figuring out my typo. Sorry for the misleading. In my real query, there is no syntax error. Have verified from running through Kusto explorer also checked GetSyntaxDiagnostics(). I have contacted you through teams and email. Let me know if you want a call for this problem. Thank you!