google / zetasql

ZetaSQL - Analyzer Framework for SQL
Apache License 2.0
2.28k stars 214 forks source link

Analyzer extractTableNamesFromStatement - returning incorrect table names #123

Open jrbangit opened 1 year ago

jrbangit commented 1 year ago

I'm using ZetaSQL in our streaming application that basically consumes from GCP audit log. As you can imagine, there are lots possibilities that a weird query will occur and flow in the audit log and eventually into our application.

I've encountered an issue specifically on the extractTableNamesFromStatement method of Analyzer which only occurs on specific type of query. Particularly for queries having the same table name for both target and source, like the query below.

CREATE OR REPLACE TABLE project_test.dataset_test.test_table AS SELECT DISTINCT * FROM project_test.dataset_test.test_table

the expected result based on my tests with various types of statements should return [project_test.dataset_test.test_table]

However, in the query above, the analyzer returns [project_test, dataset_test, test_table] instead.

here's how I used it:

        LanguageOptions languageOptions = new LanguageOptions();
        languageOptions.enableMaximumLanguageFeatures();
        languageOptions.setSupportsAllStatementKinds();

        AnalyzerOptions analyzerOptions = new AnalyzerOptions();
        analyzerOptions.setLanguageOptions(languageOptions);
        analyzerOptions.setPruneUnusedColumns(true);

        Analyzer.extractTableNamesFromStatement(query, analyzerOptions)
                    .stream()
                    .flatMap(List::stream)
                    .collect(Collectors.toList());