I'm using ZetaSQL in our streaming application that basically consumes from GCP audit log. As you can imagine, there are lots possibilities that a weird query will occur and flow in the audit log and eventually into our application.
I've encountered an issue specifically on the extractTableNamesFromStatement method of Analyzer which only occurs on specific type of query. Particularly for queries having the same table name for both target and source, like the query below.
CREATE OR REPLACE TABLE project_test.dataset_test.test_table AS SELECT DISTINCT * FROM project_test.dataset_test.test_table
the expected result based on my tests with various types of statements should return [project_test.dataset_test.test_table]
However, in the query above, the analyzer returns [project_test, dataset_test, test_table] instead.
here's how I used it:
LanguageOptions languageOptions = new LanguageOptions();
languageOptions.enableMaximumLanguageFeatures();
languageOptions.setSupportsAllStatementKinds();
AnalyzerOptions analyzerOptions = new AnalyzerOptions();
analyzerOptions.setLanguageOptions(languageOptions);
analyzerOptions.setPruneUnusedColumns(true);
Analyzer.extractTableNamesFromStatement(query, analyzerOptions)
.stream()
.flatMap(List::stream)
.collect(Collectors.toList());
I'm using ZetaSQL in our streaming application that basically consumes from GCP audit log. As you can imagine, there are lots possibilities that a weird query will occur and flow in the audit log and eventually into our application.
I've encountered an issue specifically on the
extractTableNamesFromStatement
method of Analyzer which only occurs on specific type of query. Particularly for queries having the same table name for both target and source, like the query below.CREATE OR REPLACE TABLE project_test.dataset_test.test_table AS SELECT DISTINCT * FROM project_test.dataset_test.test_table
the expected result based on my tests with various types of statements should return
[project_test.dataset_test.test_table]
However, in the query above, the analyzer returns
[project_test, dataset_test, test_table]
instead.here's how I used it: