google / zetasql

ZetaSQL - Analyzer Framework for SQL
Apache License 2.0
2.32k stars 219 forks source link

ExtractTableNamesFromScript handling DDL statements unexpectedly #102

Open justinwhite opened 2 years ago

justinwhite commented 2 years ago

Given a script

BEGIN

CREATE TABLE `project1.dataset1.table1` AS
SELECT *
FROM `project2.dataset2.table2`;

CREATE TABLE `project3.dataset3.table3` (
  foo STRING,
  bar STRING
);

ALTER TABLE `project4.dataset4.table4`
ADD COLUMN IF NOT EXISTS foo STRING;

END;

I would expect ExtractTableNamesFromScript to parse out 4 tables:

but with the following options set

zetasql::LanguageOptions ConfigureLanguage() {
  zetasql::LanguageOptions language_options =
    zetasql::LanguageOptions::MaximumFeatures();
  language_options.SetSupportsAllStatementKinds();
  language_options.EnableLanguageFeature(zetasql::FEATURE_V_1_3_SCRIPT_LABEL);
  return language_options;
}

zetasql::AnalyzerOptions ConfigureOptions() {
  zetasql::LanguageOptions language_options = ConfigureLanguage();
  zetasql::AnalyzerOptions analyzer_options(language_options);
  return analyzer_options;
}

i.e. called with

zetasql::TableNamesSet table_names;
zetasql::AnalyzerOptions analyzer_options = ConfigureOptions();
absl::Status status =
    zetasql::ExtractTableNamesFromScript(sproc, analyzer_options, &table_names);

ExtractTableNamesFromScript only identifies project2.dataset2.table2.

Is there a language option I'm missing? Or does this method handle top level DDL statements differently than any supporting nodes?

justinwhite commented 2 years ago

I think this is the relevant piece of code: https://github.com/google/zetasql/blob/42cab5f4f7c3ed3ad823ac3929a83fbcd36948ae/zetasql/public/table_name_resolver.cc#L298-L333

matthewcbrown commented 2 years ago

That function and its variants only return tables that are read from. This isn't documented well.

On Tue, Apr 12, 2022 at 9:42 AM Justin @.***> wrote:

I think this is the relevant piece of code: https://github.com/google/zetasql/blob/42cab5f4f7c3ed3ad823ac3929a83fbcd36948ae/zetasql/public/table_name_resolver.cc#L298-L333

— Reply to this email directly, view it on GitHub https://github.com/google/zetasql/issues/102#issuecomment-1096955743, or unsubscribe https://github.com/notifications/unsubscribe-auth/AL3D5MRB2NKEVBPWQUBDM2LVEWRV3ANCNFSM5THXPHZA . You are receiving this because you are subscribed to this thread.Message ID: @.***>

justinwhite commented 2 years ago

Thanks for clarifying. I'm guessing if we want to extract all tables, we [instead] have to do something like

void GetIdentifierNodes(const zetasql::ASTNode* root, set<string> &names) {
  const set<int> identifier_kinds = {zetasql::AST_IDENTIFIER};
  vector<const zetasql::ASTNode*> identifier_nodes;
  root->GetDescendantSubtreesWithKinds(identifier_kinds, &identifier_nodes);
  for (auto node : identifier_nodes) {
    const zetasql::ASTIdentifier* id_ = node->GetAsOrDie<zetasql::ASTIdentifier>();
    names.insert(id_->GetAsString());
  }
}

where root is the top level statement node.