Open christophstockhusen opened 5 years ago
This was improved a bit in c610a21ffdc110293c1c7bd255a2674ebc7ec7a8 - you can call Analyzer.BuildExpression, but there still isn't a great way to actually modify a ResolvedExpr (if that is your goal). Can you talk about your use case?
We are maintaining a data warehouse of dozens of BigQuery datasets with sensitive personal data of our customers. Under the GDPR, many of our colleagues do not have permission to actually see some sensitive personal parts of the raw data (this includes parts like customer numbers and even pseudonyms), but they do have permission to analyze and work with this sensitive data to some extent. Therefore, we built our colleagues a web interface (GAE ftw.) to submit their queries, which does essentially two things:
1. Lineage computation: if you have a table customer_data that contains a sensitive column name and you submit the query SELECT CONCAT(name, "foo") AS funny_name FROM customer_data, then funny_name should be classified as sensitive.
2. Query rewriting: for example, we rewrite SELECT shop_id, AVG(customer_age) FROM customer_data GROUP BY shop_id into the new query SELECT shop_id, AVG(customer_age), COUNT(DISTINCT customer_age) AS hidden_count FROM customer_data GROUP BY shop_id and show a resulting column only if hidden_count is larger than 20. (Again, this is extremely simplified, because this is obviously far from enough to ensure data protection.)
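To make the lineage idea concrete, here is a minimal sketch in Java. It does not use the ZetaSQL API: the Expr, ColumnRef, Literal and FunctionCall types are hypothetical stand-ins for a resolved expression tree, and the rule "an output column is sensitive if any column it reads is sensitive" is a deliberately simplified assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy expression tree; these types are hypothetical and are NOT ZetaSQL classes.
abstract class Expr {
    // Adds the names of all base-table columns this expression reads to 'out'.
    abstract void collectSourceColumns(Set<String> out);
}

final class ColumnRef extends Expr {
    final String name;
    ColumnRef(String name) { this.name = name; }
    @Override void collectSourceColumns(Set<String> out) { out.add(name); }
}

final class Literal extends Expr {
    final Object value;
    Literal(Object value) { this.value = value; }
    @Override void collectSourceColumns(Set<String> out) { /* reads no column */ }
}

final class FunctionCall extends Expr {
    final String function;
    final List<Expr> args;
    FunctionCall(String function, Expr... args) {
        this.function = function;
        this.args = Arrays.asList(args);
    }
    @Override void collectSourceColumns(Set<String> out) {
        // A function call reads whatever its arguments read.
        for (Expr arg : args) arg.collectSourceColumns(out);
    }
}

final class LineageSketch {
    // Simplified rule: an output column is sensitive if any source column is.
    static boolean isSensitive(Expr outputExpr, Set<String> sensitiveColumns) {
        Set<String> sources = new HashSet<>();
        outputExpr.collectSourceColumns(sources);
        return !Collections.disjoint(sources, sensitiveColumns);
    }

    public static void main(String[] args) {
        Set<String> sensitive = new HashSet<>(Arrays.asList("name"));
        // Models: CONCAT(name, "foo") AS funny_name
        Expr funnyName = new FunctionCall("CONCAT", new ColumnRef("name"), new Literal("foo"));
        Expr shopId = new ColumnRef("shop_id");
        System.out.println("funny_name sensitive: " + isSensitive(funnyName, sensitive));
        System.out.println("shop_id sensitive: " + isSensitive(shopId, sensitive));
    }
}
```

A real implementation would walk the resolved AST produced by the analyzer instead of this toy tree, and would also have to follow columns through subqueries, joins and aggregations.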
For both of these tasks we are currently using Apache Calcite, but the ZetaSQL dialect contains some parts (e.g. the non-standard EXCEPT in SELECT * EXCEPT(foo)) that are not supported by Calcite and are really hard or even impossible to implement using Calcite. Thus, we were very happy to see that the ZetaSQL analyzer was published here. While we nearly managed to fully implement the lineage computation using ZetaSQL (despite the missing docs), we are not able to modify the resolved AST and unparse it to an SQL statement that we can submit to BigQuery, because SQLBuilder.java and its surrounding classes are not provided.
Forgot to update this. Resolved AST nodes now have associated builder classes. Documentation is here: ResolvedNode.java
It's still a bit clunky for doing whole-tree transforms. Something like the DeepCopyVisitor in C++ would probably be needed.
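For readers unfamiliar with the pattern, here is a rough sketch of what a copy-and-modify visitor looks like, applied to the hidden_count rewrite from earlier in the thread. Everything in it (Node, Column, Aggregate, Select, CopyVisitor, AddHiddenCount) is a hypothetical toy type invented for illustration, not the ZetaSQL Java or C++ API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy AST; none of these types are ZetaSQL classes.
interface Node {
    Node accept(CopyVisitor visitor);
    String toSql();
}

final class Column implements Node {
    final String name;
    Column(String name) { this.name = name; }
    public Node accept(CopyVisitor v) { return v.visitColumn(this); }
    public String toSql() { return name; }
}

final class Aggregate implements Node {
    final String function;   // e.g. "AVG" or "COUNT"
    final boolean distinct;
    final Node arg;
    final String alias;      // null when the column has no alias
    Aggregate(String function, boolean distinct, Node arg, String alias) {
        this.function = function;
        this.distinct = distinct;
        this.arg = arg;
        this.alias = alias;
    }
    public Node accept(CopyVisitor v) { return v.visitAggregate(this); }
    public String toSql() {
        String sql = function + "(" + (distinct ? "DISTINCT " : "") + arg.toSql() + ")";
        return alias == null ? sql : sql + " AS " + alias;
    }
}

final class Select implements Node {
    final List<Node> columns;
    final String table;
    final String groupBy;
    Select(List<Node> columns, String table, String groupBy) {
        this.columns = columns;
        this.table = table;
        this.groupBy = groupBy;
    }
    public Node accept(CopyVisitor v) { return v.visitSelect(this); }
    public String toSql() {
        List<String> parts = new ArrayList<>();
        for (Node c : columns) parts.add(c.toSql());
        return "SELECT " + String.join(", ", parts) + " FROM " + table + " GROUP BY " + groupBy;
    }
}

// Default behaviour: rebuild an identical copy of every node.
class CopyVisitor {
    Node visitColumn(Column c) { return new Column(c.name); }
    Node visitAggregate(Aggregate a) {
        return new Aggregate(a.function, a.distinct, a.arg.accept(this), a.alias);
    }
    Node visitSelect(Select s) {
        List<Node> copied = new ArrayList<>();
        for (Node c : s.columns) copied.add(c.accept(this));
        return new Select(copied, s.table, s.groupBy);
    }
}

// Overrides a single method to change the tree while it is being copied.
final class AddHiddenCount extends CopyVisitor {
    @Override Node visitSelect(Select s) {
        Select copy = (Select) super.visitSelect(s);
        List<Node> columns = new ArrayList<>(copy.columns);
        for (Node c : copy.columns) {
            if (c instanceof Aggregate && ((Aggregate) c).function.equals("AVG")) {
                // Append COUNT(DISTINCT <arg>) AS hidden_count for each AVG(<arg>).
                Node arg = ((Aggregate) c).arg.accept(this);
                columns.add(new Aggregate("COUNT", true, arg, "hidden_count"));
            }
        }
        return new Select(columns, copy.table, copy.groupBy);
    }
}

final class RewriteSketch {
    static String rewrite(Select query) {
        return query.accept(new AddHiddenCount()).toSql();
    }

    public static void main(String[] args) {
        List<Node> cols = new ArrayList<>();
        cols.add(new Column("shop_id"));
        cols.add(new Aggregate("AVG", false, new Column("customer_age"), null));
        System.out.println(rewrite(new Select(cols, "customer_data", "shop_id")));
    }
}
```

The point of the pattern is that the original tree is never mutated: the base visitor copies everything, and a transform only overrides the node types it cares about. The sketch reuses the alias hidden_count for every AVG it finds, so a query with several averages would need unique aliases.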
@christophstockhusen, this looks very similar to an effort we are working on. Have you had any success in this regard yet?
@cvonredapt I took over working on that issue from @christophstockhusen and came up with #29 to solve this issue.
Actually, the gRPC interfaces are already in place, you just have to extend the Analyzer interface to use it.
Edit: I just noticed, #15 does the same thing.
Latest release contains a rewriting visitor which can help with the read-modify-write use case. See examples in RewritingVisitorTest.java (sorry, not really any useful documentation yet)
Currently, the Java classes provide the possibility to analyze a SQL statement and get a resolved AST. However, it is not possible to create a SQL statement from the resolved AST, because the required Java classes are missing (even though the corresponding C++ classes are available), i.e. SqlBuilder.java and its dependencies.