Overview
During sync, the current logic uses Clojure's functional JDBC wrapper library (clojure.java.jdbc) to call the native Trino JDBC driver. This wrapper forces every query through a prepared statement, and the library offers no way to disable that. This causes two problems:
Each sync query results in two service calls to Trino (a PREPARE followed by an EXECUTE), which doubles the number of network calls. For example:
PREPARE statement5 FROM SHOW TABLES FROM "tpch"."sf10000"
---
EXECUTE statement5
These prepared statements are also pointless: they contain no parameter values to bind, and they can't even be reused.
Solution
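Remove prepared statements by leveraging the native Trino JDBC driver directly: execute queries with plain JDBC statements instead of through the Clojure wrapper lib, and consume the results with the wrapper's reducible-result-set and result-set-seq (see: https://clojure.github.io/java.jdbc/index.html#clojure.java.jdbc/reducible-result-set).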
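A minimal Java sketch of the same idea at the JDBC level, assuming an already-open java.sql.Connection to Trino. The class and method names (PlainStatementSync, quoteIdentifier, showTablesSql, readRows, showTables) are hypothetical and are not taken from the driver; readRows only loosely mirrors what result-set-seq returns. It runs the same SHOW TABLES query as the example above through a plain Statement instead of a PreparedStatement:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not the driver's actual code.
final class PlainStatementSync {

    // Wraps a Trino identifier in double quotes, doubling any embedded
    // double quote (standard SQL identifier quoting).
    static String quoteIdentifier(String name) {
        return "\"" + name.replace("\"", "\"\"") + "\"";
    }

    // Builds the same query as the PREPARE/EXECUTE example. Catalog and schema
    // are identifiers, which cannot be JDBC parameters, so there is nothing to bind.
    static String showTablesSql(String catalog, String schema) {
        return "SHOW TABLES FROM " + quoteIdentifier(catalog) + "." + quoteIdentifier(schema);
    }

    // Reads every row into a column-label -> value map; this only loosely
    // mirrors the maps that clojure.java.jdbc's result-set-seq yields.
    static List<Map<String, Object>> readRows(ResultSet rs) throws SQLException {
        ResultSetMetaData md = rs.getMetaData();
        int columnCount = md.getColumnCount();
        List<Map<String, Object>> rows = new ArrayList<>();
        while (rs.next()) {
            Map<String, Object> row = new LinkedHashMap<>();
            for (int i = 1; i <= columnCount; i++) {
                row.put(md.getColumnLabel(i), rs.getObject(i));
            }
            rows.add(row);
        }
        return rows;
    }

    // Runs the query through a plain java.sql.Statement rather than a
    // PreparedStatement; try-with-resources closes the ResultSet, then the Statement.
    static List<Map<String, Object>> showTables(Connection conn, String catalog, String schema)
            throws SQLException {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(showTablesSql(catalog, schema))) {
            return readRows(rs);
        }
    }

    public static void main(String[] args) {
        // No Trino connection is needed to see the generated SQL.
        System.out.println(showTablesSql("tpch", "sf10000"));
    }
}
```

With an open connection, showTables(conn, "tpch", "sf10000") would return one map per table. Since SQL parameters can only stand in for values, never for identifiers such as catalog and schema names, a metadata query like this has nothing to parameterize; quoting the identifiers and sending the text through a plain Statement gives up nothing and should avoid the separate PREPARE call shown in the example.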
Tests
I tried to preserve all the major reduce logic. I manually tested sync locally, and all sync logic is covered by the existing unit tests: if sync broke, most of those tests would fail.
Manual Tests
I ran sync on the TPCH catalog (all schemas, all tables) on the insights cluster. Below is the number of queries each of the three drivers issued during that sync:
Current prod Starburst Metabase driver (with the issues above): 258 queries
Old Presto Metabase driver (the driver Meesho currently uses in their prod): 155 queries
Starburst Metabase driver with these changes: 93 queries
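Expressed as relative reductions, the query counts above work out as follows (plain arithmetic on the reported numbers):

```latex
\frac{258 - 93}{258} = \frac{165}{258} \approx 64\% \ \text{fewer queries than the current prod Starburst driver}
\qquad
\frac{155 - 93}{155} = \frac{62}{155} = 40\% \ \text{fewer queries than the old Presto driver}
```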