tdcmeehan opened this issue 7 months ago (status: open)
@tdcmeehan, I checked the Spark SQL syntax, and it provides two ways to write to a branch:
- Using `branch_yourBranch` as the last part of the table identifier in the query:
```sql
-- INSERT (1, 'a'), (2, 'b') into the audit branch
INSERT INTO prod.db.table.branch_audit VALUES (1, 'a'), (2, 'b');
-- UPDATE the audit branch
UPDATE prod.db.table.branch_audit AS t1 SET val = 'c';
-- DELETE FROM the audit branch
DELETE FROM prod.db.table.branch_audit WHERE id = 2;
```
- Setting the branch name with the `spark.wap.branch` configuration:
```sql
SET spark.wap.branch = audit;
INSERT INTO prod.db.table VALUES (3, 'c');
UPDATE prod.db.table AS t1 SET val = 'c';
DELETE FROM prod.db.table WHERE id = 2;
```
Reference - https://iceberg.apache.org/docs/latest/spark-writes/#writing-to-branches
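A minimal Java sketch of how a writer could resolve the target branch from the two Spark mechanisms above. It assumes that a trailing `branch_` part of the identifier names the branch, that the `spark.wap.branch` value is passed in as an `Optional`, and that writes otherwise go to Iceberg's default `main` branch. `BranchResolver` and `WriteTarget` are hypothetical names, and rejecting a suffix that disagrees with the session branch is a choice made for this sketch, not a description of how Spark handles that case.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical helper (not Spark or Iceberg code): picks the branch a write
 * targets from a dotted table name that may end in "branch_<name>" and an
 * optional session-level branch such as the value of spark.wap.branch.
 */
public final class BranchResolver {
    private static final String BRANCH_PREFIX = "branch_";
    private static final String MAIN_BRANCH = "main";

    /** The table name with any branch suffix stripped, plus the branch to write to. */
    public record WriteTarget(String table, String branch) {}

    public static WriteTarget resolve(String qualifiedName, Optional<String> wapBranch) {
        List<String> parts = Arrays.asList(qualifiedName.split("\\."));
        String last = parts.get(parts.size() - 1);

        String table = qualifiedName;
        Optional<String> suffixBranch = Optional.empty();
        // Only treat the last part as a branch if a table name precedes it.
        if (parts.size() > 1 && last.startsWith(BRANCH_PREFIX) && last.length() > BRANCH_PREFIX.length()) {
            suffixBranch = Optional.of(last.substring(BRANCH_PREFIX.length()));
            table = String.join(".", parts.subList(0, parts.size() - 1));
        }

        // This sketch's own rule: refuse two different branches instead of picking a winner.
        if (suffixBranch.isPresent() && wapBranch.isPresent()
                && !suffixBranch.get().equals(wapBranch.get())) {
            throw new IllegalArgumentException(
                    "Conflicting branches: " + suffixBranch.get() + " vs " + wapBranch.get());
        }

        // Identifier suffix first, then the session branch, then main.
        String branch = suffixBranch.or(() -> wapBranch).orElse(MAIN_BRANCH);
        return new WriteTarget(table, branch);
    }

    public static void main(String[] args) {
        System.out.println(resolve("prod.db.table.branch_audit", Optional.empty()));
        System.out.println(resolve("prod.db.table", Optional.of("audit")));
        System.out.println(resolve("prod.db.table", Optional.empty()));
    }
}
```

Failing on a conflict keeps the sketch from silently writing to a branch the user did not name; a real implementation could instead document a fixed precedence.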
So for Presto, should we extend the parser to support specifying the branch, with syntax like the following?
```sql
INSERT INTO prod.db.table AT BRANCH 'my-branch' VALUES (3, 'c');
UPDATE prod.db.table AS t1 AT BRANCH 'my-branch' SET val = 'c';
DELETE FROM prod.db.table AT BRANCH 'my-branch' WHERE id = 2;
```
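As a rough illustration of what the proposed `AT BRANCH '<name>'` clause asks of the parser, here is a toy Java recognizer that extracts the statement kind, the target table, and an optional branch from the three statement shapes above. It is not Presto's grammar (Presto's parser is ANTLR-generated, and a real change would go there), the regular expression covers only these shapes, and `AtBranchClause` and `Target` are hypothetical names.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Toy recognizer for the proposed AT BRANCH clause. NOT Presto's grammar:
 * it only shows which pieces a real parser would have to capture.
 */
public final class AtBranchClause {
    // <verb> <table> [AS <alias>] [AT BRANCH '<branch>'] ...rest is ignored
    private static final Pattern TARGET = Pattern.compile(
            "^\\s*(INSERT\\s+INTO|DELETE\\s+FROM|UPDATE)\\s+([\\w.]+)"
                    + "(?:\\s+AS\\s+\\w+)?"
                    + "(?:\\s+AT\\s+BRANCH\\s+'((?:[^']|'')*)')?",
            Pattern.CASE_INSENSITIVE);

    /** Statement kind, target table, and the branch if AT BRANCH was given. */
    public record Target(String statementKind, String table, Optional<String> branch) {}

    public static Target parse(String sql) {
        Matcher m = TARGET.matcher(sql);
        if (!m.lookingAt()) {
            throw new IllegalArgumentException("Unsupported statement: " + sql);
        }
        String kind = m.group(1).toUpperCase(Locale.ROOT).replaceAll("\\s+", " ");
        // SQL string literals escape a single quote by doubling it ('').
        Optional<String> branch = Optional.ofNullable(m.group(3)).map(s -> s.replace("''", "'"));
        return new Target(kind, m.group(2), branch);
    }

    public static void main(String[] args) {
        String[] statements = {
            "INSERT INTO prod.db.table AT BRANCH 'my-branch' VALUES (3, 'c');",
            "UPDATE prod.db.table AS t1 AT BRANCH 'my-branch' SET val = 'c';",
            "DELETE FROM prod.db.table AT BRANCH 'my-branch' WHERE id = 2;",
            "INSERT INTO prod.db.table VALUES (3, 'c');",
        };
        for (String sql : statements) {
            System.out.println(parse(sql));
        }
    }
}
```

Returning `Optional.empty()` when the clause is absent leaves existing statements unchanged, so a connector could keep writing them to `main`.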
Iceberg supports mutations (upserts, deletes, inserts) on branches. We should allow Presto to specify a branch during mutations so that users can, e.g., perform experiments on data without having to copy the data elsewhere.
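To illustrate why writing to a branch avoids copying data, a toy in-memory model in Java (not Iceberg's or Presto's API): each branch is a named pointer to an immutable snapshot, and the strings stand in for data file paths. Creating a branch copies only a reference, and inserting into a branch creates a new snapshot for that branch while `main` keeps pointing at its old one. `ToyBranchedTable` and its methods are hypothetical, and the `Optional<String>` branch parameter on `insert` is only one assumed way an SPI could pass the branch through.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/**
 * Toy model, NOT Iceberg's or Presto's API: a branch is a named pointer to an
 * immutable snapshot, and each string stands in for a data file path.
 */
public final class ToyBranchedTable {
    private static final String MAIN = "main";

    /** An immutable snapshot: an id plus the data files it references. */
    public record Snapshot(long id, List<String> dataFiles) {}

    private final Map<String, Snapshot> refs = new HashMap<>();
    private long nextSnapshotId = 1;

    public ToyBranchedTable(List<String> initialFiles) {
        refs.put(MAIN, new Snapshot(nextSnapshotId++, List.copyOf(initialFiles)));
    }

    /** Points a new branch at main's current snapshot; no data files are copied. */
    public void createBranch(String name) {
        if (refs.containsKey(name)) {
            throw new IllegalArgumentException("Branch already exists: " + name);
        }
        refs.put(name, refs.get(MAIN));
    }

    /** Adds a data file on the given branch (main if empty); other branches are untouched. */
    public void insert(Optional<String> branch, String newFile) {
        String ref = branch.orElse(MAIN);
        Snapshot current = snapshot(ref);
        // Only the list of file references is rebuilt; existing files are shared.
        List<String> files = new ArrayList<>(current.dataFiles());
        files.add(newFile);
        refs.put(ref, new Snapshot(nextSnapshotId++, List.copyOf(files)));
    }

    public List<String> files(String ref) {
        return snapshot(ref).dataFiles();
    }

    private Snapshot snapshot(String ref) {
        Snapshot s = refs.get(ref);
        if (s == null) {
            throw new IllegalArgumentException("No such branch: " + ref);
        }
        return s;
    }

    public static void main(String[] args) {
        ToyBranchedTable table = new ToyBranchedTable(List.of("file-1.parquet"));
        table.createBranch("my-branch");
        table.insert(Optional.of("my-branch"), "file-2.parquet");
        System.out.println("main: " + table.files("main"));
        System.out.println("my-branch: " + table.files("my-branch"));
    }
}
```

After the insert, `main` still lists only `file-1.parquet`, while `my-branch` lists both files and shares `file-1.parquet` with `main`.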
Part of #22025
Expected Behavior or Use Case
```sql
INSERT INTO iceberg.default.table AT BRANCH 'my-branch' VALUES ...
```
Presto Component, Service, or Connector
Presto Iceberg connector, and some changes to the parser and SPI to allow a branch to be specified.
Context
Hive syntax: https://medium.com/@ayushtkn/apache-hive-4-x-with-iceberg-branches-tags-3d52293ac0bf
Spark syntax: https://iceberg.apache.org/docs/latest/branching/#audit-branch