Add support for mutating an Iceberg branch

prestodb / presto

The official home of the Presto distributed SQL query engine for big data

Apache License 2.0


Iceberg supports mutations (upserts, deletes, inserts) on branches. We should allow Presto to specify a branch during mutations so that users can, e.g., perform experiments on data without having to copy the data elsewhere.

Part of #22025

Expected Behavior or Use Case

INSERT INTO iceberg.default.table AT BRANCH 'my-branch' VALUES ...

Presto Component, Service, or Connector

Presto Iceberg connector, and some changes to the parser and SPI to allow a branch to be specified.

Possible Implementation

Example Screenshots (if appropriate):

Context

Hive syntax: https://medium.com/@ayushtkn/apache-hive-4-x-with-iceberg-branches-tags-3d52293ac0bf

Spark syntax: https://iceberg.apache.org/docs/latest/branching/#audit-branch

@tdcmeehan, I checked the Spark SQL syntax, and it provides two ways to do this:

- Using a branch identifier, `branch_yourBranch`, as the last part of the table name:


-- INSERT (1, 'a') and (2, 'b') into the audit branch
INSERT INTO prod.db.table.branch_audit VALUES (1, 'a'), (2, 'b');

-- UPDATE the audit branch
UPDATE prod.db.table.branch_audit AS t1 SET val = 'c';

-- DELETE FROM the audit branch
DELETE FROM prod.db.table.branch_audit WHERE id = 2;


- Setting the branch name with the `spark.wap.branch` config:

SET spark.wap.branch = audit

INSERT INTO prod.db.table VALUES (3, 'c');

UPDATE prod.db.table AS t1 SET val = 'c';

DELETE FROM prod.db.table WHERE id = 2;
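With the first Spark approach, the branch is encoded in the last part of the table name. The sketch below shows that split in isolation, assuming the last name part carries the `branch_` prefix. `BranchIdentifier` and `parse` are hypothetical names, not Iceberg or Presto code, and the sketch skips the catalog lookups a real engine would perform (for example, a table literally named `branch_audit`).

```java
import java.util.Arrays;
import java.util.Optional;

// Hypothetical helper (not Iceberg or Presto code): splits a Spark-style
// identifier such as "prod.db.table.branch_audit" into the table name and an
// optional branch name, following the `branch_<name>` convention.
final class BranchIdentifier {
    private static final String BRANCH_PREFIX = "branch_";

    final String table;            // e.g. "prod.db.table"
    final Optional<String> branch; // e.g. Optional[audit], or empty if no branch was given

    private BranchIdentifier(String table, Optional<String> branch) {
        this.table = table;
        this.branch = branch;
    }

    static BranchIdentifier parse(String qualifiedName) {
        String[] parts = qualifiedName.split("\\.");
        String last = parts[parts.length - 1];
        // Treat the last part as a branch only when a table name precedes it and
        // the prefix is followed by a non-empty branch name.
        if (parts.length > 1
                && last.startsWith(BRANCH_PREFIX)
                && last.length() > BRANCH_PREFIX.length()) {
            String table = String.join(".", Arrays.copyOf(parts, parts.length - 1));
            return new BranchIdentifier(table, Optional.of(last.substring(BRANCH_PREFIX.length())));
        }
        return new BranchIdentifier(qualifiedName, Optional.empty());
    }

    public static void main(String[] args) {
        for (String name : new String[] {"prod.db.table.branch_audit", "prod.db.table"}) {
            BranchIdentifier id = parse(name);
            System.out.println(id.table + " -> " + id.branch);
        }
        // prod.db.table -> Optional[audit]
        // prod.db.table -> Optional.empty
    }
}
```

One drawback of this convention is that the branch is indistinguishable from a table name at parse time, which is part of why an explicit clause may be easier to reason about.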


Reference - https://iceberg.apache.org/docs/latest/spark-writes/#writing-to-branches
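If both a per-statement branch and a session-level branch were available, the engine would need a rule for combining them. The Spark writes documentation states that a WAP branch and a branch identifier cannot both be specified. The sketch below mirrors that rule and falls back to Iceberg's default branch, `main`, when neither is set. `WriteBranchResolver` and `resolve` are hypothetical names for illustration only.

```java
import java.util.Optional;

// Hypothetical resolver (not Iceberg or Presto code): decides which branch a
// write targets, given an optional branch from the table identifier and an
// optional session-level branch (the role `spark.wap.branch` plays in Spark).
final class WriteBranchResolver {
    // Name of Iceberg's default branch.
    static final String MAIN_BRANCH = "main";

    static String resolve(Optional<String> identifierBranch, Optional<String> sessionBranch) {
        // Mirror the documented Spark rule: the two mechanisms cannot be combined.
        if (identifierBranch.isPresent() && sessionBranch.isPresent()) {
            throw new IllegalArgumentException(
                    "Cannot set both session branch [" + sessionBranch.get()
                            + "] and branch identifier [" + identifierBranch.get() + "]");
        }
        // Otherwise use whichever one is set, falling back to the default branch.
        return identifierBranch.orElse(sessionBranch.orElse(MAIN_BRANCH));
    }

    public static void main(String[] args) {
        System.out.println(resolve(Optional.of("audit"), Optional.empty())); // audit
        System.out.println(resolve(Optional.empty(), Optional.of("audit"))); // audit
        System.out.println(resolve(Optional.empty(), Optional.empty()));     // main
    }
}
```

Rejecting the combination, rather than silently preferring one source, avoids writes landing on a branch the user did not expect.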

For Presto, should we extend the parser to support specifying a branch, with syntax like the following?

INSERT INTO prod.db.table AT BRANCH 'my-branch' VALUES (3, 'c');

UPDATE prod.db.table AS t1 AT BRANCH 'my-branch' SET val = 'c';

DELETE FROM prod.db.table AT BRANCH 'my-branch' WHERE id = 2;
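To make the proposed syntax concrete, here is a toy, regex-based recognizer that extracts the target table and the optional `AT BRANCH '<name>'` clause from the three statements above. Presto's real parser is ANTLR-based, so this only illustrates the shape of the proposed grammar and is not a parser change. The class name `AtBranchClause` and the regular expression are assumptions, and the recognizer handles only the simple forms shown (unquoted identifiers, no escaped quotes in the branch name).

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy recognizer (hypothetical, not Presto code) for the proposed syntax:
//   INSERT INTO <table> [AT BRANCH '<name>'] ...
//   UPDATE <table> [AS <alias>] [AT BRANCH '<name>'] SET ...
//   DELETE FROM <table> [AT BRANCH '<name>'] ...
// It extracts the target table and optional branch; the rest is ignored.
final class AtBranchClause {
    private static final Pattern TARGET = Pattern.compile(
            "^\\s*(?:INSERT\\s+INTO|UPDATE|DELETE\\s+FROM)\\s+([\\w.]+)" // target table
                    + "(?:\\s+AS\\s+\\w+)?"                              // optional alias
                    + "(?:\\s+AT\\s+BRANCH\\s+'([^']+)')?",              // optional branch
            Pattern.CASE_INSENSITIVE);

    final String table;
    final Optional<String> branch;

    private AtBranchClause(String table, Optional<String> branch) {
        this.table = table;
        this.branch = branch;
    }

    // Returns empty if the statement is not an INSERT, UPDATE, or DELETE.
    static Optional<AtBranchClause> parse(String sql) {
        Matcher m = TARGET.matcher(sql);
        if (!m.lookingAt()) {
            return Optional.empty();
        }
        // group(2) is null when no AT BRANCH clause is present.
        return Optional.of(new AtBranchClause(m.group(1), Optional.ofNullable(m.group(2))));
    }

    public static void main(String[] args) {
        String[] statements = {
            "INSERT INTO prod.db.table AT BRANCH 'my-branch' VALUES (3, 'c');",
            "UPDATE prod.db.table AS t1 AT BRANCH 'my-branch' SET val = 'c';",
            "DELETE FROM prod.db.table AT BRANCH 'my-branch' WHERE id = 2;",
            "INSERT INTO prod.db.table VALUES (3, 'c');",
        };
        for (String sql : statements) {
            parse(sql).ifPresent(c -> System.out.println(c.table + " -> " + c.branch));
        }
    }
}
```

Whatever form the grammar takes, the parsed branch would then need to reach the Iceberg connector through the SPI, which is the other half of the change described in this issue.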
