prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0

Add support for mutating an Iceberg branch #22030

Open tdcmeehan opened 7 months ago

tdcmeehan commented 7 months ago

Iceberg supports mutations (upserts, deletes, inserts) on branches. We should allow Presto to specify a branch during mutations so that users can, e.g., perform experiments on data without having to copy the data elsewhere.

Part of #22025
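The sketch below is a toy, in-memory model of a branch. It shows why experimenting on a branch does not require copying data elsewhere. A branch is only a named pointer to a snapshot. A write on a branch commits a new snapshot and advances only that branch's pointer, so `main` is left untouched. `ToyBranchedTable` and its methods are hypothetical and are not the Iceberg Java API. For simplicity the toy copies row lists, whereas real Iceberg snapshots share data files.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of Iceberg-style branch refs; NOT the Iceberg Java API.
public class ToyBranchedTable {
    // A snapshot is an immutable list of rows plus the id of its parent snapshot.
    record Snapshot(long id, Long parentId, List<String> rows) {}

    private final Map<Long, Snapshot> snapshots = new HashMap<>();
    private final Map<String, Long> refs = new HashMap<>(); // branch name -> head snapshot id
    private long nextId = 1;

    public ToyBranchedTable() {
        Snapshot root = new Snapshot(nextId++, null, List.of());
        snapshots.put(root.id(), root);
        refs.put("main", root.id());
    }

    // Creating a branch only adds a new pointer to an existing snapshot; no rows are copied.
    public void createBranch(String name, String from) {
        refs.put(name, headOf(from).id());
    }

    // An insert commits a new snapshot and advances only the target branch's pointer.
    // (The toy copies the row list for simplicity; real Iceberg shares data files.)
    public void insert(String branch, String row) {
        Snapshot head = headOf(branch);
        List<String> rows = new ArrayList<>(head.rows());
        rows.add(row);
        Snapshot next = new Snapshot(nextId++, head.id(), List.copyOf(rows));
        snapshots.put(next.id(), next);
        refs.put(branch, next.id());
    }

    public List<String> scan(String branch) {
        return headOf(branch).rows();
    }

    private Snapshot headOf(String branch) {
        Long id = refs.get(branch);
        if (id == null) {
            throw new IllegalArgumentException("Unknown branch: " + branch);
        }
        return snapshots.get(id);
    }

    public static void main(String[] args) {
        ToyBranchedTable table = new ToyBranchedTable();
        table.insert("main", "(1, 'a')");
        table.createBranch("audit", "main");
        table.insert("audit", "(2, 'b')");
        System.out.println("main:  " + table.scan("main"));  // main:  [(1, 'a')]
        System.out.println("audit: " + table.scan("audit")); // audit: [(1, 'a'), (2, 'b')]
    }
}
```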

Expected Behavior or Use Case

INSERT INTO iceberg.default.table AT BRANCH 'my-branch' VALUES ...

Presto Component, Service, or Connector

Presto Iceberg connector, and some changes to the parser and SPI to allow a branch to be specified.

Context

Hive syntax: https://medium.com/@ayushtkn/apache-hive-4-x-with-iceberg-branches-tags-3d52293ac0bf

Spark syntax: https://iceberg.apache.org/docs/latest/branching/#audit-branch

agrawalreetika commented 1 week ago

@tdcmeehan, I checked the Spark SQL syntax, and it provides two ways to do this:

- Using a branch identifier of the form `branch_<name>` as the last part of the table name:

-- UPDATE audit branch
UPDATE prod.db.table.branch_audit AS t1 SET val = 'c';

-- DELETE FROM audit branch
DELETE FROM prod.db.table.branch_audit WHERE id = 2;


- Setting the branch name with the `spark.wap.branch` config:

SET spark.wap.branch = audit

INSERT INTO prod.db.table VALUES (3, 'c');

UPDATE prod.db.table AS t1 SET val = 'c';

DELETE FROM prod.db.table WHERE id = 2;


Reference - https://iceberg.apache.org/docs/latest/spark-writes/#writing-to-branches
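The two Spark conventions above can be sketched as a small resolution step that turns a table identifier plus an optional `spark.wap.branch` value into a (table, branch) target. This is a hypothetical helper, not Spark or Iceberg code, and `SparkStyleBranchResolver` and `Target` are illustrative names. Two behaviours are assumptions: falling back to `main` when no branch is given, and rejecting a write that specifies both.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper illustrating the two Spark conventions; not Spark or Iceberg code.
public final class SparkStyleBranchResolver {
    private static final String BRANCH_PREFIX = "branch_";

    public record Target(String table, String branch) {}

    public static Target resolve(String identifier, Optional<String> wapBranch) {
        List<String> parts = Arrays.asList(identifier.split("\\."));
        String last = parts.get(parts.size() - 1);

        // Convention 1: a trailing `branch_<name>` part selects the branch.
        Optional<String> identifierBranch =
                parts.size() > 1 && last.startsWith(BRANCH_PREFIX) && last.length() > BRANCH_PREFIX.length()
                        ? Optional.of(last.substring(BRANCH_PREFIX.length()))
                        : Optional.empty();

        // Assumption: specifying both conventions at once is rejected.
        if (identifierBranch.isPresent() && wapBranch.isPresent()) {
            throw new IllegalArgumentException("Branch given both in identifier and spark.wap.branch");
        }

        // Strip the branch part from the table name when it was present.
        String table = identifierBranch.isPresent()
                ? String.join(".", parts.subList(0, parts.size() - 1))
                : identifier;

        // Convention 2: fall back to the session-level WAP branch, then (assumed) to main.
        String branch = identifierBranch.or(() -> wapBranch).orElse("main");
        return new Target(table, branch);
    }

    public static void main(String[] args) {
        System.out.println(resolve("prod.db.table.branch_audit", Optional.empty()));
        System.out.println(resolve("prod.db.table", Optional.of("audit")));
        System.out.println(resolve("prod.db.table", Optional.empty()));
    }
}
```

The identifier form ties the branch to a single statement. The config form applies to every write in the session, which is closer to a Presto session property than to new syntax.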

So for Presto, should we extend the parser to support specifying the branch, with syntax like the following?

INSERT INTO prod.db.table AT BRANCH 'my-branch' VALUES (3, 'c');

UPDATE prod.db.table AS t1 AT BRANCH 'my-branch' SET val = 'c';

DELETE FROM prod.db.table AT BRANCH 'my-branch' WHERE id = 2;
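To make the proposal concrete, here is a regex-based sketch of what the proposed `AT BRANCH '<name>'` clause would need to capture for each statement type: the target table and the branch literal. This is a hypothetical illustration only. A real change would go into Presto's ANTLR grammar and AST, and this toy ignores quoted identifiers and the rest of each statement. `AtBranchClause` and `BranchTarget` are made-up names.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the proposed AT BRANCH clause; not Presto's parser.
public final class AtBranchClause {
    public record BranchTarget(String statement, String table, String branch) {}

    // Branch literal: single-quoted string, with '' as an escaped quote (standard SQL).
    private static final String BRANCH = "\\s+AT\\s+BRANCH\\s+'((?:[^']|'')*)'";

    // One pattern per statement shape from the proposal; UPDATE allows an optional alias.
    private static final List<Pattern> PATTERNS = List.of(
            Pattern.compile("^\\s*(INSERT)\\s+INTO\\s+([\\w.]+)" + BRANCH, Pattern.CASE_INSENSITIVE),
            Pattern.compile("^\\s*(UPDATE)\\s+([\\w.]+)(?:\\s+AS\\s+\\w+)?" + BRANCH, Pattern.CASE_INSENSITIVE),
            Pattern.compile("^\\s*(DELETE)\\s+FROM\\s+([\\w.]+)" + BRANCH, Pattern.CASE_INSENSITIVE));

    public static Optional<BranchTarget> parse(String sql) {
        for (Pattern pattern : PATTERNS) {
            Matcher m = pattern.matcher(sql);
            if (m.find()) {
                String branch = m.group(3).replace("''", "'"); // unescape doubled quotes
                return Optional.of(new BranchTarget(m.group(1).toUpperCase(), m.group(2), branch));
            }
        }
        // No AT BRANCH clause: the write would target the table's main branch.
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(parse("INSERT INTO prod.db.table AT BRANCH 'my-branch' VALUES (3, 'c')"));
        System.out.println(parse("UPDATE prod.db.table AS t1 AT BRANCH 'my-branch' SET val = 'c'"));
        System.out.println(parse("DELETE FROM prod.db.table AT BRANCH 'my-branch' WHERE id = 2"));
        System.out.println(parse("INSERT INTO prod.db.table VALUES (3, 'c')"));
    }
}
```

Keeping the branch as an optional part of the write target, as opposed to folding it into the table name, would let the SPI pass it to the connector without changing how table names resolve.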