Add `add_files` procedure in Iceberg connector

trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Apache License 2.0

Add `add_files` procedure in Iceberg connector #11744

Open erikerlandson opened 2 years ago

erikerlandson commented 2 years ago

Something like Spark's `add_files` procedure, as exercised in this test: https://github.com/RussellSpitzer/iceberg/blob/a4279fc5842046043f2afdc90f2428243958574d/spark3-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestAddFilesProcedure.java#L80

osscm commented 2 years ago

erikerlandson commented 2 years ago

One of the use cases I had in mind is that the files I want to add are sitting in an S3 bucket. In that case, there needs to be a way to supply `add_files` with S3 credentials as parameters.

alexjo2144 commented 1 year ago

Similar to the Hive connector's `allow-register-partition-procedure`, this should be disabled by default and opted into using a catalog property. The idea is that it should only be turned on if file system location-based access control is in place.

blopezpi commented 1 year ago

Any updates on this? It would be great to have this procedure for importing a bunch of data directly, avoiding any `INSERT` command.

anandsakhare commented 10 months ago

ebyhr commented 1 month ago

@martint Could you review the syntax of this procedure? The procedure name in Spark is `add_files`. Detailed information is documented at https://iceberg.apache.org/docs/latest/spark-procedures/#add_files

The arguments should follow Trino conventions (e.g. source_table should not be abused for locations), but the name looks good to me.
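
For reference, the Spark calls in the linked Iceberg documentation look roughly like the first two statements below; the second shows `source_table` doubling as a file location. The last statement is only a hypothetical sketch of what a Trino-style signature could look like. `ALTER TABLE ... EXECUTE` is an existing Trino form for table procedures, but the `location` and `format` argument names, the catalog, schema, and table names, and the bucket path are all assumptions for illustration, not an agreed design.

```sql
-- Spark (per the Iceberg docs): add files from an existing table,
-- restricted to one partition.
CALL spark_catalog.system.add_files(
  table => 'db.tbl',
  source_table => 'db.src_tbl',
  partition_filter => map('part_col_1', 'A')
);

-- Spark (per the Iceberg docs): source_table carries a format plus a path
-- rather than a table name. This is the overloading mentioned above.
CALL spark_catalog.system.add_files(
  table => 'db.tbl',
  source_table => '`parquet`.`path/to/table`'
);

-- HYPOTHETICAL Trino-style sketch: location and format are separate,
-- explicitly named arguments. All names and the path are placeholders.
ALTER TABLE example_catalog.db.tbl
  EXECUTE add_files(
    location => 's3://example-bucket/path/to/files',
    format => 'PARQUET'
  );
```

Splitting the location and file format into their own arguments would keep `source_table` (or its Trino equivalent) reserved for actual table references.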