prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0

UNLOAD to write the result of a query to a destination (like S3) #23561

Status: Open. raphaelauv opened this issue 2 weeks ago.

raphaelauv commented 2 weeks ago

Expected Behavior or Use Case

Like what is possible with UNLOAD in AWS Athena or Redshift: https://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html

Write the result of a query to a destination such as S3, in Parquet, CSV, JSONL, ...

Context

It would be great to write the result of a query directly from PrestoDB.

tdcmeehan commented 2 weeks ago

@raphaelauv is the idea that this is useful when you want to ensure that only a single file is created, as opposed to CTAS, where multiple files may be created? And that you may not want to register this file with a metastore?

raphaelauv commented 2 weeks ago

UNLOAD is for exporting data "out of PrestoDB", not for creating a new table.

tdcmeehan commented 2 weeks ago

@raphaelauv understood. But I suppose the question is how you define "out of PrestoDB". If you configured a connector to write to S3 (for example, a Hive or Iceberg connector) and ran a CTAS into a table in that connector, it would create Parquet files in S3, i.e. out of PrestoDB. A table is just an abstraction over those files; you're free to go to the files directly. So the question is: what would UNLOAD do for you that CTAS into an unpartitioned table doesn't already do?
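A minimal sketch of the CTAS workaround described above, written in Python as a plain string builder so it runs without a cluster. It assumes a Hive catalog whose schema location is on S3; the function name `build_ctas_export` and the table name `hive.exports.result` are hypothetical. `format` is the Hive connector's storage-format table property, and leaving out `partitioned_by` keeps the target table unpartitioned.

```python
def build_ctas_export(query: str, table: str, fmt: str = "PARQUET") -> str:
    """Build a CTAS statement that materialises `query` as files backing `table`.

    Hypothetical helper. Assumes `table` is fully qualified
    (catalog.schema.table) and belongs to a Hive catalog whose schema
    location points at S3. Identifiers are NOT sanitised: sketch only.
    """
    # Storage formats this sketch accepts; check the connector docs
    # for per-format restrictions (e.g. on column types).
    allowed = {"PARQUET", "ORC", "JSON", "CSV"}
    fmt = fmt.upper()
    if fmt not in allowed:
        raise ValueError(f"unsupported format: {fmt}")
    # No partitioned_by property, so the resulting table is unpartitioned.
    return (
        f"CREATE TABLE {table}\n"
        f"WITH (format = '{fmt}')\n"
        f"AS {query}"
    )


if __name__ == "__main__":
    print(build_ctas_export("SELECT * FROM orders", "hive.exports.result"))
```

The resulting statement could then be submitted through any Presto client, for example with presto-python-client: `cur = prestodb.dbapi.connect(host=..., port=..., user=...).cursor()`, then `cur.execute(sql)` followed by `cur.fetchall()` to wait for the query to finish. As noted in this thread, the connector may still write several files under the table location rather than exactly one.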

raphaelauv commented 2 weeks ago

Out of PrestoDB, for any application code that cannot connect to PrestoDB (or is forbidden from doing so for technical or organisational reasons).