trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Add support for fetching Redshift query results using Redshift unload command #24117

Open mayankvadariya opened 1 week ago

mayankvadariya commented 1 week ago

Description

Redshift can unload the results of a SELECT query to S3 with the UNLOAD command. This PR uses UNLOAD to write Redshift query results to S3 and then reads the generated result files in parallel.

Official documentation on the Redshift UNLOAD command:

* https://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html
* https://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD_command_examples.html

The connector internally converts the Trino query into a Redshift UNLOAD statement and runs it. The resulting Parquet files, which contain the query results, are then read in parallel to produce the Trino query result. The connector unloads results in Parquet format only. Additionally, the connector allows configuring certain UNLOAD command options.
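As a rough illustration of the flow described above, here is a minimal Python sketch. It is not the PR's code (the connector is written in Java), and the names `build_unload_sql` and `read_files_in_parallel`, as well as the bucket and IAM role in the usage note, are hypothetical. It assumes the basic `UNLOAD ('query') TO 's3://...' IAM_ROLE '...' FORMAT AS PARQUET` form from the Redshift documentation, where single quotes inside the query are doubled, and it leaves the actual Parquet decoding to a caller-supplied function.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def build_unload_sql(select_sql: str, s3_prefix: str, iam_role: str) -> str:
    """Wrap a SELECT in a Redshift UNLOAD statement that writes Parquet files.

    UNLOAD takes the query as a quoted string, so single quotes inside
    the query are doubled. Only quote escaping is handled in this sketch.
    """
    escaped_query = select_sql.replace("'", "''")
    escaped_prefix = s3_prefix.replace("'", "''")
    escaped_role = iam_role.replace("'", "''")
    return (
        f"UNLOAD ('{escaped_query}') "
        f"TO '{escaped_prefix}' "
        f"IAM_ROLE '{escaped_role}' "
        "FORMAT AS PARQUET"
    )


def read_files_in_parallel(
    paths: Sequence[str],
    read_file: Callable[[str], List],
    max_workers: int = 4,
) -> List:
    """Read every unloaded file concurrently and concatenate the rows.

    `read_file` is a placeholder for whatever decodes one Parquet file
    into a list of rows; results are returned in the order of `paths`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_file_rows = pool.map(read_file, paths)
        return [row for rows in per_file_rows for row in rows]
```

For example, `build_unload_sql("SELECT * FROM venue WHERE venuestate = 'NV'", "s3://example-bucket/unload/query1_", "arn:aws:iam::123456789012:role/ExampleRole")` yields a statement whose inner query reads `venuestate = ''NV''`. The generated files under that prefix could then be passed to `read_files_in_parallel` together with a Parquet reader of choice.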

Below are the limitations for which Trino internally forces a fallback to traditional JDBC, even if the connector is configured to use unload.

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

## Section
* Add support for fetching Redshift query results using Redshift unload command. ({issue}`issuenumber`)