Try this alternative way to export specific columns from the table, rather than exporting the entire table:

EXPORT DATA OPTIONS (
    uri = 'gs://bucket/folder/*.csv',
    format = 'CSV',
    overwrite = true,
    header = true,
    field_delimiter = ';'
) AS
SELECT field1, field2
FROM mydataset.table1
ORDER BY field1
LIMIT 10
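Since the thread is about the Python client, note that this statement can be submitted through it as a regular query job. A minimal sketch, assuming placeholder bucket, dataset, and table names:

from google.cloud import bigquery

client = bigquery.Client()

# Bucket, dataset, and table names below are placeholders.
export_sql = """
EXPORT DATA OPTIONS (
    uri = 'gs://bucket/folder/*.csv',
    format = 'CSV',
    overwrite = true,
    header = true,
    field_delimiter = ';'
) AS
SELECT field1, field2
FROM mydataset.table1
ORDER BY field1
LIMIT 10
"""

query_job = client.query(export_sql)  # Runs as a regular query job.
query_job.result()  # Wait for the export to complete.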
Thanks, I was aware that this is possible through query, but I want to be able to do this with extract_table, where you can already specify the export format, compression, and more via the job_config kwarg:
# from google.cloud import bigquery
# client = bigquery.Client()
# bucket_name = 'my-bucket'
# project = 'bigquery-public-data'
# dataset_id = 'samples'

destination_uri = "gs://{}/{}".format(bucket_name, "shakespeare.csv.gz")
dataset_ref = bigquery.DatasetReference(project, dataset_id)
table_ref = dataset_ref.table("shakespeare")
job_config = bigquery.job.ExtractJobConfig()
job_config.compression = bigquery.Compression.GZIP

extract_job = client.extract_table(
    table_ref,
    destination_uri,
    # Location must match that of the source table.
    location="US",
    job_config=job_config,
)  # API request
extract_job.result()  # Waits for job to complete.
https://cloud.google.com/bigquery/docs/exporting-data#extract_compressed_table
Thanks @peter-s-martin for answering the question! Indeed, we can use EXPORT DATA to extract only specific columns of a table.

@ReubenFrankel You can specify the same configs with EXPORT DATA: see the export_option_list section of the statement's documentation.
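For example, the GZIP compression set via ExtractJobConfig in the earlier snippet has a direct counterpart in the option list. A minimal sketch, reusing the public shakespeare sample table and a placeholder bucket (client as created earlier):

# compression = 'GZIP' mirrors job_config.compression = bigquery.Compression.GZIP;
# the bucket name is a placeholder.
export_sql = """
EXPORT DATA OPTIONS (
    uri = 'gs://my-bucket/shakespeare-*.csv.gz',
    format = 'CSV',
    compression = 'GZIP',
    overwrite = true,
    header = true
) AS
SELECT word, word_count
FROM `bigquery-public-data.samples.shakespeare`
"""
client.query(export_sql).result()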
I will close the issue for now, but feel free to leave a comment or open a new issue if you have any further questions.
This question was more about whether there is a way to select columns for an extract programmatically with extract_table, rather than provide the SQL to execute an extract job as a string to query (I was already aware of this as an alternative). In terms of DX, I prefer the interface extract_table provides and having an ExtractJob instance returned. Since you both directed me to the SQL approach, I'm going to assume it's not possible with extract_table currently.
For posterity, here's what I imagined the API might look like:
extract_job = client.extract_table(
    table_ref,
    columns=[
        "event_name",
        "event_timestamp",
    ],
    destination_uri=destination_uri,
    location="US",
    job_config=job_config,
)
or
job_config = bigquery.job.ExtractJobConfig()
job_config.compression = bigquery.Compression.GZIP
job_config.columns = [
    "event_name",
    "event_timestamp",
]

extract_job = client.extract_table(
    table_ref,
    destination_uri,
    location="US",
    job_config=job_config,
)
Indeed, I don't think there's an option to specify columns directly in the ExtractJob API. It does feel like a useful feature, though. If you are interested in making it a feature, it would be helpful to file a feature request with customer support here: https://cloud.google.com/bigquery/docs/getting-support.
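In the meantime, here is a workaround sketch that approximates the imagined columns= API by generating an EXPORT DATA statement. The helper name extract_table_columns is hypothetical, not part of the library, and it returns a QueryJob rather than an ExtractJob:

from google.cloud import bigquery

def extract_table_columns(client, table_ref, columns, destination_uri,
                          field_delimiter=","):
    # Hypothetical helper: emulate a `columns` option for extracts by
    # generating an EXPORT DATA statement. No escaping or validation is
    # done on the inputs; this is an illustration only.
    column_list = ", ".join(f"`{name}`" for name in columns)
    sql = f"""
    EXPORT DATA OPTIONS (
        uri = '{destination_uri}',
        format = 'CSV',
        overwrite = true,
        header = true,
        field_delimiter = '{field_delimiter}'
    ) AS
    SELECT {column_list}
    FROM `{table_ref.project}.{table_ref.dataset_id}.{table_ref.table_id}`
    """
    return client.query(sql)

# Usage mirroring the imagined API (the URI must contain a '*' wildcard):
# extract_job = extract_table_columns(
#     client,
#     table_ref,
#     columns=["event_name", "event_timestamp"],
#     destination_uri="gs://my-bucket/events-*.csv",
# )
# extract_job.result()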
Is it possible to select specific columns for export with extract_table somehow? Or do I need to use query for that kind of additional functionality?