Open joons5834 opened 2 years ago
https://cloud.google.com/bigquery/docs/partitioned-tables
https://cloud.google.com/bigquery/docs/partitioned-tables#partitioning_versus_clustering
under the following circumstances:
How to migrate a non-partitioned table into a partitioned one
CREATE TABLE
mydataset.newtable (transaction_id INT64, transaction_date DATE)
PARTITION BY
transaction_date
AS SELECT transaction_id, transaction_date FROM mydataset.mytable
Creating a clustered table from a query result
bq query --use_legacy_sql=false \
'CREATE TABLE
mydataset.myclusteredtable
PARTITION BY
DATE(timestamp)
CLUSTER BY
customer_id AS
SELECT
*
FROM
`mydataset.mytable`'
Modifying clustering specification
You can change the clustering specification in the following ways:
Call the tables.update or tables.patch API method.
Call the bq command-line tool's bq update command with the --clustering_fields flag.
month
during load/insert for performance boost.
Since BigQuery does not use index, it requires full table scan for filtering/aggregation etc. Hence, partitioning/clustering table is recommended both cost & performance-wise. See https://cloud.google.com/bigquery/docs/best-practices-costs#use_clustered_or_partitioned_tables , https://cloud.google.com/bigquery/docs/best-practices-costs#partition_data_by_date , https://cloud.google.com/bigquery/docs/best-practices-costs#do_not_use_limit_to_control_costs_in_non-clustered_tables