joons5834 / weather-forecast-accuracy

0 stars 0 forks source link

partitioned/clustered table #25

Open joons5834 opened 2 years ago

joons5834 commented 2 years ago

Since BigQuery does not use index, it requires full table scan for filtering/aggregation etc. Hence, partitioning/clustering table is recommended both cost & performance-wise. See https://cloud.google.com/bigquery/docs/best-practices-costs#use_clustered_or_partitioned_tables , https://cloud.google.com/bigquery/docs/best-practices-costs#partition_data_by_date , https://cloud.google.com/bigquery/docs/best-practices-costs#do_not_use_limit_to_control_costs_in_non-clustered_tables

joons5834 commented 2 years ago

Choosing partitioning details

https://cloud.google.com/bigquery/docs/partitioned-tables

daily/hourly/monthly?

https://cloud.google.com/bigquery/docs/partitioned-tables#choosing_daily_hourly_monthly_or_yearly_partitioning

Daily

hourly

monthly/yearly

early conclusion

joons5834 commented 2 years ago

Partitioning versus clustering

https://cloud.google.com/bigquery/docs/partitioned-tables#partitioning_versus_clustering

clustering

partitioning

Prefer clustering over partitioning

under the following circumstances:

joons5834 commented 2 years ago

In case things can go wrong and the decision has to change

How to migrate a non-partitioned table into a partitioned one

CREATE TABLE
  mydataset.newtable (transaction_id INT64, transaction_date DATE)
PARTITION BY
  transaction_date
AS SELECT transaction_id, transaction_date FROM mydataset.mytable

Creating a clustered table from a query result

bq query --use_legacy_sql=false \
'CREATE TABLE
   mydataset.myclusteredtable
 PARTITION BY
   DATE(timestamp)
 CLUSTER BY
   customer_id AS
 SELECT
   *
 FROM
   `mydataset.mytable`'

Modifying clustering specification

You can change the clustering specification in the following ways:

  • Call the tables.update or tables.patch API method.

  • Call the bq command-line tool's bq update command with the --clustering_fields flag.

joons5834 commented 2 years ago

Possible solutions