joons5834 / weather-forecast-accuracy

0 stars 0 forks source link

data schema design #4

Open joons5834 opened 2 years ago

joons5834 commented 2 years ago

Problem to solve: make DW and its pipelines for multi-dimensional analysis in BI tools.

Options researched:

Issues to consider:

joons5834 commented 2 years ago

Combination of star schema and nested data type. Nested data for one-to-many relationships to denormalize data and keep relationship info.

Star schema is for handling the issue of slowly changing dimensions(SCD):

joons5834 commented 2 years ago

https://cloud.google.com/bigquery/docs/best-practices-performance-input#denormalizing_data

Best practice: Denormalization is a common strategy for increasing read performance for relational datasets that were previously normalized. The recommended way to denormalize data in BigQuery is to use nested and repeated fields. It's best to use this strategy when the relationships are hierarchical and frequently queried together, such as in parent-child relationships. [...] Avoid denormalization in these use cases:

  • You have a star schema with frequently changing dimensions.
  • BigQuery complements an Online Transaction Processing (OLTP) system with row-level mutation but can't replace it.
joons5834 commented 2 years ago

https://cloud.google.com/bigquery/docs/best-practices-costs#materialize_query_results_in_stages

Materialize query results in stages Best practice: If possible, materialize your query results in stages.

If you create a large, multi-stage query, each time you run it, BigQuery reads all the data that is required by the query. You are billed for all the data that is read each time the query is run.

Instead, break your query into stages where each stage materializes the query results by writing them to a destination table. Querying the smaller destination table reduces the amount of data that is read and lowers costs. The cost of storing the materialized results is much less than the cost of processing large amounts of data.

https://cloud.google.com/bigquery/docs/materialized-views-intro

Focus less on creating or keeping intermediate table up-to-date