shuijian-xu / hive

Components of a first-generation data warehouse. #72

Open shuijian-xu opened 4 years ago

shuijian-xu commented 4 years ago
  1. Extraction of the source data from a variety of application databases. These source applications often use very different technologies.

  2. Integration of the data. There are two types of integration. First, there is format integration, where logically similar data types (e.g., dates) are converted so that they have the same physical data type. Second, there is semantic integration, which ensures that the meaning of the information is consistent.

  3. The database itself. The data warehouse database can become enormous as a new layer of fact data is added each day. The star schema is implemented as a series of tables. The fact table (the center of the star) is long and thin: it usually has a large number of rows and a small number of columns. The fact columns must be summable. The dimension tables (the points of the star) are joined to the fact table through foreign keys. Where a dimension participates in a hierarchy that is split out into further related tables, the model is sometimes referred to as a snowflake.

  4. Aggregate navigation. This is a technique that automatically directs user queries at aggregate tables without the users being aware that it is happening. It is very important for query performance, because an aggregate table holds far fewer rows than the fact table it summarizes.

  5. Presentation of information. This is how the information is presented to the users of the data warehouse. Most implementations opt for a client-server approach, which lets users view their information in a variety of tabular or graphical formats.
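
The two kinds of integration in item 2 can be illustrated with a small sketch. Everything named here is an assumption made for the example: the three source systems (`billing`, `orders`, `crm`), their date formats, and their status codes are hypothetical and do not come from any real system.

```python
from datetime import date, datetime

# Hypothetical: each source system is assumed to store dates in its own format.
SOURCE_DATE_FORMATS = {
    "billing": "%d/%m/%Y",  # e.g. 03/02/2021
    "orders": "%Y%m%d",     # e.g. 20210203
    "crm": "%m-%d-%y",      # e.g. 02-03-21
}

# Hypothetical: each source system is assumed to encode customer status differently.
SOURCE_STATUS_CODES = {
    "billing": {"A": "active", "C": "closed"},
    "orders": {"1": "active", "0": "closed"},
    "crm": {"live": "active", "lapsed": "closed"},
}


def integrate_date(source: str, raw: str) -> date:
    """Format integration: convert a source-specific date string to one physical type."""
    return datetime.strptime(raw.strip(), SOURCE_DATE_FORMATS[source]).date()


def integrate_status(source: str, raw: str) -> str:
    """Semantic integration: map source-specific codes onto one shared meaning."""
    return SOURCE_STATUS_CODES[source][raw.strip()]
```

With these mappings, `integrate_date("billing", "03/02/2021")` and `integrate_date("orders", "20210203")` both yield the same `date` value, and `integrate_status("orders", "1")` yields the same label as `integrate_status("billing", "A")`.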
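
The star schema in item 3 can be sketched with Python's built-in `sqlite3` module. The table and column names (`fact_sales`, `dim_date`, `dim_product`, `revenue`, and so on) and the sample rows are assumptions chosen for illustration. A real warehouse would hold far more data and would likely run on a different database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: the points of the star.
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, day TEXT, month TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);

-- Fact table: the center of the star. Few columns, many rows,
-- with foreign keys to the dimensions and summable measures.
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity INTEGER,
    revenue REAL
);

INSERT INTO dim_date VALUES (1, '2021-02-01', '2021-02'), (2, '2021-02-02', '2021-02');
INSERT INTO dim_product VALUES
    (10, 'widget', 'hardware'), (11, 'gadget', 'hardware'), (12, 'manual', 'books');
INSERT INTO fact_sales VALUES
    (1, 10, 2, 20.0), (1, 12, 1, 5.0), (2, 10, 1, 10.0), (2, 11, 3, 45.0);
""")


def revenue_by_category(connection: sqlite3.Connection) -> dict:
    """Join the fact table to a dimension and sum a measure per dimension attribute."""
    rows = connection.execute("""
        SELECT p.category, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()
    return dict(rows)
```

Calling `revenue_by_category(conn)` groups the summed `revenue` measure by product category. This works because the measure is summable and the dimension is reachable through a foreign key.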
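
Aggregate navigation (item 4) can be sketched as a small routing function that picks the smallest table able to answer a query. This is a deliberately simplified illustration, not how any particular product implements it. The names `agg_sales_month`, `CANDIDATES`, `choose_table`, and `total_revenue` are hypothetical, and the fact table is denormalized (it carries `month` and `product` directly) only to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (day TEXT, month TEXT, product TEXT, revenue REAL);
INSERT INTO fact_sales VALUES
    ('2021-01-30', '2021-01', 'widget', 10.0),
    ('2021-02-01', '2021-02', 'widget', 20.0),
    ('2021-02-02', '2021-02', 'gadget', 45.0),
    ('2021-02-02', '2021-02', 'widget', 10.0);

-- Pre-computed aggregate: one row per month instead of one row per sale.
CREATE TABLE agg_sales_month AS
    SELECT month, SUM(revenue) AS revenue FROM fact_sales GROUP BY month;
""")

# Candidate tables, smallest first, with the columns each one can group by.
CANDIDATES = [
    ("agg_sales_month", {"month"}),
    ("fact_sales", {"day", "month", "product"}),
]


def choose_table(group_by: list) -> str:
    """Return the first (smallest) table that contains every requested column."""
    for table, columns in CANDIDATES:
        if set(group_by) <= columns:
            return table
    raise ValueError(f"no table can group by {group_by}")


def total_revenue(connection: sqlite3.Connection, group_by: list):
    """Answer a revenue query; the caller never names the table that is used.

    group_by must be a non-empty list of column names.
    """
    table = choose_table(group_by)  # also validates the column names
    cols = ", ".join(group_by)
    sql = f"SELECT {cols}, SUM(revenue) FROM {table} GROUP BY {cols} ORDER BY {cols}"
    return table, connection.execute(sql).fetchall()
```

A query grouped by `month` is routed to `agg_sales_month`, while a query grouped by `product` falls back to `fact_sales`. In both cases the user only says what they want grouped, which is the transparency that item 4 describes.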