shuijian-xu / hive

WHAT IS A DATA WAREHOUSE? #13

Open shuijian-xu opened 4 years ago

shuijian-xu commented 4 years ago

The accepted definition of a data warehouse (attributed to Bill Inmon, 1992) is a database that contains the following four characteristics:

  1. Subject oriented

  2. Nonvolatile

  3. Integrated

  4. Time variant

shuijian-xu commented 4 years ago

Subject oriented means that the data is organized around subjects (such as Sales) rather than operational applications (such as order processing). Operational databases are organized around business applications; they are application oriented.

shuijian-xu commented 4 years ago

Nonvolatile means that the data, once placed in the warehouse, is not usually subject to change. Anyone who is using the database has confidence that a query will always produce the same result no matter how often it is run. Operational databases are extremely volatile in that they are constantly changing. A query is unlikely to produce the same result twice if it is accessing tables which are frequently updated.

shuijian-xu commented 4 years ago

Integrated means the data is consistent. For instance, dates are always stored in the same format.
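
As a minimal sketch of the date example, the snippet below converts date strings from differently formatted source systems into a single representation before loading. The list of source formats, the function name `to_warehouse_date`, and the sample values are assumptions made for illustration, not part of the original text.

```python
from datetime import date, datetime

# Hypothetical formats used by different operational systems.
# A real feed would need its own agreed list; day/month order is assumed.
SOURCE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")


def to_warehouse_date(raw: str) -> date:
    """Parse a date string in any known source format into one date type."""
    text = raw.strip()
    for fmt in SOURCE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"unrecognized date format: {raw!r}")


# The same calendar day as three hypothetical source systems might store it.
order_date = to_warehouse_date("2020-03-05")
invoice_date = to_warehouse_date("05/03/2020")
shipment_date = to_warehouse_date("20200305")
```

Once loaded, every date compares and sorts the same way regardless of which system it came from. Rejecting unknown formats, rather than guessing, is a design choice in this sketch.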

shuijian-xu commented 4 years ago

Time variant means that historical data is recorded. Almost all queries executed against a data warehouse have some element of time associated with them. We have already established that most operational systems do not retain historical information. It is almost impossible to predict what will happen in the future without observing what happened in the past. A data warehouse helps to address this fundamental issue by adding a historical dimension to the data taken from the operational databases.
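
One way to picture the historical dimension is to keep every version of a value together with the period in which it was valid, instead of overwriting it as an operational system would. The sketch below is only an illustration under that assumption; the `AddressVersion` record, the helper names, and the sample addresses are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class AddressVersion:
    customer_id: int
    address: str
    valid_from: date
    valid_to: Optional[date] = None  # None means this version is still current


def record_change(history: List[AddressVersion], customer_id: int,
                  address: str, changed_on: date) -> None:
    """Close the customer's current version, then append the new one."""
    for row in history:
        if row.customer_id == customer_id and row.valid_to is None:
            row.valid_to = changed_on
    history.append(AddressVersion(customer_id, address, changed_on))


def address_as_of(history: List[AddressVersion], customer_id: int,
                  as_of: date) -> Optional[str]:
    """Return the address that was valid on the given date, if any."""
    for row in history:
        if (row.customer_id == customer_id
                and row.valid_from <= as_of
                and (row.valid_to is None or as_of < row.valid_to)):
            return row.address
    return None


history: List[AddressVersion] = []
record_change(history, 1, "12 Vine Street", date(2018, 1, 10))
record_change(history, 1, "7 Cellar Lane", date(2020, 6, 1))
```

Here `address_as_of(history, 1, date(2019, 1, 1))` still answers with the old address, which an operational table that keeps only the latest value could not do.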

shuijian-xu commented 4 years ago

information systems that help the business in pursuit of its goals. As the goals evolve, the information systems evolve as well. Now that's a data warehouse!

shuijian-xu commented 4 years ago

The popular introduction of database management systems in the 1970s heralded the so-called Copernican revolution in the perception of the value of data. There was a shift in emphasis away from an application-centric approach toward a more data-centric approach to developing systems. The main objectives of database management systems are the improvement of:

  1. Evolvability. The ability of the database to adapt to the changing needs of the organization and the user community. This includes the ability of the database to grow in scale, in terms of data volumes, applications, and users.

  2. Availability. Ensuring that the data has structure, and is able to be viewed in different ways by different applications and users with specific and nonspecific requirements.

  3. Sharability. Recognition of the fact that the data belongs to the whole organization and not to single users or groups of users.

  4. Integrity. Improving the quality, maintaining existence, and ensuring privacy.

shuijian-xu commented 4 years ago

The objectives of evolvability, availability, sharability, and integrity are still entirely valid, and even more so with decision support systems, where the following conditions exist:

  1. The nature of the user access tends to be unstructured.

  2. The information is required by a large number of users with differing needs.

  3. The system must be able to respond to changes in business direction in a timely fashion.

  4. The results of queries must be reliable and consistent.

The development of an application-centric solution will not support those objectives. The only way to be sure that the database possesses these qualities is to design them in at the beginning.

shuijian-xu commented 4 years ago

So that's what we get if we achieve the goal. However, we need to assess each of the business strategies that we were planning to adopt to see how the savings break down. Let's assume that the reduction of churn is to be achieved by three major new initiatives:

  1. Loyalty bonuses. For customers who have been active for more than one year, we want to be able to reward them, on their birthday and wedding anniversary, where appropriate, with a bottle or two of their favorite wine.

  2. Personalized campaigns. Once we have collected some information about customers' behavior, we want to be able to target them in campaigns with goods that we know will interest them.

  3. Predictive modeling. We want to be able to determine which of our customers are susceptible to churning so that we can take some proactive steps to try to ensure that it does not happen.
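
The loyalty bonus rule in the first initiative can be sketched as a simple eligibility check. This is an illustration only: the `Customer` fields, the function names, and the reading of "active for more than one year" as "joined more than one year ago" are assumptions, not something the text specifies.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Customer:
    name: str
    joined: date
    birthday: date
    wedding_anniversary: Optional[date] = None  # not every customer has one


def active_more_than_one_year(customer: Customer, today: date) -> bool:
    """Assumed rule: today is later than the first anniversary of joining."""
    try:
        first_anniversary = customer.joined.replace(year=customer.joined.year + 1)
    except ValueError:
        # Joined on 29 February; the following year has no such day.
        first_anniversary = date(customer.joined.year + 1, 3, 1)
    return today > first_anniversary


def same_day_of_year(event: date, today: date) -> bool:
    # Note: a 29 February event never matches in a non-leap year.
    return (event.month, event.day) == (today.month, today.day)


def bonus_occasions(customer: Customer, today: date) -> List[str]:
    """List the occasions on which this customer earns a bonus today."""
    if not active_more_than_one_year(customer, today):
        return []
    occasions = []
    if same_day_of_year(customer.birthday, today):
        occasions.append("birthday")
    if (customer.wedding_anniversary is not None
            and same_day_of_year(customer.wedding_anniversary, today)):
        occasions.append("wedding anniversary")
    return occasions


today = date(2020, 3, 5)
long_standing = Customer("A", date(2018, 5, 1), date(1980, 3, 5), date(2005, 3, 5))
recent = Customer("B", date(2019, 9, 1), date(1990, 3, 5))
```

Both the join date and the personal dates have to be held for each customer, which is the kind of information requirement this exercise is meant to surface.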

shuijian-xu commented 4 years ago

There are other goals, and we have to go through the same exercise with each of them. For instance, another goal of the Wine Club is to increase the customer base by 5 percent per annum (5,000 additional customers). The emphasis is on attracting the “right” type of customer.

The strategies for this are to be as follows:

  1. Customer profiling to try to ascertain the right types of customers to approach

  2. Campaign management so that we can contact the right people with the right offer