nus-cs3281 / 2022

http://nus-cs3281.github.io/2022/

Book: Designing Data-Intensive Applications Ch 2: Data Models and Query Languages #20

Open dingyuchen opened 2 years ago

dingyuchen commented 2 years ago

Book: Designing Data-Intensive Applications Chapter 2: Data Models and Query Languages

Summary:

The limits of my language mean the limits of my world

  • Ludwig Wittgenstein, Tractatus Logico-Philosophicus (1922)

Data models play an integral role in software development. They affect not only how the software is written, but also how we think about the problem that we are solving. Most applications are built by layering one data model on top of another, with each layer hiding the complexity and details of the layer below it. This chapter provides an overview of the different forms of data storage and querying.

Relational Model vs Document Model

The best-known data model today is probably that of SQL, which is based on the relational model. Data is organized into relations (tables in SQL), where each relation is an unordered collection of tuples (rows in SQL). The original use cases were typically transaction processing and batch processing. Other databases at the time forced application developers to think a lot about the internal representation of data in the database. The relational model hid that detail behind a cleaner interface, which cemented SQL as the de facto approach to data storage.

Furthermore, as computers became more powerful, they were used for increasingly diverse purposes. Remarkably, relational databases turned out to generalize very well beyond their original scope of business data processing.

Birth of NoSQL

The name "NoSQL" was originally intended simply as a catchy Twitter hashtag for a meetup on open source, distributed, non-relational databases. A number of interesting database systems are now associated with the hashtag, and the name has been retroactively reinterpreted as Not Only SQL.

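According to the chapter, the driving forces behind the adoption of NoSQL include:

  • A need for greater scalability than relational databases can easily achieve, including very large datasets or very high write throughput
  • A widespread preference for free and open source software over commercial database products
  • Specialized query operations that are not well supported by the relational model
  • Frustration with the restrictiveness of relational schemas, and a desire for a more dynamic and expressive data model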

It is likely that applications in the future will utilize relational databases alongside a variety of non-relational databases - an idea called polyglot persistence.

Object-relational Mismatch

Most application development today is done in object-oriented programming languages, which leads to a common criticism of the SQL data model: if data is stored in relational tables, a translation layer is required between the objects in the application code and the database model of tables, rows and columns. This disconnect is sometimes called an impedance mismatch.
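To make the translation layer concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `users` and `positions` tables, the `User` and `Position` classes, and the helper names are all hypothetical and chosen only for illustration; they are not taken from the book.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Position:
    title: str
    organization: str


@dataclass
class User:
    user_id: int
    name: str
    positions: list[Position] = field(default_factory=list)


def demo_connection() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE positions (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES users(id),
            title TEXT NOT NULL,
            organization TEXT NOT NULL
        );
        INSERT INTO users VALUES (1, 'Alice');
        INSERT INTO positions VALUES (1, 1, 'Engineer', 'Acme');
        INSERT INTO positions VALUES (2, 1, 'Manager', 'Globex');
        """
    )
    return conn


def load_user(conn: sqlite3.Connection, user_id: int) -> User:
    """Hand-written translation layer: rows from two tables become one object."""
    row = conn.execute(
        "SELECT id, name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if row is None:
        raise KeyError(user_id)
    user = User(user_id=row[0], name=row[1])
    for title, organization in conn.execute(
        "SELECT title, organization FROM positions WHERE user_id = ? ORDER BY id",
        (user_id,),
    ):
        user.positions.append(Position(title, organization))
    return user
```

Calling `load_user(demo_connection(), 1)` assembles a single `User` from rows in two tables. Object-relational mapping frameworks reduce this kind of boilerplate, but the two models still differ underneath.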

We can represent something like a résumé with a document model, since it is relatively self-contained. There is a tradeoff between duplicating raw data (for example, storing a country name as free text in every record) and storing an ID that refers to a single canonical record for that country. Normalizing data like this requires many-to-one relationships, which don't fit nicely into the document model. In document databases, joins are not needed for one-to-many tree structures, and support for joins is often weak, so the application may have to emulate a join with multiple queries.
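As a rough illustration of that tradeoff, the sketch below stores a résumé as one JSON document. The field names, the `COUNTRIES` lookup table, and `render_resume` are assumptions made up for this example. The one-to-many positions are nested inside the document, while the many-to-one country reference is stored as an ID and resolved in application code.

```python
import json

# Hypothetical lookup table: one canonical record per country.
COUNTRIES = {"sg": "Singapore", "us": "United States"}

# A self-contained resume document. Positions are nested (one-to-many),
# while the country is stored as an ID (many-to-one).
RESUME_JSON = """
{
  "user_id": 1,
  "name": "Alice",
  "country_id": "sg",
  "positions": [
    {"title": "Engineer", "organization": "Acme"},
    {"title": "Manager", "organization": "Globex"}
  ]
}
"""


def render_resume(doc: dict, countries: dict) -> dict:
    """Emulate a join in application code by resolving the country ID."""
    resolved = dict(doc)  # shallow copy so the stored document is untouched
    resolved["country"] = countries[resolved.pop("country_id")]
    return resolved


resume = json.loads(RESUME_JSON)
rendered = render_resume(resume, COUNTRIES)
```

Reading the positions needs no join because they live inside the document. Renaming a country only requires changing `COUNTRIES`, at the cost of an extra lookup on every read.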

The Network Model

The network model was standardized by a committee called the Conference on Data Systems Languages (CODASYL). In the network model, a record could have multiple parents. The links between records in the network model were not foreign keys, but more like pointers in a programming language. The only way of accessing a record was to follow a path from a root record along these chains of links. This was called an access path.
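The idea of an access path can be sketched with plain in-memory objects. This is only an analogy under simplifying assumptions, not how CODASYL databases were actually implemented; the `Record` class, the link names, and `follow_access_path` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    name: str
    # Named links to other records, acting like pointers rather than foreign keys.
    links: dict[str, "Record"] = field(default_factory=dict)


def follow_access_path(root: Record, path: list[str]) -> Record:
    """Reach a record by walking a chain of links starting from the root."""
    current = root
    for link_name in path:
        current = current.links[link_name]  # KeyError if the link does not exist
    return current


# One record ("acme") with two parents, which the network model allowed.
acme = Record("acme")
alice = Record("alice", {"employer": acme})
bob = Record("bob", {"employer": acme})
root = Record("root", {"alice": alice, "bob": bob})

via_alice = follow_access_path(root, ["alice", "employer"])
via_bob = follow_access_path(root, ["bob", "employer"])
```

Both paths end at the same `acme` record, and the calling code has to know which path to take. If the links are rearranged, every caller that hard-coded a path must change too.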

The downside was that the model made the code for querying and updating the database complicated and inflexible. It was difficult to make changes to an application's data model.