Closed sentenz closed 1 year ago
OLTP (Online Transaction Processing)
We can use PostgreSQL for CRUD (Create-Read-Update-Delete) operations.
OLAP (Online Analytical Processing)
We can use PostgreSQL for analytical processing. PostgreSQL is based on HTAP (Hybrid transactional/analytical processing) architecture, so it can handle both OLTP and OLAP well.
FDW (Foreign Data Wrapper)
A FDW is an extension available in PostgreSQL that allows us to access a table or schema in one database from another.
Streaming
PipelineDB is a PostgreSQL extension for high-performance time-series aggregation, designed to power real-time reporting and analytics applications.
Geospatial
PostGIS is a spatial database extender for PostgreSQL object-relational database. It adds support for geographic objects, allowing location queries to be run in SQL.
Time Series
Timescale extends PostgreSQL for time series and analytics. For example, developers can combine relentless streams of financial and tick data with other business data to build new apps and uncover unique insights.
Distributed Tables
CitusData scales Postgres by distributing data & queries.
:tada: This issue has been resolved in version 1.14.0 :tada:
The release is available on:
v1.14.0
Your semantic-release bot :package::rocket:
Database
A database is a collection of structured data that is organized and stored in a way that enables efficient retrieval and manipulation of the data. A database can be used to store and manage data for a variety of purposes, such as maintaining a record of transactions, storing customer information, or tracking inventory.
Databases are typically managed by a database management system (DBMS), which provides a set of tools and interfaces for creating, modifying, and querying the data stored in the database. This can include a graphical user interface for managing the database, or a set of command-line tools and programming APIs for accessing and manipulating the data programmatically.
1. Category
1.1. Relational Databases
Relational databases store data in a structured format using tables and relationships. The data is organized into rows and columns, and relationships are established between tables using keys.
DBMS tools: MySQL, Oracle Database, Microsoft SQL Server.
1.2. Object-Relational Databases
Object-Relational databases combine the features of relational databases and object-oriented programming. They store data in tables as well as objects, allowing for more complex data structures and relationships.
DBMS tools: PostgreSQL, Oracle Database.
1.3. NoSQL Databases
NoSQL databases do not use a fixed schema, and instead store data in a variety of formats, such as key-value pairs, documents, or graph structures. They are designed for scalability and handling large amounts of unstructured data.
DBMS tools: MongoDB, Cassandra, Amazon DynamoDB.
1.4. Columnar Databases
Columnar databases store data in columns rather than rows, providing faster query performance for large datasets, improving data performance and scalability. They are typically used in data warehousing and analytics.
DBMS tools: Apache Cassandra, Amazon Redshift, Google Bigtable.
1.5. Graph Databases
Graph databases store data as nodes and relationships, providing efficient querying and navigation of complex relationships. They are commonly used in social network analysis, recommendation systems, and fraud detection.
DBMS tools: Neo4j, Amazon Neptune, OrientDB.
1.6. Time-series Databases
Time-series databases are specialized databases optimized for storing and retrieving time-stamped data. They are commonly used for monitoring and analyzing sensor data, financial data, and IoT data.
DBMS tools: InfluxDB, TimescaleDB, OpenTSDB.
1.7. In-Memory Databases
In-Memory databases store data in RAM, providing extremely fast query performance. They are used for real-time applications, such as gaming, financial trading, and telecommunications.
DBMS tools: Redis, Apache Ignite, MemSQL.
1.8. Cloud Databases
Cloud databases are databases that run on cloud computing platforms, such as Amazon Web Services, Microsoft Azure, or Google Cloud Platform. They provide scalable, on-demand access to databases, allowing for easy provisioning and management of resources.
DBMS tools: Amazon RDS, Microsoft Azure SQL Database, Google Cloud SQL.
1.9. Hybrid Databases
Hybrid databases are databases that combine the features of multiple database categories, allowing for a variety of data formats and structures. They are designed to handle the demands of modern applications, which often require a combination of structured and unstructured data.
DBMS tools: Amazon DocumentDB, Microsoft Azure Cosmos DB, Google Cloud Firestore.
1.10. Distributed Databases
Distributed databases are databases that are split across multiple servers, providing increased scalability and fault tolerance. They are commonly used for large-scale, data-intensive applications, such as e-commerce, online gaming, and scientific simulations.
DBMS tools: Apache Cassandra, Amazon DynamoDB, Microsoft Azure Cosmos DB.
1.11. Embedded Databases
Embedded databases are databases that run directly on an application, rather than on a separate server. They are typically used for small-scale, standalone applications, such as mobile devices or embedded systems.
DBMS tools: SQLite, Berkeley DB, H2.
1.12. XML Databases
XML databases store data in the XML format, providing a flexible and extensible data structure. They are commonly used for data exchange and web services, as well as for document management.
DBMS tools: eXist-db, BaseX, Apache MarkLogic.
1.13. Key-Value Databases
Key-value databases are databases that store data as key-value pairs, allowing for simple and efficient data storage and retrieval. They often support advanced key-value operations, such as key-value updates, key-value deletions, and key-value search, and they are often used in applications that require simple and efficient data storage and retrieval.
DBMS tools: Amazon DynamoDB, Riak, Redis.
1.14. Document Databases
Document databases are databases that store data as semi-structured or unstructured documents, allowing for the flexible storage and retrieval of data elements. They often support advanced document operations, such as document queries, document updates, and document indexing, and they are often used in applications that require the flexible storage and retrieval of semi-structured or unstructured data elements.
DBMS tools: MongoDB, Amazon DocumentDB, Couchbase.
1.15. Geospatial Databases
Geospatial databases are databases that are optimized for storing and retrieving geospatial data, such as locations, maps, and geographic information. They often support advanced geospatial operations, such as spatial indexing, geospatial queries, and geospatial analytics, and they are often used in applications that require geospatial data processing and analysis.
DBMS tools: PostGIS, Oracle Spatial and Graph, MongoDB Geospatial.
1.16. Edge Databases
Edge databases are databases that run on edge devices, such as IoT devices, and are designed for low-latency data processing and storage. They often support advanced edge operations, such as local data processing, data aggregation, and data synchronization, and they are often used in applications that require low-latency data processing and storage at the edge.
DBMS tools: Amazon Greengrass, Microsoft Azure IoT Edge, Google Cloud IoT Edge.
1.17. Streaming Databases
Streaming databases are databases that are designed for real-time data ingestion, processing, and analysis of large amounts of data. They often support advanced streaming operations, such as real-time data analysis, event processing, and data pipelines, and they are often used in applications that require real-time data processing and analysis.
DBMS tools: Apache Kafka, Apache Flink, Amazon Kinesis.
1.18. Machine Learning Databases
Machine learning databases are databases that provide built-in machine learning algorithms and support for data science workflows. They often support advanced machine learning operations, such as data preprocessing, model training, and model deployment, and they are often used in applications that require machine learning and data science capabilities.
DBMS tools: Google BigQuery ML, Amazon SageMaker, Databricks.
1.19. Event-Driven Databases
Event-driven databases are databases that support event-driven architectures and real-time data processing. They often support advanced event-processing operations, such as event streams, event sourcing, and event notifications, and they are often used in applications that require real-time data processing and event-driven architectures.
DBMS tools: Apache Kafka, Apache Pulsar, Apache Flink.
1.20. NewSQL Databases
NewSQL databases are databases that combine the scalability and performance of NoSQL databases with the transactional consistency and durability of relational databases. They often support advanced SQL operations, and they are often used in applications that require high performance and strong consistency guarantees.
DBMS tools: Google Spanner, CockroachDB, NuoDB.
1.21. Multi-Model Databases
Multi-model databases are databases that support multiple data models, such as relational, document-oriented, graph, and key-value. This allows for greater flexibility in data modeling and the ability to handle a variety of data structures in a single database. Multi-model databases are often used in applications that require the ability to store and retrieve different types of data in a unified way.
DBMS tools: ArangoDB, Amazon DocumentDB, OrientDB.
2. Management
Database Management involves organizing, storing, and maintaining data within a database system. It encompasses design principles, technologies, and methodologies to ensure data reliability, efficiency, and security. The tasks involved in database management include defining data elements, creating and maintaining the database schema, enforcing data integrity and security, optimizing performance, and implementing backup and recovery mechanisms.
2.1. DBMS
DBMS (Database Management System) is a software system that provides an interface for creating, modifying, and managing databases. A DBMS is responsible for managing all aspects of a database, including the storage, retrieval, and manipulation of data.
DBMS is an essential tool for managing and working with large amounts of data. There are a wide variety of DBMS products available, ranging from open-source systems to commercial software packages, each with its own strengths and weaknesses.
Key functions of a DBMS include:
Data Storage
Data Retrieval
Data Manipulation
Data Security
Data Consistency
2.2. Transaction
A transaction in database systems refers to a unit of work that is executed within a database management system (DBMS). The primary objective of transactions is to maintain the consistency and integrity of data within a database, by ensuring that the database remains in a consistent state even in the face of failures or errors. Transactions achieve this objective by implementing the Atomicity, Consistency, Isolation, and Durability (ACID) properties, which guarantee that either all the operations within a transaction are completed successfully, or none of them are completed at all.
2.3. ACID
ACID stands for Atomicity, Consistency, Isolation, and Durability. It is a set of properties that ensure the consistency and reliability of database transactions.
ACID properties ensure that database transactions are reliable, consistent, and resistant to failures or data loss.
Atomicity
Consistency
Isolation
Durability
2.4. CRUD
CRUD stands for Create, Read, Update, and Delete. It is a set of basic operations that can be performed on a database, and is used to manage data in a database.
The CRUD operations are the foundation of most database management systems, and are used to perform the basic functions of managing data in a database.
Create
Read
Update
Delete
2.5. HTAP
HTAP (Hybrid Transactional/Analytical Processing) is a term used to describe a database architecture that combines the characteristics of both transactional (OLTP) and analytical (OLAP) processing. HTAP databases are designed to handle both transactional and analytical workloads in real-time, providing a single source of truth for data that is both transactional and analytical.
2.6. OLTP
OLTP (Online Transactional Processing) is a database design approach that focuses on optimizing transactional processing, such as inserting, updating, or deleting data. OLTP databases are designed to handle large numbers of concurrent transactions, and are optimized for high-speed and low latency.
2.7. OLAP
OLAP (Online Analytical Processing) is a database design approach that focuses on optimizing analytical processing, such as complex queries and reporting. OLAP databases are designed to handle large amounts of data and complex calculations, and are optimized for high-performance data analysis.
2.8. FDW
FDW (Foreign Data Wrapper) is a foreign data wrapper is a database component that enables data stored in external databases to be treated as if it were part of the local database. FDWs provide a convenient way to access data stored in different databases, and can be used to create a single virtual database from multiple sources.
2.9. Streaming
Streaming refers to the process of continuously receiving and processing data in real-time. In database systems, streaming can be used to handle large amounts of data that are generated in real-time, such as IoT data or social media data.
2.10. Geospatial
Geospatial databases are databases that are designed to store and manage geographic data, such as location-based data, maps, and satellite imagery. Geospatial databases are optimized for the storage and management of large amounts of data with geographic attributes, and are widely used in industries such as transportation, logistics, and real estate.
2.11. Time Series
Time series databases are databases that are designed to store and manage time-based data, such as sensor data, stock prices, and weather data. Time series databases are optimized for the storage and management of large amounts of time-based data, and are widely used in industries such as finance, IoT, and weather forecasting.
2.12. Distributed Tables
Distributed tables are tables in a database that are spread across multiple nodes in a distributed database system. Distributed tables provide a way to scale databases horizontally, allowing for the storage and management of large amounts of data across multiple nodes. This provides increased performance and scalability, as well as improved fault tolerance.
2.13. Schema Design
Schema design is a critical component of database architecture and design, as it defines the structure of the data in a database. Schema design involves creating tables, columns, and relationships to represent the data requirements of an application. The schema defines the structure of the data and the relationships between the data elements, and it is used to ensure that the data is stored in a consistent and organized manner.
3. Principle
Atomicity
Consistency
Isolation
Durability
Normalization
Referential Integrity
Data Abstraction
Scalability
Performance
Security
Backup and Recovery
Data Encryption
ata Access Control
Data Modeling
Data Integration
Data Migration
Query Optimization
Indexing
Partitioning
Replication
Redundancy
Fault Tolerance
High Availability
Load Balancing
Data Warehousing
Data Marts
Data Mining
Business Intelligence
Data Visualization
Big Data
NoSQL
4. Best Practice
Define Clear Objectives
Normalize Data
Choose the Right Database Technology
Design for Performance
Implement Security Measures
Monitor and Tune the Database
Plan for Data Backup and Recovery
Document the Database
Consider Scalability
Continuously Evaluate and Improve
Implement Access Controls
Use Unique Identifiers
Use Transactions
Regularly Test and Verify Data
Maintain Data History
Plan for Disaster Recovery
Use Automated Tools
Implement Data Validation
Use Standard Data Formats
Monitor Database Health
Encrypt Sensitive Data
Document Data Sources
Use Triggers
Implement Version Control
Plan for Data Archiving
Use Stored Procedures
Implement Data Partitioning
Use Data Warehousing
Implement Automated Reporting
Evaluate the Database Regularly
Use Materialized Views
Implement Data Replication
Use Real-time Analytics
Consider Multi-tenancy
Implement Disaster Tolerance
Use Compression
Implement Data Sharding
Use Cloud Databases
Implement Auditing
Optimize Queries
5. Terminology
Relational Database
Table
Column
Row
Primary Key
Foreign Key
Index
Query
Transaction
Stored Procedure
Trigger
View
Materialized View
Partition
Data Warehousing
NoSQL
Object-Relational Database
Cloud Database
Scalability
Availability
Replication
Sharding
Backup
Recovery
Data Integrity
Data Security
ACID Properties
Denormalization
Query Optimization
Normalization
Data Modeling
ER Model
Data Types
Concurrency Control
Locking
Transactions Log
Encryption
Data Migration
Performance Tuning
Data Dictionary
Indexing
Foreign Key
Primary Key
Normalization Forms
Stored Procedures
Triggers
Views
Query Language
Data Warehousing