secns / share

This repository shares knowledge about AI, databases, and related topics.

Data Lakehouse #3

Open secns opened 1 week ago

secns commented 1 week ago

A Data Lakehouse is a data management architecture that builds upon the concept of a data lake, integrating the flexibility of data lakes with the governance capabilities of data warehouses. Within a Data Lakehouse, the following types of data are typically stored:

  1. Structured Data: Tabular data originating from relational databases, such as transaction records and customer information, which adheres to a strict schema.

  2. Semi-Structured Data: Log files, CSV, XML, JSON, and similar formats, where the data has some structure but is less rigidly defined than structured data.

  3. Unstructured Data: Text documents, images, audio, video, and other data without a predefined structure, which makes it more complex to process.

  4. Raw Data: Unprocessed primary input data collected directly from various source systems, maintaining its original form.

  5. Derived Data: New data generated through cleaning, transforming, or aggregating raw data, tailored for specific analytics or reporting purposes.

  6. Metadata: Information about the data itself, including its origin, format, meaning, storage location, and data quality, which is vital for data management and governance.

  7. Real-time/Streaming Data: Continuous data flows from sources such as sensors, web clickstreams, and social media, which require real-time capture and processing.
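To make the relationship between raw data, derived data, and metadata more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `DatasetMetadata` class, the dataset names, the `lake/...` paths, and the sample click events are all hypothetical and do not come from any particular Lakehouse product.

```python
import json
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DatasetMetadata:
    """Hypothetical catalog entry describing one dataset."""
    name: str
    origin: str       # source system, or the parent dataset for derived data
    data_format: str
    location: str     # illustrative storage path, not a real one
    kind: str         # "raw" or "derived"


# Raw, semi-structured data: JSON lines kept exactly as received (made-up sample).
RAW_LINES = [
    '{"user": "a", "page": "/home", "ms": 120}',
    '{"user": "b", "page": "/home", "ms": 80}',
    '{"user": "a", "page": "/cart", "ms": 200}',
]


def derive_page_stats(raw_lines):
    """Aggregate raw click events into per-page view counts and mean latency."""
    totals = defaultdict(lambda: {"views": 0, "total_ms": 0})
    for line in raw_lines:
        event = json.loads(line)
        stats = totals[event["page"]]
        stats["views"] += 1
        stats["total_ms"] += event["ms"]
    return {
        page: {"views": s["views"], "avg_ms": s["total_ms"] / s["views"]}
        for page, s in totals.items()
    }


# Metadata records where each dataset came from and where it lives.
catalog = [
    DatasetMetadata("clicks_raw", "web clickstream", "jsonl", "lake/raw/clicks/", "raw"),
    DatasetMetadata("page_stats", "clicks_raw", "table", "lake/derived/page_stats/", "derived"),
]

page_stats = derive_page_stats(RAW_LINES)
```

The raw lines are never modified; the derived dataset is computed from them, and its catalog entry points back to `clicks_raw` as its origin, which is the kind of lineage information that metadata is meant to capture.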

The essence of the Data Lakehouse design is to provide a unified platform enabling enterprises to store all types of data in a centralized location while implementing data governance strategies to ensure data quality, security, and compliance. By applying advanced analytics tools and machine learning algorithms directly within the Lakehouse, users can unlock the value of their data without first needing to move it to separate systems, thereby enhancing the efficiency and flexibility of data processing.
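The idea of combining governance with in-place analytics can be sketched in a few lines of Python. This is a toy illustration, not a Lakehouse implementation: it assumes an in-memory SQLite database as a stand-in for the storage layer, and the `transactions` table, its constraints, and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the Lakehouse storage layer (an assumption
# made only for this sketch).
conn = sqlite3.connect(":memory:")

# Governance: the schema and quality rules are enforced where the data lives.
conn.execute(
    """
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY,
        customer TEXT NOT NULL,
        amount REAL NOT NULL CHECK (amount >= 0)
    )
    """
)


def ingest(rows):
    """Insert rows one by one and return those rejected by the constraints."""
    rejected = []
    for row in rows:
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO transactions (customer, amount) VALUES (?, ?)",
                    row,
                )
        except sqlite3.IntegrityError:
            rejected.append(row)
    return rejected


# Two valid rows, one negative amount, one missing customer (made-up data).
rejected = ingest([("alice", 30.0), ("bob", 12.5), ("alice", -5.0), (None, 9.0)])

# Analytics: the query runs directly on the governed table, with no export step.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM transactions GROUP BY customer ORDER BY customer"
).fetchall()
```

Rows that violate the quality rules never reach the table, and the aggregate query reads the same governed copy of the data rather than a separate export.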