infraspecdev / infraspec.dev

GNU Affero General Public License v3.0

[SPIKE]: Newtap: Create RDS to data lake pipeline with Databricks tools #36

Closed nitishInfraspec closed 5 months ago

nitishInfraspec commented 5 months ago

Story Details

Create a data pipeline that ingests data and CDC (change data capture) events from RDS into the data lake using Delta Lake.

Solution

We are creating a data pipeline that performs the following steps in sequence:

  1. Read data from RDS and write it to the S3 data lake in Delta Lake format.
  2. Create a task that updates the data lake with CDC changes from RDS.
  3. Provide a Spark SQL / DeltaTable object as the query interface.
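
To make step 2 concrete, here is a minimal sketch of what "updating the data lake with CDC changes" means semantically, written in plain Python so it runs without a Spark cluster. The event shape (`seq`, `op`, `id`, `data`), the op codes `I`/`U`/`D`, and the function name `apply_cdc` are assumptions made for illustration; the issue does not specify the CDC format.

```python
from typing import Any, Dict, Iterable

Row = Dict[str, Any]


def apply_cdc(snapshot: Dict[int, Row], events: Iterable[Row]) -> Dict[int, Row]:
    """Return a new snapshot with CDC events applied in sequence order.

    Hypothetical event shape (an assumption, not taken from the issue):
      {"seq": int, "op": "I" | "U" | "D", "id": int, "data": Row | None}
    """
    # Copy so the caller's snapshot is left untouched.
    result = {key: dict(row) for key, row in snapshot.items()}
    # Events may arrive out of order; apply them by sequence number.
    for event in sorted(events, key=lambda e: e["seq"]):
        key = event["id"]
        if event["op"] == "D":
            # Delete: drop the row if it exists.
            result.pop(key, None)
        elif event["op"] in ("I", "U"):
            # Insert or update: upsert the full row image.
            result[key] = dict(event["data"])
        else:
            raise ValueError(f"unknown CDC op: {event['op']!r}")
    return result


# Initial full load from the source table (step 1).
snapshot = {1: {"id": 1, "name": "alice"}, 2: {"id": 2, "name": "bob"}}

# CDC events captured after the initial load (step 2), deliberately unordered.
events = [
    {"seq": 3, "op": "D", "id": 2, "data": None},
    {"seq": 1, "op": "U", "id": 1, "data": {"id": 1, "name": "alicia"}},
    {"seq": 2, "op": "I", "id": 3, "data": {"id": 3, "name": "carol"}},
]

updated = apply_cdc(snapshot, events)
print(updated)
```

On an actual Delta table, the same upsert-and-delete logic would typically be expressed as a single MERGE operation keyed on the primary key, rather than a row-by-row loop.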

Implementation

The implementation steps for the proposed solution are documented at https://infraspec.getoutline.com/doc/rds-to-data-lake-pipeline-using-delta-lake-table-d3AxapXoeH

Acceptance Criteria