practo / tipoca-stream

Near-real-time, cloud-native data pipeline in AWS (CDC + Sink). Hosts the code for RedshiftSink, an RDS to RedshiftSink pipeline with masking and reloading support.
https://towardsdatascience.com/open-sourcing-tipoca-stream-f261cdcc3a13
Apache License 2.0

tipoca-stream


A near-real-time, cloud-native data pipeline using Kafka, KafkaConnect, and RedshiftSink in AWS. RedshiftSink is a high-performance, low-overhead data loader for Redshift, open-sourced by Practo. It comes with rich data masking support, so you can offer universal data access across your organization while preserving your customers' privacy.
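
To make the idea of data masking concrete, here is a minimal, generic sketch of column-level masking by salted hashing. It is not RedshiftSink's implementation or configuration format: the names `MASKED_COLUMNS`, `mask_value`, and `mask_row`, the example row, and the choice of SHA-256 are all assumptions made for illustration only.

```python
import hashlib

# Hypothetical setting: the columns whose values should be masked.
MASKED_COLUMNS = {"email", "phone"}


def mask_value(value: str, salt: str) -> str:
    """Return a salted SHA-256 hex digest of the value (illustrative choice)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def mask_row(row: dict, salt: str) -> dict:
    """Mask the configured columns; pass every other column through unchanged."""
    masked = {}
    for column, value in row.items():
        if column in MASKED_COLUMNS and value is not None:
            masked[column] = mask_value(str(value), salt)
        else:
            masked[column] = value
    return masked


# Example row, as it might arrive from a change-data-capture stream (made up).
row = {"id": 7, "email": "jane@example.com", "city": "Bengaluru"}
masked = mask_row(row, salt="per-deployment-secret")
```

Because the same input and salt always produce the same digest, masked columns can still be joined and grouped on, while the original values are not exposed to analysts.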

Release blog: https://towardsdatascience.com/open-sourcing-tipoca-stream-f261cdcc3a13

Tipoca Stream is the successor to an internal, non-real-time data warehousing project called Tipoca, which takes its name from Tipoca City, home of the clones in the Star Wars universe.

Install

The pipeline is a combination of independently deployed services. This repo holds the code for RedshiftSink only.

The project is built from pluggable libraries that can be composed to solve other data pipeline use cases.

Contribute

Please follow this to bring a change.

Thanks