njfritter / poc-data-pipelines

Proof-of-Concept (POC) Data Pipelines for various use cases such as data streaming/ingestion, batch data processing, orchestration and storage. Includes technologies such as Apache Airflow, Apache Spark, Apache Kafka, AWS, Python and more

Add Batch Layer Piece of Data Pipeline #4

Open njfritter opened 9 months ago

njfritter commented 9 months ago

Create the batch layer by persisting data written to Kafka in a separate data store.

Given that I plan to use Snowflake for batch data pipelines down the line, I may just use Snowflake and write to a table separate from any later tables, placing it in its own schema to distinguish it from the batch tables.

Edit: For the sake of a local implementation, I will use a Postgres DB, which can be set up locally (rather than a cloud-based solution like Snowflake/Redshift/etc.)
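A minimal sketch of what the batch-write step could look like, assuming consumed Kafka records arrive as dicts and land in a hypothetical `raw_events` table. The table name, column names, and batch size are illustrative, not decided. It uses the DB-API 2.0 interface, demonstrated here with stdlib `sqlite3` so it runs anywhere; against a real local Postgres you would connect with `psycopg2` instead and use `%s` placeholders rather than `?`.

```python
import json
import sqlite3


def chunked(records, batch_size):
    """Yield lists of at most batch_size records from an iterable."""
    batch = []
    for rec in records:
        batch.append(rec)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def write_batches(conn, records, batch_size=500):
    """Insert records into a raw_events table in commits of batch_size.

    conn is any DB-API 2.0 connection; swap sqlite3 for psycopg2
    (and '?' for '%s') to target a local Postgres instance.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_events (event_id TEXT, payload TEXT)"
    )
    written = 0
    for batch in chunked(records, batch_size):
        conn.executemany(
            "INSERT INTO raw_events (event_id, payload) VALUES (?, ?)",
            [(r["event_id"], json.dumps(r)) for r in batch],
        )
        conn.commit()  # one commit per batch keeps transactions bounded
        written += len(batch)
    return written


# Usage with an in-memory DB standing in for Postgres:
conn = sqlite3.connect(":memory:")
events = [{"event_id": str(i), "value": i} for i in range(1200)]
n = write_batches(conn, events, batch_size=500)
# n == 1200, inserted in batches of 500, 500, and 200
```

Batching the inserts (rather than one commit per Kafka message) is the main point here: it keeps the consumer's write amplification down, and the same loop shape carries over whether the sink is Postgres, Snowflake, or files on S3.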

njfritter commented 9 months ago

S3 could also be an option here, depending on cost.

njfritter commented 8 months ago

This ticket will include the following: