Proof-of-Concept (POC) data pipelines for various use cases such as data streaming/ingestion, batch data processing, orchestration, and storage. Includes technologies such as Apache Airflow, Apache Spark, Apache Kafka, AWS, Python, and more.
Create the batch layer by persisting the data written to Kafka in a separate data store.
Given that I plan on using Snowflake for batch data pipelines down the line, I may just use Snowflake here as well and write to a table kept separate from any later tables (distinguished by placing it in its own schema, apart from the batch tables).
Edit: For the sake of a local implementation, I will use a Postgres DB, which can be set up locally (rather than a cloud-based solution like Snowflake/Redshift/etc.). A minimal sketch of this batch layer is shown below.
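To make the local Postgres plan concrete, here is a minimal sketch of a consumer that reads messages off Kafka and persists them into a dedicated schema in Postgres. The topic name (`events`), schema/table names (`raw.kafka_events`), broker address, and database credentials are all placeholder assumptions, not part of this repo; it assumes `kafka-python` and `psycopg2` are installed and a local broker and Postgres instance are running.

```python
# Sketch: persist Kafka messages into a local Postgres "batch layer".
# Broker, topic, table, and credentials below are assumptions for illustration.
import json

import psycopg2
from kafka import KafkaConsumer

# Connect to the locally running Postgres instance (credentials assumed).
conn = psycopg2.connect(
    host="localhost", dbname="batch_layer", user="postgres", password="postgres"
)
conn.autocommit = True

with conn.cursor() as cur:
    # Keep the raw events in their own schema, separate from any later batch tables.
    cur.execute("CREATE SCHEMA IF NOT EXISTS raw")
    cur.execute(
        """
        CREATE TABLE IF NOT EXISTS raw.kafka_events (
            id BIGSERIAL PRIMARY KEY,
            payload JSONB NOT NULL,
            ingested_at TIMESTAMPTZ DEFAULT now()
        )
        """
    )

# Consume JSON messages from the (assumed) "events" topic on a local broker.
consumer = KafkaConsumer(
    "events",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)

# Write each message into the raw table as it arrives.
with conn.cursor() as cur:
    for message in consumer:
        cur.execute(
            "INSERT INTO raw.kafka_events (payload) VALUES (%s)",
            (json.dumps(message.value),),
        )
```

Writing each message individually keeps the sketch simple; batching inserts (e.g. with `psycopg2.extras.execute_values`) would be the natural next step if throughput matters.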