ryan-mars / stochastic

TypeScript framework for building event-driven services. Easily go from Event Storming → Code.
MIT License

feat(data-lake): add scaffolding for stochastic-flink and DataLake Stack #103

Open sam-goodwin opened 3 years ago

sam-goodwin commented 3 years ago

Closes #84 Closes #95

This change adds a DataLake Stack that persists all observed Domain Events, issued Commands and Store state changes in a Bounded Context. The data is collected into a single Kinesis Stream and processed by an Apache Flink application running on a Kinesis Analytics managed Flink cluster. The application consumes from the stream, partitions the data by type and time, and stores it in S3 as encrypted JSON and Parquet files. The new partitions are then registered in the corresponding AWS Glue Tables so that they can be queried from Athena, Spark and Hadoop (or any other Hive-compatible consumer). The stack can also be configured to load data into an Amazon Timestream database for fast time-series analysis.
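To make "partitions the data by type and time" concrete, here is a minimal TypeScript sketch of how a record could be mapped to a Hive-style S3 key prefix. It is an illustration only, not the Flink job in this PR: the `LakeRecord` shape, the `kind`/`name`/`year`/`month`/`day`/`hour` partition keys and the hourly granularity are all assumptions, since the description does not specify the schema or partition layout.

```typescript
// Hypothetical record shape; the actual schema used by the Flink job is not shown in this PR.
type RecordKind = "event" | "command" | "store";

interface LakeRecord {
  kind: RecordKind;   // Domain Event, Command, or Store state change
  name: string;       // e.g. the Domain Event or Command type name
  timestamp: string;  // ISO-8601 timestamp, interpreted in UTC
  payload: unknown;
}

// Zero-pad to two digits so partition values sort lexicographically.
const pad2 = (n: number): string => ("0" + n).slice(-2);

// Build a Hive-style prefix (key=value/...), the layout Hive-compatible
// engines recognise as partitions. The partition keys are assumptions.
function partitionPrefix(record: LakeRecord): string {
  const t = new Date(record.timestamp);
  if (Number.isNaN(t.getTime())) {
    throw new Error(`invalid timestamp: ${record.timestamp}`);
  }
  return (
    [
      `kind=${record.kind}`,
      `name=${record.name}`,
      `year=${t.getUTCFullYear()}`,
      `month=${pad2(t.getUTCMonth() + 1)}`,
      `day=${pad2(t.getUTCDate())}`,
      `hour=${pad2(t.getUTCHours())}`,
    ].join("/") + "/"
  );
}

// Group a batch of records by prefix, so each group can be written as one file.
function groupByPartition(records: LakeRecord[]): Map<string, LakeRecord[]> {
  const groups = new Map<string, LakeRecord[]>();
  for (const record of records) {
    const prefix = partitionPrefix(record);
    const bucket = groups.get(prefix);
    if (bucket) {
      bucket.push(record);
    } else {
      groups.set(prefix, [record]);
    }
  }
  return groups;
}
```

Each distinct prefix would correspond to one partition in the matching Glue Table; registering it is what makes the files under that prefix visible to Athena and other Hive-compatible readers.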

TODO:

ryan-mars commented 3 years ago

Is a Data Lake per Bounded Context an ideal pattern? Shouldn't there be one (or a few) Data Lake(s) in an org, all fed by the various Bounded Contexts?

Why might it make sense to have one DL per BC?