mediative / eigenflow

ETL orchestration platform with recoverability and process monitoring features
https://mediative.github.io/eigenflow/
Apache License 2.0

Eigenflow

Eigenflow is an orchestration platform for building resilient and scalable data pipelines.

Pipelines can be split into multiple process stages which are persisted, resumed and monitored automatically.

Quick example:

case object Download extends ProcessStage
case object Transform extends ProcessStage
case object Analyze extends ProcessStage
case object SendReport extends ProcessStage

val download = Download {
  downloadReport() // returns file path/url to downloaded report
} retry (1.minute, 10) // in case of error retry every minute 10 times before failing

val transform = Transform { reportFile =>
  buildParquetFile(reportFile) // returns file path/url
}

val analyze = Analyze { parquetFile =>
  callSparkToAnalyze(parquetFile) // returns new report file path/url
}

val sendReport = SendReport { newReportFile =>
  sendReportToDashboard(newReportFile)
}

override def executionPlan = download ~> transform ~> analyze ~> sendReport

Once the stage methods (downloadReport, buildParquetFile, etc. in the example above) are defined, the rest is done automatically: see the complete list of features.
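To make the idea of chained, retryable stages more concrete, here is a minimal self-contained sketch. It is not Eigenflow's implementation: `SketchStage` and `attempt` are hypothetical names invented for illustration, the retry count is assumed to mean "additional attempts after the first failure", and persistence and monitoring are left out entirely.

```scala
import scala.annotation.tailrec
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

// Hypothetical stand-in for a pipeline stage; not Eigenflow's actual types.
final case class SketchStage[A, B](name: String, run: A => B) {

  // Chain two stages: the output of this stage becomes the input of the next.
  def ~>[C](next: SketchStage[B, C]): SketchStage[A, C] =
    SketchStage[A, C](s"$name ~> ${next.name}", run andThen next.run)

  // Wrap the stage so that a failure is retried after `interval`,
  // up to `maxRetries` additional attempts (an assumed interpretation).
  def retry(interval: FiniteDuration, maxRetries: Int): SketchStage[A, B] =
    SketchStage[A, B](name, input => SketchStage.attempt(() => run(input), interval, maxRetries))
}

object SketchStage {
  // Run `body`; on failure sleep and try again until no retries are left,
  // then rethrow the last error.
  @tailrec
  def attempt[B](body: () => B, interval: FiniteDuration, retriesLeft: Int): B =
    Try(body()) match {
      case Success(value) => value
      case Failure(_) if retriesLeft > 0 =>
        Thread.sleep(interval.toMillis)
        attempt(body, interval, retriesLeft - 1)
      case Failure(error) => throw error
    }
}
```

With these definitions, an expression such as `download.retry(1.minute, 10) ~> transform` yields a single stage whose `run` function executes both steps in order. A real orchestration platform would additionally persist each stage's result so that a restarted process can resume from the last completed stage instead of starting over.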

What it is good for

Eigenflow was created for managing periodic, long-running ETL processes with automatic recovery from failures. It is a good fit when the performance of individual stages matters and there is a need to collect statistics and monitor processes.

What it may not be good for

Eigenflow sits somewhere between simple cron jobs and complex enterprise processes, where ESB software would usually be used. It is therefore probably not a good choice for trivial jobs, nor for very complex processes where SOA is involved.

Main Features

Custom monitoring and notification systems can be developed. Messages are pushed to a message queue (Kafka is supported out of the box) and can be consumed by a message queue consumer.

Note: there are no connectors to third-party systems out of the box.
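As a rough illustration of what a custom notification consumer might look like, the sketch below reads events from a queue and raises an alert for each failed stage. Everything in it is an assumption: the `StageEvent` shape, the status values, and the `MonitoringConsumer` class are hypothetical, and an in-memory `LinkedBlockingQueue` stands in for a Kafka topic so that the example runs without external dependencies. Eigenflow's real message schema is not described here and may differ.

```scala
import java.util.concurrent.{LinkedBlockingQueue, TimeUnit}

// Hypothetical event shape; the real message schema is not shown in this README.
sealed trait StageStatus
case object Completed extends StageStatus
case object Failed extends StageStatus
final case class StageEvent(processId: String, stage: String, status: StageStatus)

// Consumes events from an in-memory queue (a stand-in for a Kafka topic)
// and forwards a message to `alert` for every failed stage.
final class MonitoringConsumer(queue: LinkedBlockingQueue[StageEvent], alert: String => Unit) {

  // Process events until the queue stays empty for `idleMillis`;
  // returns the number of events processed.
  def drain(idleMillis: Long): Int = {
    var processed = 0
    var next = queue.poll(idleMillis, TimeUnit.MILLISECONDS)
    while (next != null) {
      if (next.status == Failed)
        alert(s"Process ${next.processId} failed at stage ${next.stage}")
      processed += 1
      next = queue.poll(idleMillis, TimeUnit.MILLISECONDS)
    }
    processed
  }
}
```

In a real deployment the `alert` function would be the place to call an email, chat, or paging service, since no such connectors are provided out of the box.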

System Requirements

Runtime

Development