lmco / streamflow

StreamFlow™ is a stream processing tool designed to help build and monitor processing workflows.
https://github.com/lmco/streamflow/wiki
Apache License 2.0

Flink #42

Open jloveland opened 8 years ago

jloveland commented 8 years ago

Apache Flink is quickly gaining momentum as an alternative to Spark Streaming, Storm, etc.

Apache Flink is an open source platform for distributed stream and batch data processing. Flink’s core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams.

What are your thoughts on developing a plugin for Flink Streaming in StreamFlow? The rationale is that Flink provides a Storm-compatible API:

Flink provides a Storm-compatible API (org.apache.flink.storm.api) that offers replacements for the following classes:

TopologyBuilder replaced by FlinkTopologyBuilder
StormSubmitter replaced by FlinkSubmitter
NimbusClient and Client replaced by FlinkClient
LocalCluster replaced by FlinkLocalCluster

To submit a Storm topology to Flink, it is sufficient to replace the Storm classes with their Flink replacements in the Storm client code that assembles the topology. The actual runtime code, i.e., Spouts and Bolts, can be used unmodified. If a topology is executed in a remote cluster, the parameters nimbus.host and nimbus.thrift.port are used as jobmanager.rpc.address and jobmanager.rpc.port, respectively. If a parameter is not specified, the value is taken from flink-conf.yaml.
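The parameter fallback described above can be illustrated with a small, self-contained sketch. This is not Flink's actual implementation: the class `JobManagerAddressResolver` and its `resolve` method are hypothetical names, both configurations are modeled as plain maps, and the host and port values are only examples.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of Flink): shows how Storm-style settings
// could be mapped onto JobManager settings with a flink-conf.yaml fallback.
public class JobManagerAddressResolver {

    // stormConf: settings from the Storm client code (may be incomplete).
    // flinkConf: settings as they would be read from flink-conf.yaml.
    public static Map<String, String> resolve(Map<String, String> stormConf,
                                              Map<String, String> flinkConf) {
        Map<String, String> resolved = new HashMap<>();
        // nimbus.host is used as jobmanager.rpc.address when present.
        resolved.put("jobmanager.rpc.address",
                stormConf.getOrDefault("nimbus.host",
                        flinkConf.get("jobmanager.rpc.address")));
        // nimbus.thrift.port is used as jobmanager.rpc.port when present.
        resolved.put("jobmanager.rpc.port",
                stormConf.getOrDefault("nimbus.thrift.port",
                        flinkConf.get("jobmanager.rpc.port")));
        return resolved;
    }

    public static void main(String[] args) {
        Map<String, String> stormConf = new HashMap<>();
        stormConf.put("nimbus.host", "storm-master"); // example value

        Map<String, String> flinkConf = new HashMap<>();
        flinkConf.put("jobmanager.rpc.address", "localhost"); // example value
        flinkConf.put("jobmanager.rpc.port", "6123");         // example value

        Map<String, String> resolved = resolve(stormConf, flinkConf);
        System.out.println(resolved.get("jobmanager.rpc.address"));
        System.out.println(resolved.get("jobmanager.rpc.port"));
    }
}
```

In this example the address comes from the Storm-style setting, while the port, which the Storm configuration leaves unset, falls back to the value from the Flink configuration.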

christopherlakey commented 8 years ago

StreamFlow now uses an external process for deploying a StreamFlow topology to a Storm cluster. It should be relatively straightforward to implement an alternate submitter. More changes will likely be required to provide the hooks for topology status and metrics.
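As a rough illustration of what an alternate submitter could look like, here is a minimal sketch of a pluggable submitter selected by engine name. Every name in it (`SubmitterSketch`, `TopologySubmitter`, the two stub classes, `forEngine`) is hypothetical and does not reflect StreamFlow's actual code; the stubs only return a descriptive string instead of deploying anything.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these types exist in StreamFlow.
public class SubmitterSketch {

    // Assumed abstraction over "deploy this topology to some engine".
    public interface TopologySubmitter {
        String submit(String topologyName);
    }

    // Stand-in for the existing Storm deployment path.
    public static class StormSubmitterStub implements TopologySubmitter {
        @Override
        public String submit(String topologyName) {
            return "storm:" + topologyName;
        }
    }

    // Stand-in for an alternate Flink deployment path.
    public static class FlinkSubmitterStub implements TopologySubmitter {
        @Override
        public String submit(String topologyName) {
            return "flink:" + topologyName;
        }
    }

    // Looks up a submitter by engine name and fails on unknown engines.
    public static TopologySubmitter forEngine(String engine) {
        Map<String, TopologySubmitter> registry = new HashMap<>();
        registry.put("storm", new StormSubmitterStub());
        registry.put("flink", new FlinkSubmitterStub());
        TopologySubmitter submitter = registry.get(engine);
        if (submitter == null) {
            throw new IllegalArgumentException("Unknown engine: " + engine);
        }
        return submitter;
    }

    public static void main(String[] args) {
        System.out.println(forEngine("flink").submit("wordcount"));
    }
}
```

The sketch covers submission only; status and metrics hooks would need their own abstraction.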

What's the primary motivation for Flink integration? Performance?

I saw the word-count performance comparison, but it was comparing a Storm topology to a Flink DSL approach. Are there any performance comparisons of running an unmodified Storm topology in Flink vs natively in a Storm cluster?