Open MedAnd opened 5 years ago
This has been under active consideration for quite some time now. Part of the issue is that Beam's model and Trill's differ substantially in several ways, enough that supporting Beam would almost require redesigning or reimplementing Trill into something far more Dataflow-like, which would eliminate most of what Trill does really well. That said, there is always the possibility that we could find some innovative way to support both.
Another issue is that we've seen other Beam implementations perform remarkably badly compared with their native counterparts, enough that people tend to lose interest quickly.
I'd love to get a conversation going on this though. What I would like to know, if possible, is:
Some initial feedback...
Think this article is applicable: Why Apache Beam? A Google Perspective
A further discussion stimulator... Batch as a Special Case of Streaming
This is definitely good conversation fodder, and thank you. I think the biggest question here is: if we go forward with a Beam API layer, where in the architecture would it sit? My immediate thought is that it would be atop Trill and not inside it, but that is certainly debatable.
As for batch as a special case of streaming, you've got no argument from me there. :-)
I would use either implementation in a large stream processing application if it were available today ☺️ I think the functionality offered by Azure Stream Analytics (ASA) is compelling; however, on the distributed stream processing side, I believe Azure does not have a true equivalent to Google Dataflow. Hope this project can change that... more conversation fodder to come ☺️
Consider adding support for Apache Beam's unified model for defining both batch and streaming data-parallel processing pipelines.