twitter / summingbird

Streaming MapReduce with Scalding and Storm
https://twitter.com/summingbird
Apache License 2.0
2.14k stars 267 forks source link

Queue AsyncSummer results in AggregatorOutputCollector #698

Closed jnievelt closed 7 years ago

jnievelt commented 7 years ago

Initial versions of this did use FutureQueue, but I eventually realized that interface isn't really appropriate for the Spout.

As a little background, with #685 we have a subtle change to the way that AsyncBase handles the Future[TraversableOnce[(Seq[S], Future[TraversableOnce[O]])]], in that the outer Future no longer counts against maxWaitingFutures. I don't think this is a problem per se, because the outer Future is the one that handles summing from the AsyncSummer.

The inner Futures, on the other hand, exist for the sake of the Summer bolt, which does a Mergeable#multiMerge on the data after it's aggregated. Only these Futures count against the maxWaitingFutures, which makes sense, because the spirit of that setting is to bound the parallelism of the calls down into the underlying store.

Now if we consider FinalFlatMap, the inner Futures will always be a ConstFuture because instead of calling down into a Mergeable, we just wrap the result via Future.value. In theory the FinalFlatMap could use a simple Queue, but that would require some reworking to AsyncBase, etc.

In any event, because AsyncBase anyway isn't in play for AggregatorOutputCollector, I'm avoiding using the nested Future scheme all together. The AsyncSummer results will simply have an onSuccess which queues the results for emitting. Then it's largely a matter of replacing the Await.result with polling on the Queue.

johnynek commented 7 years ago

Seems like a good change to me.

👍 when tests pass (didn't look at the failure).