There I print informations when the methods of a summarizer are called, such as add, merge, etc.
I found that, if I call TimeSeriesRDD.summarize(), then the subtract will not be called, and if I have multiple partitions, the merge will be called. which means, aggregation on all rows will run in parallel and eventually merged into final result.
But if I run the summarizer with TimeSeriesRDD.summarizeWindow(), then for the window aggregation, add and subtract will be called, but not merge. Which means, inside one window, it is not parallel.
Am I right? This knowlege will be very helpful for my implementation to our problem, which is an extension of that experimental code.
I created an experimental summarizer to try to understand how it works: https://github.com/soloman817/flint/blob/feature/experiment/src/test/scala/com/twosigma/flint/timeseries/experiment/AccumulateSummarizerSpec.scala
There I print informations when the methods of a summarizer are called, such as add, merge, etc.
I found that, if I call
TimeSeriesRDD.summarize()
, then thesubtract
will not be called, and if I have multiple partitions, themerge
will be called. which means, aggregation on all rows will run in parallel and eventually merged into final result.But if I run the summarizer with
TimeSeriesRDD.summarizeWindow()
, then for the window aggregation,add
andsubtract
will be called, but notmerge
. Which means, inside one window, it is not parallel.Am I right? This knowlege will be very helpful for my implementation to our problem, which is an extension of that experimental code.
Thanks, Xiang.