Closed Ohyoukillkenny closed 4 years ago
FlushPolicy.None (or "egressed normally") means that every operator independently buffers events until the number of buffered outputs reaches the batch size (Config.DataBatchSize), at which point the operator egresses that batch, i.e., passes it to the next downstream operator. Since any given operator may egress a different number of events from what it receives (e.g., a Where operator may egress fewer events, and SelectMany may egress more), operators will egress at different times.

A flush differs from this behavior: it is propagated throughout the entire query, so every operator egresses all output events that are ready, regardless of whether its batch is full.

FlushPolicy.FlushOnBatchBoundary waits for the ingress batch to reach the max size (the batch boundary), then flushes the entire pipeline from that ingress site. So if you have ingress followed by a single Where operator that reduces the number of output events, those events will be egressed through to the output even though the Where operator's output buffer was not full. Under FlushPolicy.None, the Where operator would instead keep buffering across subsequent ingress batches until its buffered output reaches the batch size.

FlushPolicy.FlushOnPunctuation flushes the query in response to any punctuation.
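To make the difference concrete, here is a minimal toy simulation in Python of the buffering described above. It is not Trill code and does not use Trill's API: the names `Sink`, `BufferingOperator`, `run`, `PUNCTUATION`, and the policy strings are all hypothetical, `BATCH_SIZE` merely plays the role of Config.DataBatchSize, and the handling of punctuations is simplified. Treat it as a sketch of the idea only.

```python
PUNCTUATION = object()  # hypothetical sentinel standing in for a punctuation
BATCH_SIZE = 4          # plays the role of Config.DataBatchSize


class Sink:
    """Records every batch that reaches the end of the pipeline."""

    def __init__(self):
        self.batches = []

    def on_batch(self, batch):
        self.batches.append(list(batch))

    def on_flush(self):
        pass  # nothing is buffered at the sink


class BufferingOperator:
    """Toy operator that buffers its outputs and egresses only full batches."""

    def __init__(self, transform, downstream):
        self.transform = transform    # maps one event to a list of outputs
        self.downstream = downstream
        self.buffer = []

    def on_batch(self, batch):
        for event in batch:
            for out in self.transform(event):
                self.buffer.append(out)
                if len(self.buffer) == BATCH_SIZE:
                    self._egress()

    def on_flush(self):
        # A flush egresses whatever is ready, full batch or not,
        # and is then propagated to the next operator.
        if self.buffer:
            self._egress()
        self.downstream.on_flush()

    def _egress(self):
        batch, self.buffer = self.buffer, []
        self.downstream.on_batch(batch)


def run(events, policy):
    """Push events through ingress -> Where(even) -> sink under a policy."""
    sink = Sink()
    where = BufferingOperator(lambda e: [e] if e % 2 == 0 else [], sink)
    ingress = []
    for event in events:
        if event is PUNCTUATION:
            if policy == "flush_on_punctuation":
                where.on_batch(ingress)  # hand over the partial ingress batch
                ingress = []
                where.on_flush()
            continue  # simplification: other policies ignore it here
        ingress.append(event)
        if len(ingress) == BATCH_SIZE:
            where.on_batch(ingress)
            ingress = []
            if policy == "flush_on_batch_boundary":
                where.on_flush()
    return sink.batches  # events still buffered at end of stream are not shown


events = [1, 2, 3, 4, 5, 6, PUNCTUATION, 7, 8]
for policy in ("none", "flush_on_batch_boundary", "flush_on_punctuation"):
    print(policy, run(events, policy))
```

In this toy model, "none" delivers a single batch once the Where buffer fills up, "flush_on_batch_boundary" delivers a partial Where batch after every full ingress batch, and "flush_on_punctuation" delivers whatever is buffered at the moment the punctuation arrives.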
Thanks a lot! I get the idea. Thanks again for your time and your discussion.
From Trill's documentation about its FlushPolicy, I found:
I have two questions with regard to the above description. First of all, why is data always flushed when the batch is full in practice? Is the BatchBoundary the size of the batch? In practice, I set the input stream of type {payload:long, startTime:long, endTime:long} as
Then, I set the size of the batch as 2:
Also, every time an input event is produced, I print it on emission:
Then I set the FlushPolicy to FlushOnPunctuation and aggregated the sum of payloads, expecting data to be flushed only at the end of the stream. However, I observed that the data was still flushed whenever the batch was full. Here is the code I ran:
And here is the cmd output:
It looks like FlushOnPunctuation has the same functionality as FlushOnBatchBoundary.
By the way, my second question: when the flush policy is None, which says that "output events will be batched and egressed normally", what does "normally" mean exactly? When I set the flush policy to None in the above code, I observed exactly the same cmd output. I am very confused by the difference between these FlushPolicies.
Could anyone help me understand these policies? I would really appreciate it!