Closed dloucasfx closed 1 year ago
@rmfitzpatrick was reviewing this PR when we checked other lib, Java and Ruby follows the same logic https://github.com/signalfx/signalfx-java/blob/main/signalfx-java/src/main/java/com/signalfx/signalflow/Computation.java#L272 https://github.com/signalfx/signalfx-ruby/blob/master/lib/signalfx/signalflow/computation.rb#L149
but not GO I wonder if this is a server side issue
Issue is caused by server side change, closing PR
The logic here expects that batched data events (ie: data messages with same timestamp) come right after another, otherwise the logic will break. The issue occurs when the server sends events out of order; for example, the below series of events, which randomly happens, is messing up the logic.
If I understand this correctly, the
yield
in the info event here should “and meant to” only occur once, after the first data event batch is built.However, when the logic process the events [data1-Time1]->[info]->[info]->[data2-Time2] , the yield in info is being triggered on a single data message [data1-Time1] causing _current_batch_count to get reset from 1 to 0 in
_get_batch_to_yield
. Next, when [data2-Time2] gets processed, the_current_batch_count
is incremented to 1 but because there is no more events with the same timestamps,_current_batch_count
will always be 1 and we never meet this condition as_expected_batches
is determined to be 2.The fix here adds an extra check in the info to make sure we only
yield
when the batch is completecc: @mahdibh @mpetazzoni
UPDATE: This solution assumes we always get data for the full batch; I think it will fail if we don't as the added condition will not be met and we'll never yield. I am not sure if this a valid scenario and how to address it. I was hoping that the info message will have a hint in its
message code
to signal that the data event is missing at a particular timestamp, but I don't see any hereSigned-off-by: Dani Louca dlouca@splunk.com