snowplow / snowbridge

For replicating streams across clouds, accounts and regions

Redesign filtered data reporting #154

Open colmsnowplow opened 2 years ago

colmsnowplow commented 2 years ago

Two issues:

In transform.go, we don't report timeTransformed for filtered messages.

The reason for this was to avoid misreporting unfiltered data as faster than it actually was, by combining it with filtered data. However, this was a mistake: filtered data is handled separately anyway, so there's no risk of that.

Additionally, we report the timestamp for filtered messages only at the end of the process, when we create a filterResult.

This means that transform latency is reported as the time between the message being pulled and the completion of that event's transformation, but filter latency is the delta between the message being pulled and the completion of the whole batch's transformations.

We should change this so that those two timestamps represent similar things (filters are transformations under our current model). We should also rethink the design of this reporting, and consider including filtered messages in the TargetWriteResult rather than reporting them separately.