snowplow / snowbridge

For replicating streams across clouds, accounts and regions
Other
14 stars 8 forks source link

Use existing timestamp for request finish in measuring latency metrics #364

Closed colmsnowplow closed 1 week ago

colmsnowplow commented 1 week ago

Changes in 2.4.2 to how throttling is handled in kinesis would lead to misreporting metrics in cases where we're reporting batches.

Because we take a timestamp for when we provide the WriteResult object, if we have a batch with 499 events succeed within 10ms, but one gets throttled and takes 1s before a retry is successful, then 499 events will report 1+s latency where they should report 10ms.

However we already record a timestamp fro when a request finished - this PR changes the metrics to compute latency based on that timestamp instead.

This will likely make little to no difference to our metrics in prod because we usually don't batch the data - but that will change, so it's worth fixing now regardless.