Wrap Flush in a mutex. Flush is normally called only from the flusher goroutine; however, when flush on shutdown is enabled, veneur could shut down prematurely without waiting for an in-progress Flush to complete.
Motivation
Discovered dropped metrics for short-lived services and for veneur running as a sidecar.
Test plan
I have automated tests in a different branch. They require either introducing a delayed blackhole sink or introducing mocks, which adds additional vendored deps. The test asserts the order of flushes while using different delays during the Flush call.
Additionally, we have two testbeds to validate this behaviour:
AWS Lambdas - Running veneur in a Lambda Layer to collect and flush metrics. On single-invocation Lambdas we have noticed that metrics are occasionally dropped. We can hide some of this behaviour by reducing the flush interval, but would rather fix the root cause.
AWS EKS - Running short-lived pods with veneur as a sidecar. When the application finishes, it calls /quitquitquit, and we notice that metrics do not flush.
Rollout/monitoring/revert plan
Testing in EKS first, followed by Lambda, both in dev/staging first. Will update