snowplow / snowbridge

For replicating streams across clouds, accounts and regions

Reconsider format and generally tags that are sent to StatsD #86

Closed jbeemster closed 1 year ago

jbeemster commented 2 years ago

We ran into a fairly obscure issue with Stream Replicator + Container Monitor: monitoring failed when Stream Replicator was configured with a Kafka Target. The issue stems from how the target_id comes back from Stream Replicator:

https://github.com/snowplow-devops/stream-replicator/blob/master/pkg/target/kafka.go#L280-L283

This comes back as a string like this:

2021-12-20 13:28:09,288 statsd_parser DEBUG: raw_metric: snowplow.stream-replicator.latency_proccesing_max:0|ms|#client_name:com_starsgroup,app_version:0.5.0,target_id:brokers:10.30.78.42:9095,10.30.78.42:9096,10.30.78.42:9097:topic:snowplow-qa,failure_target_id:arn:aws:kinesis:eu-west-2:883031630176:stream/snowplow-com-starsgroup-qa1-bad-1,hostname:ip-10-174-229-90.eu-west-2.compute.internal,process_id:1,source_id:arn:aws:kinesis:eu-west-2:883031630176:stream/snowplow-com-starsgroup-qa1-enriched

Specifically: brokers:10.30.78.42:9095,10.30.78.42:9096,10.30.78.42:9097:topic:snowplow-qa

StatsD expects tags in this form: #tag1:value1,tag2:value2

The long and short of it is that the Kafka GetID() function breaks this contract, because its return value itself contains "," and ":" characters. This confuses the StatsD parser and was causing duplicate key issues, where the tag set parsed from the value above was:

brokers - 10.30.78.42:9095
10.30.78.42 - 9096
10.30.78.42 - 9097:topic:snowplow-qa

The duplicate key 10.30.78.42 caused the CloudWatch API to panic, as it cannot accept duplicate dimensions.
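To make the failure concrete, here is a minimal sketch of a naive tag parser that splits on "," and then on the first ":" of each item. It is not the actual Container Monitor parser; parseTags and duplicateKeys are hypothetical names used only for illustration, and the splitting rule is an assumption that happens to reproduce the tag set listed above.

```go
package main

import (
	"fmt"
	"strings"
)

// tag is one key/value pair as a naive StatsD tag parser would see it.
type tag struct {
	Key, Value string
}

// parseTags splits a tag section on "," and each item on its first ":".
// This mirrors the assumed behaviour of a simple parser, nothing more.
func parseTags(section string) []tag {
	section = strings.TrimPrefix(section, "#")
	var tags []tag
	for _, item := range strings.Split(section, ",") {
		key, value, _ := strings.Cut(item, ":")
		tags = append(tags, tag{Key: key, Value: value})
	}
	return tags
}

// duplicateKeys returns every key that appears more than once.
func duplicateKeys(tags []tag) []string {
	seen := map[string]int{}
	var dups []string
	for _, t := range tags {
		seen[t.Key]++
		if seen[t.Key] == 2 {
			dups = append(dups, t.Key)
		}
	}
	return dups
}

func main() {
	// The Kafka target ID from the metric above, treated as a tag section.
	tags := parseTags("brokers:10.30.78.42:9095,10.30.78.42:9096,10.30.78.42:9097:topic:snowplow-qa")
	for _, t := range tags {
		fmt.Printf("%s - %s\n", t.Key, t.Value)
	}
	fmt.Println("duplicates:", duplicateKeys(tags))
}
```

Run against the Kafka target ID, this prints the same three pairs as above and reports 10.30.78.42 as a duplicate key.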


We need to ensure that all tags sent to StatsD are valid, follow the pattern key:value,key1:value1, and can be reliably split on ",".
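One possible way to enforce that pattern is to sanitize every key and value before building the tag section. The sketch below is an assumption about how this could be done, not what Stream Replicator does; sanitizeTagPart and formatTags are hypothetical helpers, and replacing reserved characters with "_" is just one choice.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// tagReplacer swaps characters that carry structural meaning in a StatsD
// line (tag separator, key/value separator, field separator, tag marker).
var tagReplacer = strings.NewReplacer(",", "_", ":", "_", "|", "_", "#", "_")

// sanitizeTagPart makes a key or value safe to embed in a tag section.
func sanitizeTagPart(s string) string {
	return tagReplacer.Replace(s)
}

// formatTags renders tags as "#key:value,key1:value1" with sorted keys,
// so the output is deterministic and splits cleanly on ",".
func formatTags(tags map[string]string) string {
	if len(tags) == 0 {
		return ""
	}
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, sanitizeTagPart(k)+":"+sanitizeTagPart(tags[k]))
	}
	return "#" + strings.Join(parts, ",")
}

func main() {
	fmt.Println(formatTags(map[string]string{
		"app_version": "0.5.0",
		"target_id":   "brokers:10.30.78.42:9095,10.30.78.42:9096:topic:snowplow-qa",
	}))
}
```

The trade-off is that sanitized values are no longer the original identifiers, so anything that needs the exact broker list or ARN would have to get it from somewhere other than the tags.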

jbeemster commented 2 years ago

Linking this one here: https://github.com/snowplow-devops/stream-replicator/issues/88

I have gone ahead and removed the default tags to fix this issue. They also didn't add anything useful to the metric, so there was no real point having them there!

colmsnowplow commented 1 year ago

Closing this, as removing the tags resolved the issue. However, I'm working on different StatsD metrics reporting based on our infra team's requirements for monitoring and alerting. I feel like that's the best driver for this kind of feature regardless!