Open pjay-shopify opened 1 year ago
FWIW, I've also observed some correlation between occasional StreamingPullResponses returning with the Unavailable status code and the growth of the oldest unacked message age metric around the same time.
Below is a chart presenting that:
The yellow line marks when Unavailable responses occurred.
The green line is the oldest unacked message age metric.
In theory, if these two things are related, we should always see the yellow line overlapping with the spikes of the green line. However, these metrics are sampled by Google, and if the number of Unavailable responses is relatively low, there's a high chance of missing them in some intervals.
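To make the sampling argument concrete, here is a rough sketch in Python. It assumes, purely for illustration, that each Unavailable response is captured by the metric independently with some fixed probability; the thread does not say how Google actually samples these metrics, and the function name and the 10% figure are hypothetical.

```python
def miss_probability(events: int, capture_probability: float) -> float:
    """Chance that none of `events` responses shows up in a sampled metric.

    Assumption (not from the thread): every response is captured
    independently with probability `capture_probability`.
    """
    if events < 0:
        raise ValueError("events must be non-negative")
    if not 0.0 <= capture_probability <= 1.0:
        raise ValueError("capture_probability must be within [0, 1]")
    return (1.0 - capture_probability) ** events


if __name__ == "__main__":
    # Hypothetical 10% capture probability: the fewer Unavailable responses
    # in an interval, the more likely the chart shows none of them.
    for n in (1, 5, 50):
        print(n, round(miss_probability(n, 0.1), 3))
```

Under this assumption, an interval with only a handful of Unavailable responses can easily show a spike in the green line with no matching yellow line, which would be consistent with the chart without disproving the correlation.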
Thanks for reporting this, @pjay-shopify!
Adding some notes from the discord support thread discussing this:
The Unavailable status code being returned may be a hint that the issue is with the source reconnecting.
@pjay-shopify Were you able to resolve the issue with any workaround? Thank you
Problem
We're trying to employ Vector in our logging pipeline. All of our logs are being sent to a PubSub topic and then processed by Vector. However, there's a strange issue we've been experiencing... Some messages are being picked up and never acknowledged. The number of such messages is relatively low (5-10 every 20-30 minutes), but the pattern is worrying 🤔 For troubleshooting, I've reduced my config to the following (the issue still persists):
I'm attaching below two charts showing our PubSub metrics: number of sent messages (to give you a better understanding of our traffic pattern) and the oldest unacked message metric (that grows to 10 minutes every 20-30 minutes). 10 minutes is our ack deadline:
We also mirror the traffic to our Grafana agent, so it processes the same set of messages, but we don't have any ack issues with that, so it seems like there might be an issue on Vector's end.
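For readers trying to interpret the oldest unacked message metric described above, the following is a small Python model of it, not anything taken from Vector or the Pub/Sub client. It assumes that a message whose first delivery is never acknowledged gets redelivered once the 10-minute ack deadline expires and is acknowledged about a second later; the Message type, the timestamps, and that one-second delay are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Message:
    published_at: float          # seconds since the start of the window
    acked_at: Optional[float]    # None if the message is never acknowledged


def oldest_unacked_age(now: float, messages: Sequence[Message]) -> float:
    """Age of the oldest message published by `now` and not yet acked."""
    ages = [
        now - m.published_at
        for m in messages
        if m.published_at <= now and (m.acked_at is None or m.acked_at > now)
    ]
    return max(ages, default=0.0)


ACK_DEADLINE = 600.0  # 10 minutes, as stated in the report

# Hypothetical traffic: the second message's first delivery is never acked,
# so it is only acked after redelivery, once the ack deadline has expired.
messages = [
    Message(published_at=0.0, acked_at=1.0),
    Message(published_at=10.0, acked_at=10.0 + ACK_DEADLINE + 1.0),
    Message(published_at=20.0, acked_at=21.0),
]

if __name__ == "__main__":
    for t in (5.0, 300.0, 610.0, 612.0):
        print(t, oldest_unacked_age(t, messages))
```

In this model a single stuck message is enough to drive the metric up to roughly the ack deadline before it drops back, which matches the shape described in the report even though only a few messages are affected.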
Configuration
Version
0.24.2
Debug Output
No response
Example Data
No response
Additional Context
No response
References
No response