Open freak12techno opened 3 weeks ago
Hi @freak12techno ,
Thanks for filing this issue. I'm a little confused though, when Vector is offline, it won't be able to collect metrics via the host_metrics and internal_metrics sources as both of those sources are "realtime" so I think what you are seeing is expected behavior. Am I missing something? 🤔
Problem
We are planning to integrate Vector into one of our projects. The idea is an architecture in which multiple servers each send data to a central server Vector, which in turn sends the data to Prometheus remote write. The problem is that if a machine running a Vector agent is offline for some period of time (which is often the case for us), Vector loses some metrics. This happens almost every time a machine running a Vector agent loses its internet access. Example:
Here I disabled WiFi on the laptop running my agent from 18:17 to 12:55, and there's a gap in all metrics between ~19:57 and ~12:57, so it effectively lost almost all the metrics.
I tried writing to Prometheus remote write directly from the agent instead of going through the server Vector, and it yielded the same result. So the server Vector doesn't seem to be the issue; it seems the Vector agent is either not collecting metrics while it is offline, or sending them in a way that prevents them from being recorded in Prometheus.
This is critical for us, and we wonder whether we misconfigured something or whether there is a bug in the Vector agent that causes this.
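For readers following along, the topology described above can be sketched roughly as two Vector configs. This is only an illustration of the agent -> server Vector -> Prometheus remote write layout, not the actual configuration from this report: the component names (host, internal, to_server, from_agents, prom), the address vector-server.example:6000, and the endpoint URL are all placeholders.

```yaml
# --- agent.yaml (hypothetical file; runs on each machine) ---
sources:
  host:
    type: host_metrics        # host-level metrics, collected in real time
  internal:
    type: internal_metrics    # Vector's own metrics, collected in real time

sinks:
  to_server:
    type: vector              # forwards events to another Vector instance
    inputs: [host, internal]
    address: "vector-server.example:6000"   # placeholder address
---
# --- server.yaml (hypothetical file; runs on the central server) ---
sources:
  from_agents:
    type: vector              # receives events sent by the agents' vector sinks
    address: "0.0.0.0:6000"

sinks:
  prom:
    type: prometheus_remote_write
    inputs: [from_agents]
    endpoint: "https://prometheus.example/api/v1/write"   # placeholder endpoint
```

The two documents are meant to live in separate files on separate machines. In the second experiment described above, the agent's vector sink would be replaced by a prometheus_remote_write sink pointing directly at the Prometheus endpoint.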
Configuration
Version
FROM timberio/vector:0.41.1-alpine
Debug Output
Once the internet is out, here's what is happening in the logs (a lot of repeated messages like this):
Then, once the machine is back online, there are a lot of repeated messages like these: https://gist.github.com/freak12techno/a79d04e226d7e33819162a6da76cb144
Example Data
No response
Additional Context
No response
References
No response