Open trexx opened 1 year ago
@trexx So before you deleted the disk buffers, Vector was in a state where even after restarting it, it would never accept new connections?
Looking at the code, I can see a potential issue with it that could lead to behavior where Vector would appear stalled, which I'm going to try and reproduce locally. If you still had any of the debug logging that you could share, that would be helpful, too.
@tobz Correct. There were many container restarts to apply a console sink to some of the sources to verify whether vector was receiving traffic. Prometheus also complained about being unable to scrape its metrics target too.
2022-10-06T13:52:20.069032Z INFO vector::app: Log level is enabled. level="info"
2022-10-06T13:52:20.069123Z INFO vector::config::watcher: Creating configuration file watcher.
2022-10-06T13:52:20.072937Z INFO vector::config::watcher: Watching configuration files.
2022-10-06T13:52:20.073476Z INFO vector::app: Loading configs. paths=["/etc/vector"]
These were the only messages we saw before enabling debug mode.
Unfortunately I don't have any of the debug logging to share. I can only recall "Current data file has no more data." being spewed. If there was anything else, it may have been at the start when vector was initialising. Sorry!
No worries. 👍🏻
I haven't started working on a local reproduction, but if that message was spewing across the console, it likely confirms what I suspect the problem is... so that's still a useful datapoint.
A note for the community
Problem
Had an issue where Vector wasn't accepting connections but it wasn't spitting out anything in the logs either aside from the usual info level logs. It looks like nothing was wrong but it wasn't accepting traffic. We initially suspected something to do with the network. Once we enabled debug logging we started seeing a message to do with the disk buffer. We were reaching this line here. https://github.com/vectordotdev/vector/blob/v0.24.1/lib/vector-buffers/src/variants/disk_v2/reader.rs#L1054
It maybe worthwhile changing the log level this line logs at, perhaps warn to make it apparent that something is preventing Vector from functioning properly.
We eventually deleted our persistent volumes to start from scratch and that got it back to life again.
Configuration
No response
Version
0.24.1
Debug Output
No response
Example Data
No response
Additional Context
No response
References
No response