Open LinoVieira13 opened 11 months ago
Hi @LinoVieira13,
For vector 0.34.1 I cannot reproduce what you describe:
hoot:~$ which vector
/usr/bin/vector
host:~$ vector --version
vector 0.34.1 (x86_64-unknown-linux-gnu 86f1c22 2023-11-16 14:59:10.486846964)
host:~$ VECTOR_CONFIG_DIR=~/vector-config-dir/ vector
2023-12-08T10:21:15.402659Z INFO vector::app: Log level is enabled. level="vector=info,codec=info,vrl=info,file_source=info,tower_limit=info,rdkafka=info,buffers=info,lapin=info,kube=info"
2023-12-08T10:21:15.404428Z INFO vector::app: Loading configs. paths=["/home/bits/vector-config-dir"]
2023-12-08T10:21:15.407402Z INFO vector::topology::running: Running healthchecks.
2023-12-08T10:21:15.407471Z INFO vector: Vector has started. debug="false" version="0.34.1" arch="x86_64" revision="86f1c22 2023-11-16 14:59:10.486846964"
2023-12-08T10:21:15.407487Z INFO vector::app: API is disabled, enable by setting `api.enabled` to `true` and use commands like `vector top`.
2023-12-08T10:21:15.407646Z INFO vector::topology::builder: Healthcheck passed.
{"appname":"AnthraX","facility":"local2","hostname":"up.cn","message":"Take a breath, let it go, walk away","msgid":"ID52","procid":1840,"severity":"crit","timestamp":"2023-12-08T10:21:15.408Z","version":1}
{"appname":"b0rnc0nfused","facility":"alert","hostname":"some.xn--fzys8d69uvgm","message":"Great Scott! We're never gonna reach 88 mph with the flux capacitor in its current state!","msgid":"ID717","procid":5058,"severity":"crit","timestamp":"2023-12-08T10:21:16.408Z","version":2}
{"appname":"KarimMove","facility":"user","hostname":"names.ls","message":"You're not gonna believe what just happened","msgid":"ID94","procid":5582,"severity":"debug","timestamp":"2023-12-08T10:21:17.408Z","version":1}
^C2023-12-08T10:21:17.796252Z INFO vector::signal: Signal received. signal="SIGINT"
2023-12-08T10:21:17.796317Z INFO vector: Vector has stopped.
2023-12-08T10:21:17.797442Z INFO vector::topology::running: Shutting down... Waiting on running components. remaining_components="print, parse_logs, dummy_logs" time_remaining="59 seconds left"
{"appname":"jesseddy","facility":"local0","hostname":"we.associates","message":"Maybe we just shouldn't use computers","msgid":"ID559","procid":8294,"severity":"err","timestamp":"2023-12-08T10:21:18.408Z","version":2}
host:~$ VECTOR_CONFIG_DIR=~/vector-config-dir/ vector validate
√ Loaded ["/home/bits/vector-config-dir"]
x Could not create subdirectory "validate_tmp" inside of data dir "/var/lib/vector/": Permission denied (os error 13)
host:~$ sudo VECTOR_CONFIG_DIR=~/vector-config-dir/ vector validate
√ Loaded ["/home/bits/vector-config-dir"]
√ Component configuration
√ Health check "print"
-----------------------------------------
Validated
Is the Debug Output you shared above coming from vector 0.33.1
or vector 0.34.1
?
Hello @pront,
The Debug output I shared is coming from vector 0.34.1
. However, if I try the same in vector 0.33.1
the only difference in the debug output is the line 2023-1207T12:43:26.806507Z WARN vector::config::loading: The "/etc/vector/vector.toml" config path does not exist. Vector will attempt to use new default "/etc/vector/vector.yaml" instead.
the rest stays the same.
I had met the similar situation. The same point is that vector validate ...
does not quit if without --no-environment
in my case. But it doesn't appear all the time. When it occurs, I can see that the health checks are all passed but stuck without output the last line --- \n Validated
. Also, I noticed that the kafka connections still being kept (I'm using the kafka as both source and sink, maybe using kafka-sink would cause the problem more often). Btw, I cannot use ctrl+c
to quit the process. The same configurations are worked in the v0.32.1
@LinoVieira13 in order to help us debug this:
vector 0.33.1 (x86_64-unknown-linux-gnu ....)
, is that wrong?ls /etc/vector/conf.d
?I met the same bug. I tried to upgrade from 0.24.1 to 0.35.0 (aslo for latest 0.37) and it failes to start at PreStart stage. I tried with one vector.toml and one source + one sink + one file with 5-7 trfansforms and it succeeds. When I add more files and more sinks (about 50) validate stucks after √ Component configuration ~ Health checks are disabled
Started it with VECTOR_LOG=DEBUG with strace and it also stucks with my huge config, Ctrl+C can't stop it but it shows log messages INFO vector::signal: Signal received. signal="SIGINT"
. I only can exit with Ctrl+Z and then kill the proc by kill -9 vector_validate_pid
.
For same set of transforms run 'vector validate -C config_dir'
| Tried version | Result |
| 0.32.0-1 | OK |
| 0.33.1-1 | OK |
| 0.34.0-1 | FAIL |
| 0.34.1-1 | FAIL |
| 0.34.2-1 | FAIL |
| 0.35.0-1 | FAIL |
| 0.35.1-1 | FAIL |
| 0.36.0-1 | FAIL |
| 0.36.1-1 | FAIL |
| 0.37.0-1 | FAIL |
What I see with strace of vector validate
without --no-environment:
openat(AT_FDCWD, "/usr/lib/ssl/certs/ca-certificates.crt", O_RDONLY) = 33
fstat(33, {st_mode=S_IFREG|0644, st_size=208567, ...}) = 0
read(33, "-----BEGIN CERTIFICATE-----\nMIIH"..., 4096) = 4096
...<excluded>
read(33, "", 4096) = 0
close(33) = 0
write(4, "\1\0\0\0\0\0\0\0", 8) = 8
futex(0x7f05649fbd98, FUTEX_WAKE_PRIVATE, 1) = 1
write(1, "\33[32m\342\210\232\33[0m Component configura"..., 37) = 37
write(1, "\33[33m~\33[0m Health checks are dis"..., 38) = 38
futex(0x7f05695ebc50, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f05695ebc00, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f0569460028, FUTEX_WAIT_BITSET_PRIVATE, 4294967295, NULL, FUTEX_BITSET_MATCH_ANY) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
// ^^^^ it stucks here
//
// -- NOW 2 attempts to stop with Ctrl+C, then terminate by Ctrl-Z
--- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} ---
write(7, "\1", 1) = 1
rt_sigreturn({mask=[]}) = 202
futex(0x7f0569460028, FUTEX_WAIT_BITSET_PRIVATE, 4294967295, NULL, FUTEX_BITSET_MATCH_ANY) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} ---
write(7, "\1", 1) = 1
rt_sigreturn({mask=[]}) = 202
futex(0x7f0569460028, FUTEX_WAIT_BITSET_PRIVATE, 4294967295, NULL, FUTEX_BITSET_MATCH_ANY) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=200774, si_uid=1027} ---
futex(0x7f0569460028, FUTEX_WAIT_BITSET_PRIVATE, 4294967295, NULL, FUTEX_BITSET_MATCH_ANY) = ?
+++ killed by SIGKILL +++
Can someone experiencing this failure share the Vector configuration they are trying to load and exact command they are running?
Forgot to mention - I run vector under Ubuntu 20.04.04 LTS x64. Vector installed by apt, versions are x86_64-unknown-linux-gnu
Can't share my config files right now. I'll try to split and test it, i think It might be an issue due to the files count, or a topolgy issue
Anybody with the problem on stuck validate
, can you try to add a delay on the network interface following https://github.com/vectordotdev/vector/issues/20367#issuecomment-2075069444 and run validate
again ? :exploding_head:
A note for the community
Problem
Hello, I'm opening this bug at the request of jszwedko via discord. So, I'm using an Ubuntu 22.04 machine, and I have vector v0.33.1-1 already installed and working, however, when I try to upgrade the package via Ubuntu APT to version v.0.34.1-1 it fails to start after 1m30s (timeout). I don't have a /etc/vector/vector.yaml file, because I'm using environment variables to point the VECTOR_CONFIG_DIR to this folder: /etc/vector/conf.d. This is the error when it tries to start the service after the upgrade:
Configuration
Version
vector 0.33.1 (x86_64-unknown-linux-gnu 3cc27b9 2023-10-30 16:50:49.747931844)
Debug Output
Example Data
No response
Additional Context
Vector machine is an LXC host.
References
No response