AndrewChubatiuk opened this issue 1 year ago
Hey @AndrewChubatiuk, does this happen immediately when Vector is starting, or does it take some time to surface?
It appears suddenly and not on all nodes; I haven't found any correlation yet.
Thanks, I'll see what I can do to reproduce this.
@AndrewChubatiuk Are you able to collect a full backtrace with the RUST_BACKTRACE=1 environment variable set for Vector? It could help with reproducing the issue. Thanks in advance.
I guess the problem is here: https://github.com/vectordotdev/vector/blob/master/src/internal_telemetry/allocations/mod.rs#L54
The u8 type is used for tracking source_group_id, so its maximum value (255) is larger than the table length of 128. Probably we need to limit source_group_id or just allocate more memory.
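To make the mismatch concrete, here is a minimal sketch; this is not Vector's actual code, and the names `record_dealloc` and `NUM_GROUPS` are invented for illustration. A fixed table of 128 per-group slots indexed by a u8 group id panics for any id of 128 or above, such as the 192 in the backtrace later in this thread. A bounds-checked lookup, or a table sized to u8::MAX as usize + 1 = 256 slots (the "allocate more memory" option), would avoid the panic.

```rust
const NUM_GROUPS: usize = 128;

/// Records one deallocation for the given allocation group.
/// `group_id` is a u8, so it can range over 0..=255 even though
/// the stats table only has 128 slots.
fn record_dealloc(stats: &mut [u64; NUM_GROUPS], group_id: u8) {
    // The panicking pattern would be `stats[group_id as usize] += 1;`,
    // which fails with "index out of bounds: the len is 128 but the
    // index is 192" for group_id = 192.
    //
    // Defensive alternative: skip out-of-range ids instead of
    // indexing unchecked.
    if let Some(slot) = stats.get_mut(group_id as usize) {
        *slot += 1;
    }
}

fn main() {
    let mut stats = [0u64; NUM_GROUPS];
    record_dealloc(&mut stats, 42);  // in range, counted
    record_dealloc(&mut stats, 192); // out of range, ignored instead of panicking
    assert_eq!(stats[42], 1);
}
```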
I haven't specifically started looking into this but did see similar messages when shutting down Vector while reproducing another report.
Another problem regarding this issue is that Vector's health endpoint keeps responding with HTTP 200.
That's tracked by https://github.com/vectordotdev/vector/issues/4250; the check is very naive today.
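For illustration, here is a hypothetical sketch of what a less naive check could consult; this is not Vector's implementation, and `TOPOLOGY_HEALTHY`, `health_status`, and `mark_unhealthy` are invented names. The idea is that the topology's error path (the kind of failure reported later in this thread, "the task panicked and was aborted") flips a flag the health handler reads, so the endpoint stops returning 200 once a component has died.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Invented for this example; Vector has no symbol by this name.
static TOPOLOGY_HEALTHY: AtomicBool = AtomicBool::new(true);

/// A health handler that reflects topology state instead of
/// unconditionally answering 200.
fn health_status() -> (u16, &'static str) {
    if TOPOLOGY_HEALTHY.load(Ordering::Relaxed) {
        (200, "ok")
    } else {
        (503, "a component task panicked and was aborted")
    }
}

/// Would be called from the topology's error path when a
/// component task is aborted.
fn mark_unhealthy() {
    TOPOLOGY_HEALTHY.store(false, Ordering::Relaxed);
}

fn main() {
    assert_eq!(health_status().0, 200);
    mark_unhealthy();
    assert_eq!(health_status().0, 503);
}
```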
Agree with @zamazan4ik's thoughts after a quick review; happy to take a contribution if someone's interested!
Seems to be the same as https://github.com/vectordotdev/vector/issues/16028 which we thought was fixed, but seemingly there is still a case that hits it.
This issue still happens in 0.38.0
thread 'vector-worker' panicked at src/internal_telemetry/allocations/mod.rs:111:13:
index out of bounds: the len is 128 but the index is 192
stack backtrace:
0: rust_begin_unwind
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/std/src/panicking.rs:647:5
1: core::panicking::panic_fmt
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/core/src/panicking.rs:72:14
2: core::panicking::panic_bounds_check
at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/core/src/panicking.rs:208:5
3: <vector::internal_telemetry::allocations::allocator::tracing_allocator::GroupedTraceableAllocator<A,T> as core::alloc::global::GlobalAlloc>::dealloc
4: core::ptr::drop_in_place<metrics::key::Key>
5: core::ops::function::FnOnce::call_once
6: vector_core::metrics::Controller::capture_metrics
7: vector::sources::internal_metrics::InternalMetrics::run::{{closure}}
8: vector::topology::builder::Builder::build_sources::{{closure}}::{{closure}}
9: <tracing::instrument::Instrumented<T> as core::future::future::Future>::poll
10: tokio::runtime::task::raw::poll
11: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
12: tokio::runtime::task::raw::poll
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
2024-05-22T09:06:09.443337Z ERROR source{component_kind="source" component_id=internal_metrics component_type=internal_metrics}: vector::topology: An error occurred that Vector couldn't handle: the task panicked and was aborted.
A note for the community
No response
Problem
Vector 0.29.1 fails with a stack trace, but the pod is still reported as healthy.
Configuration
Version
0.29.1
Debug Output
No response
Example Data
No response
Additional Context
No response
References
No response