Open kaarolch opened 7 months ago
Hi @kaarolch !
I think the issue here is that you want to disable Vector's default logging to stderr while still capturing debug level logs via internal_logs
. Is that correct? If so, I would suggest routing Vector's stderr to /dev/null
and then just relying on internal_logs
to capture the logs so that you can redact them before printing them to the console.
Separately https://github.com/vectordotdev/vector/issues/16609 is tracking creating some more definite development guidelines around Vector's logging to avoid accidentally logging sensitive information.
Let me know if that's clear!
@jszwedko unfortunately redirect error logs to /dev/null
did not hide DEBUG span.
/bin/bash -c "vector --graceful-shutdown-limit-secs 300 --internal-log-rate-limit 10 --config-dir /vector --log-format json 2>/dev/null"
I still get DEBUG span events with dd-api-key
to console:
{"timestamp":"2024-03-11T08:29:24.200943Z","level":"DEBUG","message":"Sending HTTP request.","uri":"https://pvtlink.agent.datadoghq.com:443//api/v1/series","method":"POST","version":"HTTP/1.1","headers":"{\"dd-api-key\": \β******\β, \"dd-agent-payload\": \"4.87.0\", \"content-type\": \"application/json\", \"content-encoding\": \"deflate\", \"user-agent\": \"Vector/0.34.2 (aarch64-unknown-linux-gnu d685a16 2024-01-02 14:59:54.890517436)\", \"accept-encoding\": \"identity\"}","body":"[141074 bytes]","target":"vector::internal_events::http_client","span":{"name":"http"},"spans":[{"component_id":"datadog_agent_dogstatsd_metrics","component_kind":"sink","component_type":"datadog_metrics","name":"sink"},{"request_id":426,"name":"request"},{"name":"http"}]}
Also I would prefer to not redirect vector stdout/stderr because if vector failed or report problem during start we will miss related events.
Separately https://github.com/vectordotdev/vector/issues/16609 is tracking creating some more definite development guidelines around Vector's logging to avoid accidentally logging sensitive information.
TBH there should be option to enable debug logs and filter sensitive keys from it. Would be nice if internal_logs
source could be set in debug but other vector parts wont throw debug events direct to stdout. This will help to create custom vrl filters.
That debug span there appears to be output from the console
sink which indicates an issue with your configuration in that it isn't dropping the dd-api-key
field.
Also I would prefer to not redirect vector stdout/stderr because if vector failed or report problem during start we will miss related events.
True, that makes sense. Unfortunately I'm not seeing another good method just using Vector currently. You could consider feeding the output through another process to do redaction.
Separately https://github.com/vectordotdev/vector/issues/16609 is tracking creating some more definite development guidelines around Vector's logging to avoid accidentally logging sensitive information.
TBH there should be option to enable debug logs and filter sensitive keys from it. Would be nice if internal_logs source could be set in debug but other vector parts wont throw debug events direct to stdout. This will help to create custom vrl filters.
Agreed, that's actually being tracked by https://github.com/vectordotdev/vector/issues/12399
I think the request here is covered by https://github.com/vectordotdev/vector/issues/12399 so I'll close this one in-lieu of that one but let me know if you disagree! Also feel free to leave additional thoughts on that other issue.
@jszwedko
That debug span there appears to be output from the console sink which indicates an issue with your configuration in that it isn't dropping the dd-api-key field.
I did some tests:
console
sink, redirect them to blackhole
. Currently I'm 100% sure that only:
sinks:
vector_internal_debug_logs_stdout:
type: console
inputs:
- internal_debug_logs_del_key
encoding:
codec: json
sent to console. During my test I've exec to container and run tap on input of this sink with grep api-key
:
vector tap --inputs-of vector_internal_debug_logs_stdout | grep api-key
2024-03-12T08:01:34.052754Z DEBUG vector::app: Internal log rate limit configured. internal_log_rate_secs=10
2024-03-12T08:01:34.052799Z INFO vector::app: Log level is enabled. level="debug"
2024-03-12T08:01:34.052958Z DEBUG vector::app: messaged="Building runtime." worker_threads=2
2024-03-12T08:01:34.065476Z DEBUG reqwest::connect: starting new connection: http://127.0.0.1:8686/
2024-03-12T08:01:34.065532Z DEBUG hyper::client::connect::http: connecting to 127.0.0.1:8686
2024-03-12T08:01:34.065681Z DEBUG hyper::client::connect::http: connected to 127.0.0.1:8686
2024-03-12T08:01:34.065911Z DEBUG hyper::proto::h1::io: flushed 206 bytes
2024-03-12T08:01:34.066233Z DEBUG hyper::proto::h1::io: parsed 3 headers
2024-03-12T08:01:34.066265Z DEBUG hyper::proto::h1::conn: incoming body is content-length (24 bytes)
2024-03-12T08:01:34.066299Z DEBUG hyper::proto::h1::conn: incoming body completed
2024-03-12T08:01:34.066337Z DEBUG hyper::client::pool: pooling idle connection for ("http", 127.0.0.1:8686)
2024-03-12T08:01:34.073158Z DEBUG tungstenite::handshake::client: Client handshake done.
[tap] Pattern 'vector_internal_debug_logs_stdout' successfully matched.
During 10 mins tap there was no single events similar to:
{"timestamp":"2024-03-11T08:29:24.200943Z","level":"DEBUG","message":"Sending HTTP request.","uri":"https://pvtlink.agent.datadoghq.com:443//api/v1/series","method":"POST","version":"HTTP/1.1","headers":"{\"dd-api-key\": \β******\β, \"dd-agent-payload\": \"4.87.0\", \"content-type\": \"application/json\", \"content-encoding\": \"deflate\", \"user-agent\": \"Vector/0.34.2 (aarch64-unknown-linux-gnu d685a16 2024-01-02 14:59:54.890517436)\", \"accept-encoding\": \"identity\"}","body":"[141074 bytes]","target":"vector::internal_events::http_client","span":{"name":"http"},"spans":[{"component_id":"datadog_agent_dogstatsd_metrics","component_kind":"sink","component_type":"datadog_metrics","name":"sink"},{"request_id":426,"name":"request"},{"name":"http"}]}
I've also tried to tap to output of internal_logs
source and find any events without metadata:
vector$ vector tap --outputs-of vector_internal_logs | grep -v metadata
2024-03-12T08:06:35.184941Z DEBUG vector::app: Internal log rate limit configured. internal_log_rate_secs=10
2024-03-12T08:06:35.184981Z INFO vector::app: Log level is enabled. level="debug"
2024-03-12T08:06:35.185132Z DEBUG vector::app: messaged="Building runtime." worker_threads=2
2024-03-12T08:06:35.197474Z DEBUG reqwest::connect: starting new connection: http://127.0.0.1:8686/
2024-03-12T08:06:35.197529Z DEBUG hyper::client::connect::http: connecting to 127.0.0.1:8686
2024-03-12T08:06:35.197687Z DEBUG hyper::client::connect::http: connected to 127.0.0.1:8686
2024-03-12T08:06:35.197931Z DEBUG hyper::proto::h1::io: flushed 206 bytes
2024-03-12T08:06:35.198281Z DEBUG hyper::proto::h1::io: parsed 3 headers
2024-03-12T08:06:35.198315Z DEBUG hyper::proto::h1::conn: incoming body is content-length (24 bytes)
2024-03-12T08:06:35.198352Z DEBUG hyper::proto::h1::conn: incoming body completed
2024-03-12T08:06:35.198390Z DEBUG hyper::client::pool: pooling idle connection for ("http", 127.0.0.1:8686)
2024-03-12T08:06:35.200070Z DEBUG tungstenite::handshake::client: Client handshake done.
[tap] Pattern 'vector_internal_logs' successfully matched.
How can I catch those debug events target":"vector::internal_events::http_client","span":{"name":"http"}
via vrl to remove dd-api-key
.
From my observation and test it looks like sink sending them direct to stdout
and I can not modify this behave?
@jszwedko is there any chance that I can somehow:
VECTOR_LOG=debug
RUST_LOG=vector=internal_events::http_client=info
to disable debug log from http_client
?
Ok I found workaround, after set: VECTOR_LOG: 'debug,vector::internal_events::http_client=info'
http_client span stop appeared.
I thought I actually reopened this issue yesterday, but apparently not. Looking back at your original description of the issue, I think I see an issue which is that the headers
field is not actually an object, which is why your VRL doesn't appropriate redact it. Instead it is a JSON-encoded string. I think that could be a bug (I would have expected it to be an object as you did). I'm glad you found a workaround though!
A note for the community
Problem
I'm trying to transform internal logs before forward them to console. I would like to remove every sensitive attribute from vector logs in
debug
mode. Unfortunately someinternal_events
can not be silent or catch and transform via VRL.Configuration
Version
0.34.2
Debug Output
No response
Example Data
Debug logs are correctly transformed and
headers.dd-api-key
was removed:Internal_span still print
headers.dd-api-key
to stdout:Additional Context
Is there any chance to set masking of sensitive data like
dd-api-key
in sink configuration?References
https://github.com/vectordotdev/vector/issues/12399