vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0
17.62k stars 1.55k forks source link

Can not disable default stdout internal_events and use internal_logs source in debug level #20036

Open kaarolch opened 7 months ago

kaarolch commented 7 months ago

A note for the community

Problem

I'm trying to transform internal logs before forward them to console. I would like to remove every sensitive attribute from vector logs in debug mode. Unfortunately some internal_events can not be silent or catch and transform via VRL.

Configuration

sources:
  vector_internal_logs:
    type: internal_logs

transforms:
  internal_debug_logs_filter:
    type: filter
    inputs:
      - vector_internal_logs
    condition: 'downcase(string!(.metadata.level)) == "debug"'
  internal_debug_logs_del_key:
    type: remap
    inputs:
      - internal_debug_logs_filter
    source: |
      if exists(.headers) {
        headers, err = parse_json(.headers)
        if err == null {
          if is_object(headers) && exists(headers."dd-api-key") {
            del(headers."dd-api-key")
            .headers = headers
          }
        }
      }

sinks:
  vector_internal_debug_logs_stdout:
    type: console
    inputs:
      - internal_debug_logs_del_key
    encoding:
      codec: json

Version

0.34.2

Debug Output

No response

Example Data

Debug logs are correctly transformed and headers.dd-api-key was removed:

{"body":"[16 bytes]","headers":"{\"date\": \"Fri, 08 Mar 2024 10:31:46 GMT\", \"content-length\": \"16\", \"content-type\": \"text/plain; charset=utf-8\", \"x-content-type-options\": \"nosniff\", \"strict-transport-security\": \"max-age=31536000; includeSubDomains; preload\"}","host":"vector","message":"HTTP response.","metadata":{"kind":"event","level":"DEBUG","module_path":"vector::internal_events::http_client","target":"vector::internal_events::http_client"},"pid":29,"source_type":"internal_logs","status":"202 Accepted","timestamp":"2024-03-08T10:31:46.325033351Z","vector":{"component_id":"datadog_vector_internal_metrics","component_kind":"sink","component_type":"datadog_metrics"},"version":"HTTP/1.1"}

Internal_span still print headers.dd-api-key to stdout:

{"timestamp":"2024-03-08T10:31:46.257777Z","level":"DEBUG","message":"Sending HTTP request.","uri":"[https://pvtlink.agent.datadoghq.com:443//api/beta/sketches](https://pvtlink.agent.datadoghq.com//api/beta/sketches)","method":"POST","version":"HTTP/1.1","headers":"{\"dd-api-key\": \"*****\", \"dd-agent-payload\": \"4.87.0\", \"content-type\": \"application/x-protobuf\", \"content-encoding\": \"deflate\", \"user-agent\": \"Vector/0.34.2 (aarch64-unknown-linux-gnu d685a16 2024-01-02 14:59:54.890517436)\", \"accept-encoding\": \"identity\"}","body":"[4928 bytes]","target":"vector::internal_events::http_client","span":{"name":"http"},"spans":[{"component_id":"datadog_vector_internal_metrics","component_kind":"sink","component_type":"datadog_metrics","name":"sink"},{"request_id":14,"name":"request"},{"name":"http"}]}

Additional Context

Is there any chance to set masking of sensitive data like dd-api-key in sink configuration?

References

https://github.com/vectordotdev/vector/issues/12399

jszwedko commented 7 months ago

Hi @kaarolch !

I think the issue here is that you want to disable Vector's default logging to stderr while still capturing debug level logs via internal_logs. Is that correct? If so, I would suggest routing Vector's stderr to /dev/null and then just relying on internal_logs to capture the logs so that you can redact them before printing them to the console.

Separately https://github.com/vectordotdev/vector/issues/16609 is tracking creating some more definite development guidelines around Vector's logging to avoid accidentally logging sensitive information.

Let me know if that's clear!

kaarolch commented 7 months ago

@jszwedko unfortunately redirect error logs to /dev/null did not hide DEBUG span.

/bin/bash -c "vector --graceful-shutdown-limit-secs 300 --internal-log-rate-limit 10 --config-dir /vector --log-format json 2>/dev/null"

I still get DEBUG span events with dd-api-key to console:

{"timestamp":"2024-03-11T08:29:24.200943Z","level":"DEBUG","message":"Sending HTTP request.","uri":"https://pvtlink.agent.datadoghq.com:443//api/v1/series","method":"POST","version":"HTTP/1.1","headers":"{\"dd-api-key\": \”******\”, \"dd-agent-payload\": \"4.87.0\", \"content-type\": \"application/json\", \"content-encoding\": \"deflate\", \"user-agent\": \"Vector/0.34.2 (aarch64-unknown-linux-gnu d685a16 2024-01-02 14:59:54.890517436)\", \"accept-encoding\": \"identity\"}","body":"[141074 bytes]","target":"vector::internal_events::http_client","span":{"name":"http"},"spans":[{"component_id":"datadog_agent_dogstatsd_metrics","component_kind":"sink","component_type":"datadog_metrics","name":"sink"},{"request_id":426,"name":"request"},{"name":"http"}]}

Also I would prefer to not redirect vector stdout/stderr because if vector failed or report problem during start we will miss related events.

Separately https://github.com/vectordotdev/vector/issues/16609 is tracking creating some more definite development guidelines around Vector's logging to avoid accidentally logging sensitive information.

TBH there should be option to enable debug logs and filter sensitive keys from it. Would be nice if internal_logs source could be set in debug but other vector parts wont throw debug events direct to stdout. This will help to create custom vrl filters.

jszwedko commented 7 months ago

That debug span there appears to be output from the console sink which indicates an issue with your configuration in that it isn't dropping the dd-api-key field.

Also I would prefer to not redirect vector stdout/stderr because if vector failed or report problem during start we will miss related events.

True, that makes sense. Unfortunately I'm not seeing another good method just using Vector currently. You could consider feeding the output through another process to do redaction.

Separately https://github.com/vectordotdev/vector/issues/16609 is tracking creating some more definite development guidelines around Vector's logging to avoid accidentally logging sensitive information.

TBH there should be option to enable debug logs and filter sensitive keys from it. Would be nice if internal_logs source could be set in debug but other vector parts wont throw debug events direct to stdout. This will help to create custom vrl filters.

Agreed, that's actually being tracked by https://github.com/vectordotdev/vector/issues/12399

I think the request here is covered by https://github.com/vectordotdev/vector/issues/12399 so I'll close this one in-lieu of that one but let me know if you disagree! Also feel free to leave additional thoughts on that other issue.

kaarolch commented 7 months ago

@jszwedko

That debug span there appears to be output from the console sink which indicates an issue with your configuration in that it isn't dropping the dd-api-key field.

I did some tests:

I've also tried to tap to output of internal_logs source and find any events without metadata:

vector$ vector tap --outputs-of vector_internal_logs | grep -v metadata
2024-03-12T08:06:35.184941Z DEBUG vector::app: Internal log rate limit configured. internal_log_rate_secs=10
2024-03-12T08:06:35.184981Z  INFO vector::app: Log level is enabled. level="debug"
2024-03-12T08:06:35.185132Z DEBUG vector::app: messaged="Building runtime." worker_threads=2
2024-03-12T08:06:35.197474Z DEBUG reqwest::connect: starting new connection: http://127.0.0.1:8686/
2024-03-12T08:06:35.197529Z DEBUG hyper::client::connect::http: connecting to 127.0.0.1:8686
2024-03-12T08:06:35.197687Z DEBUG hyper::client::connect::http: connected to 127.0.0.1:8686
2024-03-12T08:06:35.197931Z DEBUG hyper::proto::h1::io: flushed 206 bytes
2024-03-12T08:06:35.198281Z DEBUG hyper::proto::h1::io: parsed 3 headers
2024-03-12T08:06:35.198315Z DEBUG hyper::proto::h1::conn: incoming body is content-length (24 bytes)
2024-03-12T08:06:35.198352Z DEBUG hyper::proto::h1::conn: incoming body completed
2024-03-12T08:06:35.198390Z DEBUG hyper::client::pool: pooling idle connection for ("http", 127.0.0.1:8686)
2024-03-12T08:06:35.200070Z DEBUG tungstenite::handshake::client: Client handshake done.
[tap] Pattern 'vector_internal_logs' successfully matched.

How can I catch those debug events target":"vector::internal_events::http_client","span":{"name":"http"} via vrl to remove dd-api-key.

From my observation and test it looks like sink sending them direct to stdout and I can not modify this behave?

kaarolch commented 7 months ago

@jszwedko is there any chance that I can somehow:

VECTOR_LOG=debug
RUST_LOG=vector=internal_events::http_client=info

to disable debug log from http_client?

kaarolch commented 7 months ago

Ok I found workaround, after set: VECTOR_LOG: 'debug,vector::internal_events::http_client=info' http_client span stop appeared.

jszwedko commented 7 months ago

I thought I actually reopened this issue yesterday, but apparently not. Looking back at your original description of the issue, I think I see an issue which is that the headers field is not actually an object, which is why your VRL doesn't appropriate redact it. Instead it is a JSON-encoded string. I think that could be a bug (I would have expected it to be an object as you did). I'm glad you found a workaround though!