vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0
17.47k stars 1.53k forks

`secret` stuck `vector` when reaching X number of configurations #21044

Open danelkotev opened 1 month ago

danelkotev commented 1 month ago

Problem

We use vector.dev in production, and for the Azure Blob Storage sink we use a secret of type exec that runs a script. We create one configuration file per customer, but the configurations are nearly identical: a Kafka source, a filter on a Kafka header, and an Azure sink. When we reach 41 config files, Vector gets stuck and CPU usage climbs above 100%.

I tried several alternatives:

  1. writing the command inline in the config file instead of calling the script
  2. providing a simple echo command
  3. upgrading to the latest version

None of them changed the behavior, so I believe there is a bug that will also apply to the AWS secret option, since as far as I can see the issue is stdin blocking.

I debugged this and tried to solve it. The issue seems to be in query_backend: after some secrets have been loaded, the stdout read never returns None. The only thing that fixed it was something like the following (note the break; I don't think it's a good solution, but it resolved the issue):

        match stdout {
            None => break,
            Some(Ok(b)) => {
                output.extend(b);
                break;
            }
            Some(Err(e)) => {
                return Err(format!("Error while reading from an exec backend stdout: {}.", e).into())
            }
        }
    }
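
For context, the hang is consistent with a missed end-of-stream: a reader only sees EOF once every write end of the child's stdout pipe is closed. A minimal standalone sketch (plain Python, not Vector's actual tokio code; the JSON payload mimics an exec backend response) showing why reading exactly one line, as the break above does, avoids waiting for EOF at all:

```python
import subprocess
import sys

# Spawn a child that prints one JSON line, standing in for an exec secret
# backend (the real backend runs the configured command, e.g. a script).
proc = subprocess.Popen(
    [sys.executable, "-c",
     "import json; print(json.dumps({'azure': {'value': 'conn-string', 'error': None}}))"],
    stdout=subprocess.PIPE,
    text=True,
)

# readline() returns "" only once every writer to the pipe has closed it
# (EOF). If some copy of the write end is kept open elsewhere, a read loop
# that waits for EOF blocks forever -- the hang described above. Reading
# exactly one line and stopping sidesteps the EOF wait entirely.
line = proc.stdout.readline()
proc.wait()
print(line.strip())
```

This mirrors the effect of the `break` in the patch: the exec secret protocol exchanges a single JSON document per query, so the reader does not need to drain the stream to EOF before parsing.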

Steps to reproduce:

  1. have this simple JSON config 60 times (with a different id for each component), run Kafka in Docker, and create one topic:
    
    {
      "sources": {
        "dbc9eb9372fdaeb4c51891cd_source": {
          "type": "kafka",
          "bootstrap_servers": "kafka:9092",
          "auto_offset_reset": "earliest",
          "group_id": "dbc9eb9372fdaeb4c51891cd",
          "topics": [
            "x"
          ],
          "librdkafka_options": {
            "compression.codec": "gzip"
          }
        }
      },
      "sinks": {
        "dbc9eb9372fdaeb4c51891cd_sink": {
          "inputs": [
            "dbc9eb9372fdaeb4c51891cd_source"
          ],
          "type": "azure_blob",
          "container_name": "CONTAINER_NAME",
          "blob_prefix": "FOLDER_PATH",
          "encoding": {
            "codec": "text"
          },
          "batch": {
            "timeout_secs": 10
          },
          "acknowledgements": {
            "enabled": true
          },
          "connection_string": "SECRET[dbc9eb9372fdaeb4c51891cd_secret.azure]"
        }
      },
      "secret": {
        "dbc9eb9372fdaeb4c51891cd_secret": {
          "type": "exec",
          "command": [
            "scripts/DecryptSecret.sh"
          ]
        }
      }
    }
  2. run Vector locally, pointing it at that folder of configs
  3. observe that Vector gets stuck after roughly 40 configs
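
Step 1 above can be scripted. A minimal sketch (the `configs/` output directory and the `customerNNN` id scheme are assumptions, not from the original setup) that writes 60 copies of the config with unique component ids:

```python
import json
from pathlib import Path

# Write N near-identical configs (Kafka source -> azure_blob sink with an
# exec secret backend), one file per fake customer id, into ./configs/.
N = 60
out_dir = Path("configs")
out_dir.mkdir(exist_ok=True)

for i in range(N):
    cid = f"customer{i:03d}"  # hypothetical id scheme for the repro
    config = {
        "sources": {
            f"{cid}_source": {
                "type": "kafka",
                "bootstrap_servers": "kafka:9092",
                "auto_offset_reset": "earliest",
                "group_id": cid,
                "topics": ["x"],
                "librdkafka_options": {"compression.codec": "gzip"},
            }
        },
        "sinks": {
            f"{cid}_sink": {
                "inputs": [f"{cid}_source"],
                "type": "azure_blob",
                "container_name": "CONTAINER_NAME",
                "blob_prefix": "FOLDER_PATH",
                "encoding": {"codec": "text"},
                "batch": {"timeout_secs": 10},
                "acknowledgements": {"enabled": True},
                "connection_string": f"SECRET[{cid}_secret.azure]",
            }
        },
        "secret": {
            f"{cid}_secret": {
                "type": "exec",
                "command": ["scripts/DecryptSecret.sh"],
            }
        },
    }
    (out_dir / f"{cid}.json").write_text(json.dumps(config, indent=2))

print(f"wrote {N} configs to {out_dir}/")
```

Each file is a complete config on its own, so Vector can be pointed at the directory and will merge them, which is what triggers the hang once enough exec secret backends are present.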

Version

0.28.1, 0.40.0

jszwedko commented 1 month ago

I was able to reproduce this. Curiously, it worked with up to 42 secret backends, but hung when I added a 43rd. Reproduction steps:

Use this jsonnet config:

local arr = std.range(1, std.extVar("n"));

{
  "secret": {
    [std.format("secret_%03d", i)]: {
      "type": "exec",
        "command": [
          "./secret.sh"
        ]
    }
    for i in arr
  },
  "sources": {
    [std.format("source_%03d", i)]: {
      "type": "demo_logs",
      "format": std.format("SECRET[secret_%03d.format]", i),
      "interval": 1.0
    }
    for i in arr
  },
  "sinks": {
    [std.format("sink_%03d", i)]: {
      "type": "blackhole",
      "inputs": [
        std.format("source_%03d", i)
      ],
      "print_interval_secs": 1,
    }
    for i in arr
  }
}

In `secret.sh` put (and mark it executable with `chmod +x secret.sh`):

#!/bin/sh
echo '{ "format": {"value": "json", "error": null} }'

Generate a config with 43 secret backends: `jsonnet vector.jsonnet --ext-code n=43 | tee vector.json`

Run `vector --config vector.json`.

Observe that Vector hangs on start-up. With 42 backends it loads correctly.