vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev

Big changes in configuration crash Vector #15824

Open yoelk opened 1 year ago

yoelk commented 1 year ago

Problem

We're using Vector with many source-transform-sink blocks (hundreds of them). I've noticed that configuration changes work well on-the-fly, but if the change is too big, Vector can die (vectordev-a-1 exited with code 139, i.e. killed by SIGSEGV). This happens when the number of source-transform-sink blocks changes by around 200 (each block reads from Kafka and writes to S3). So if you have 300 such blocks, delete 200 of them, and let Vector reload the configuration file, it will most likely crash.
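For reference, Vector applies configuration changes on the fly when it receives SIGHUP (or when started with --watch-config). A minimal sketch of how such a reload can be triggered, assuming a single running Vector instance and that pidof is available on the host:

```python
import os
import signal
import subprocess

# Look up the PID of the running Vector process (assumes a single instance
# and that pidof is available on the host).
pid = int(subprocess.check_output(["pidof", "vector"]).split()[0])

# Vector re-reads its configuration files on SIGHUP and applies the diff live.
os.kill(pid, signal.SIGHUP)
```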

Configuration

```toml
### MISC ###
[sources.vector_metrics]
type = "internal_metrics"

[sinks.prometheus]
type = "prometheus_exporter"
address = "0.0.0.0:8053"
inputs = ["vector_metrics"]
[api]
enabled = true
address = "0.0.0.0:8686"

### BLOCK NO.1 ###
[sources.kafka_account100_consumer0]
type = "kafka"
auto_offset_reset = "smallest"
bootstrap_servers = "*************************************"
group_id = "test1_consumer0"
topics = [ "100" ]
[sources.kafka_account100_consumer0.librdkafka_options]
"security.protocol" = "ssl"

[transforms.parse_events_account100_consumer0]
type = "remap"
inputs = ["kafka_account100_consumer0"]
source = '''
    . = .message
'''

[sinks.write_s3_account100_consumer0]
type = "aws_s3"
inputs = ["parse_events_account100_consumer0"]
bucket = "my_bucket_name"
key_prefix = "test1/account100/"
region = "us-east-1"
[sinks.write_s3_account100_consumer0.batch]
timeout_secs = 10
[sinks.write_s3_account100_consumer0.encoding]
codec = "text"
[sinks.write_s3_account100_consumer0.buffer]
type = "memory"

### BLOCK NO.2 ###
#..........

#.
#.
#. 300 more blocks
```
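The repeated blocks are all instances of the same template. Here is a sketch of a generator that reproduces the scale described above; the naming scheme, account range, and broker address are hypothetical placeholders, not our exact tooling:

```python
# Sketch of a generator for the repeated source-transform-sink blocks.
# Account range, consumer index, and broker address are illustrative only.
TEMPLATE = """
[sources.kafka_account{acct}_consumer{idx}]
type = "kafka"
auto_offset_reset = "smallest"
bootstrap_servers = "{brokers}"
group_id = "test1_consumer{idx}"
topics = [ "{acct}" ]
[sources.kafka_account{acct}_consumer{idx}.librdkafka_options]
"security.protocol" = "ssl"

[transforms.parse_events_account{acct}_consumer{idx}]
type = "remap"
inputs = ["kafka_account{acct}_consumer{idx}"]
source = '''
    . = .message
'''

[sinks.write_s3_account{acct}_consumer{idx}]
type = "aws_s3"
inputs = ["parse_events_account{acct}_consumer{idx}"]
bucket = "my_bucket_name"
key_prefix = "test1/account{acct}/"
region = "us-east-1"
[sinks.write_s3_account{acct}_consumer{idx}.batch]
timeout_secs = 10
[sinks.write_s3_account{acct}_consumer{idx}.encoding]
codec = "text"
[sinks.write_s3_account{acct}_consumer{idx}.buffer]
type = "memory"
"""

def render_blocks(accounts, brokers):
    # One block per account; consumer index fixed at 0 as in the example above.
    return "".join(
        TEMPLATE.format(acct=acct, idx=0, brokers=brokers) for acct in accounts
    )

if __name__ == "__main__":
    # Write 300 blocks, matching the scale at which the reload crashes.
    with open("blocks.toml", "w") as f:
        f.write(render_blocks(range(100, 400), "broker1:9092"))
```

Regenerating blocks.toml with a much smaller account range and letting Vector reload reproduces the large configuration diff described above.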

Version

0.25.1

Debug Output

No response

Example Data

The processed data consists of JSON strings.

Additional Context

No response

References

No response

neuronull commented 1 year ago

👋 hello and thanks for this bug report.

Would you be able to provide the crash log, or any other logs from the process, to help us get to the bottom of this?
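For anyone reproducing this: one way to capture a useful crash log is to run Vector with RUST_BACKTRACE=full and keep everything it prints. A minimal sketch using Python's subprocess; the config path and the VECTOR_LOG level are assumptions for illustration:

```python
import os
import subprocess

# Run Vector with full Rust backtraces enabled and capture stdout/stderr,
# so the crash output survives the segfault. The config path is an
# assumption; substitute the real one.
env = {**os.environ, "RUST_BACKTRACE": "full", "VECTOR_LOG": "debug"}
with open("vector-crash.log", "wb") as log:
    proc = subprocess.run(
        ["vector", "--config", "/etc/vector/vector.toml"],
        env=env,
        stdout=log,
        stderr=subprocess.STDOUT,
    )

# Python reports a death by signal as a negative code: -11 means SIGSEGV,
# which the shell would report as 128 + 11 = 139.
print("vector exited with code", proc.returncode)
```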