vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

Vector panics with "index out of bounds: the len is 0 but the index is 0" when config reloads #13229

Closed: wgb1990 closed this issue 2 years ago.

wgb1990 commented 2 years ago


Problem

When Vector reloads its configuration after the enrichment_tables file has also changed, Vector may panic. The panic log is as follows:

2022-06-19T15:35:30.303506Z INFO vector::topology::running: New configuration loaded successfully.
2022-06-19T15:35:30.304373Z INFO vector: Vector has reloaded. path=[Dir("/etc/vector")]
2022-06-19T15:35:30.325679Z INFO vector::topology::builder: Healthcheck: Passed.
thread 'vector-worker' panicked at 'index out of bounds: the len is 0 but the index is 0', src/enrichment_tables/file.rs:366:12
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'vector-worker' panicked at 'internal error: entered unreachable code: join error or bad poll', src/topology/builder.rs:669:30
2022-06-19T15:35:35.846522Z ERROR transform{component_kind="transform" component_id=remap_message component_type=remap component_name=remap_message}: vector::topology: An error occurred that Vector couldn't handle.
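"index out of bounds: the len is 0 but the index is 0" is Rust's standard panic for indexing an empty slice or `Vec` with `[0]`. One plausible mechanism consistent with this log (an assumption, not confirmed in this thread, and not Vector's actual code at `src/enrichment_tables/file.rs:366`) is that the remap transform holds a numeric index handle registered against the old table, the reload builds a fresh table with no indexes registered, and the lookup then indexes an empty list. The sketch below models that situation; `Table`, `IndexHandle`, `add_index`, `find_unchecked`, and `find_checked` are all hypothetical names.

```rust
use std::collections::HashMap;
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical handle a transform keeps after registering an index on a table.
#[derive(Clone, Copy, Debug)]
struct IndexHandle(usize);

type Row = HashMap<String, String>;

/// Hypothetical, heavily simplified file-backed enrichment table.
struct Table {
    rows: Vec<Row>,
    /// One entry per registered index; a freshly loaded table has none.
    indexes: Vec<HashMap<String, usize>>,
}

impl Table {
    fn load(rows: Vec<Row>) -> Self {
        Table { rows, indexes: Vec::new() }
    }

    /// Builds a value -> row-position index on `field` and returns its handle.
    fn add_index(&mut self, field: &str) -> IndexHandle {
        let index: HashMap<String, usize> = self
            .rows
            .iter()
            .enumerate()
            .filter_map(|(pos, row)| row.get(field).map(|v| (v.clone(), pos)))
            .collect();
        self.indexes.push(index);
        IndexHandle(self.indexes.len() - 1)
    }

    /// Unchecked lookup: panics with "index out of bounds" on a stale handle.
    fn find_unchecked(&self, handle: IndexHandle, value: &str) -> Option<&Row> {
        self.indexes[handle.0].get(value).map(|&pos| &self.rows[pos])
    }

    /// Checked lookup: reports a stale handle as an error instead of panicking.
    fn find_checked(&self, handle: IndexHandle, value: &str) -> Result<Option<&Row>, String> {
        let index = self
            .indexes
            .get(handle.0)
            .ok_or_else(|| format!("index {} is not registered on this table", handle.0))?;
        Ok(index.get(value).map(|&pos| &self.rows[pos]))
    }
}

fn row(service: &str, topic: &str) -> Row {
    HashMap::from([
        ("service_name".to_string(), service.to_string()),
        ("kafka_topic".to_string(), topic.to_string()),
    ])
}

fn main() {
    let mut table = Table::load(vec![row("service1", "topic1")]);
    // The transform registers its index once, when it is built.
    let handle = table.add_index("service_name");
    assert!(table.find_checked(handle, "service1").unwrap().is_some());

    // Reload: the table is rebuilt from the edited CSV, but the unchanged
    // transform keeps its old handle and never re-registers its index.
    table = Table::load(vec![row("service1", "topic1"), row("service2", "topic2")]);
    println!("{:?}", table.find_checked(handle, "service1"));

    // The unchecked path reproduces the panic message seen in the log.
    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        table.find_unchecked(handle, "service1").is_some()
    }));
    assert!(result.is_err());
}
```

Under this model, either re-registering indexes whenever a table is reloaded or using the checked lookup would turn the crash into a recoverable error.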

Configuration

api:
  enabled: true
  address: '0.0.0.0:8686'

sources:
  metrics:
    type: internal_metrics
    scrape_interval_secs: 2
  biz_log:
    type: kafka
    bootstrap_servers: localhost:9200 
    group_id: biz-log-gid
    topics:
      - biz-log

enrichment_tables:
  service_topic_mapping:
    type: file
    file:
      path: /etc/vector/router.csv
      encoding:
        type: csv
    schema:
      service_name: string
      kafka_topic: string

transforms:
  remap_message:
    type: remap
    inputs:
      - biz_log
    source: |
      log = parse_json!(.message)
      . = parse_regex!(log.Contents[0].Value, r'^/logs/.*/volumes/.*/.*/(?P<service>.*)/(?P<env>.*)/(?P<ip>.*)/log-dir/bizLog/(?P<filename>.*)$')
      tags, err = split(log.Contents[1].Value, "|", 6)
      if length(tags) >= 5 {
        .metricKey = tags[4]
      }
      .message = log.Contents[1].Value
      row = get_enrichment_table_record("service_topic_mapping", { "service_name": .service }) ?? .service
      .kafkaTopic = row.kafka_topic
      if is_null(.kafkaTopic) {
        .kafkaTopic = "default_topic"
      }

sinks:
  kafak_sink:
    type: kafka
    inputs:
      - remap_message
    bootstrap_servers: localhost:9200
    key_field: user_id
    topic: "{{ kafkaTopic }}"
    compression: lz4
    encoding:
      codec: json
    healthcheck: 
      enabled: true

  loki_sink:
    type: loki
    inputs:
      - remap_message
    endpoint: 'http://localhost:3100'
    remove_label_fields: true
    tenant_id: "biz_monitor_log"
    out_of_order_action: "accept"
    labels:
      service: "{{ service }}"
      ip: "{{ ip }}"
      env: "{{ env }}"
      filename: "{{ filename }}"
      metric:   "{{ metricKey }}"
    compression: gzip
    encoding:
      codec: json
      timestamp_format: rfc3339
    healthcheck:
      enabled: true 

  prometheus_exporter:
    type: prometheus_exporter
    inputs:
      - metrics
    address: '0.0.0.0:9598'
    default_namespace: service
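The `metricKey` step in `remap_message` only reads `tags[4]` after checking the array length, so it cannot hit an out-of-bounds error on short messages. A minimal Rust model of that step, assuming VRL's `split(value, "|", 6)` behaves like Rust's `str::splitn(6, '|')` (at most six pieces, remainder kept in the last one); `metric_key` is a hypothetical name:

```rust
/// Hypothetical Rust model of the `metricKey` extraction in `remap_message`.
/// Assumption: VRL's `split(value, "|", 6)` behaves like `str::splitn(6, '|')`.
fn metric_key(message: &str) -> Option<&str> {
    let tags: Vec<&str> = message.splitn(6, '|').collect();
    // Mirrors `if length(tags) >= 5 { .metricKey = tags[4] }`: the length
    // check guards the index, so a short message never goes out of bounds.
    if tags.len() >= 5 {
        Some(tags[4])
    } else {
        None
    }
}

fn main() {
    println!("{:?}", metric_key("a|b|c|d|e|f|g")); // Some("e")
    println!("{:?}", metric_key("a|b|c")); // None
}
```

`tags.get(4).copied()` would be an equivalent one-liner for the guarded read.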

Version

0.22.1-alpine

Debug Output

No response

Example Data

No response

Additional Context

No response

References

No response

mr-karan commented 2 years ago

+1 I was able to replicate this as well.

2022-06-20T15:28:10.001782Z  INFO vector::topology::running: Running healthchecks.
2022-06-20T15:28:10.001807Z  INFO vector::topology::running: All healthchecks passed.
2022-06-20T15:28:10.001832Z  INFO vector: Vector has reloaded. path=[File("/opt/nomad/data/alloc/9146bbb7-3094-61b4-4c26-b8524442d39d/vector/local/vector.toml", None)]
thread 'vector-worker' panicked at 'index out of bounds: the len is 0 but the index is 0', src/enrichment_tables/file.rs:366:12
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'vector-worker' panicked at 'internal error: entered unreachable code: join error or bad poll', src/topology/builder.rs:617:30
2022-06-20T15:28:10.143089Z ERROR transform{component_kind="transform" component_id=enrich_nomad_logs component_type=remap component_name=enrich_nomad_logs}: vector::topology: An error occurred that vector couldn't handle.
2022-06-20T15:28:10.143152Z  INFO vector: Vector has stopped.

I am sending a SIGHUP whenever the enrichment file (CSV source) changes.

StephenWakely commented 2 years ago

@wgb1990 @mr-karan

I'm having difficulty replicating this issue. Do you have more details about how the CSV file changes? What was the structure before and after the reload?

wgb1990 commented 2 years ago

@StephenWakely The contents of the CSV file are as follows:

service_name,kafka_topic
service1,topic1
service2,topic2
service3,topic3

Adding new records or modifying existing records and then sending a SIGHUP may trigger the panic.
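For reference, the lookup that `remap_message` performs against this CSV amounts to a map from `service_name` to `kafka_topic`, falling back to `default_topic` when no row matches. The hypothetical Rust sketch below models that intended behaviour only; it is not how Vector stores or indexes the table, and its parser assumes a simple two-column file with no quoting or embedded commas.

```rust
use std::collections::HashMap;

/// Parses a two-column CSV (header `service_name,kafka_topic`) into a lookup map.
/// Minimal parser: assumes no quoting and no embedded commas.
fn parse_mapping(csv: &str) -> HashMap<String, String> {
    csv.lines()
        .skip(1) // skip the header row
        .filter(|line| !line.trim().is_empty())
        .filter_map(|line| {
            let (service, topic) = line.split_once(',')?;
            Some((service.trim().to_string(), topic.trim().to_string()))
        })
        .collect()
}

/// Mirrors the intent of the remap: use the matching row's topic, else "default_topic".
fn topic_for<'a>(mapping: &'a HashMap<String, String>, service: &str) -> &'a str {
    mapping.get(service).map(String::as_str).unwrap_or("default_topic")
}

fn main() {
    let csv = "service_name,kafka_topic\nservice1,topic1\nservice2,topic2\nservice3,topic3\n";
    let mapping = parse_mapping(csv);
    println!("{}", topic_for(&mapping, "service2")); // topic2
    println!("{}", topic_for(&mapping, "service9")); // default_topic
}
```

Adding or editing a row only changes the contents of this map, which is consistent with the reports in this thread that ordinary edits to the file, not malformed data, precede the panic.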

mr-karan commented 2 years ago

Contents of my CSV

Before sending a reload:

alloc_id,namespace,job,group,task,node
350ce486-27f2-51d2-2f6c-b6ce6e6dbd89,default,vector,vector,vector,pop-os
350ce486-27f2-51d2-2f6c-b6ce6e6dbd89,default,vector,vector,vector_reloader,pop-os
350ce486-27f2-51d2-2f6c-b6ce6e6dbd89,default,vector,vector,events_csv_generator,pop-os
5014a7ca-87b1-f40a-cff1-c479f0c77f2f,default,nginx,nginx,proxy,pop-os

After sending a reload (the only difference is that the 2nd and 3rd data rows are swapped):

alloc_id,namespace,job,group,task,node
350ce486-27f2-51d2-2f6c-b6ce6e6dbd89,default,vector,vector,vector,pop-os
350ce486-27f2-51d2-2f6c-b6ce6e6dbd89,default,vector,vector,events_csv_generator,pop-os
350ce486-27f2-51d2-2f6c-b6ce6e6dbd89,default,vector,vector,vector_reloader,pop-os
5014a7ca-87b1-f40a-cff1-c479f0c77f2f,default,nginx,nginx,proxy,pop-os
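One way to check the claim that the two files differ only in row order is to compare their headers and their sorted data rows. A minimal Rust sketch using the two files above; `header_and_sorted_rows` and `same_rows_ignoring_order` are hypothetical helpers that assume one record per line (no quoted newlines):

```rust
/// Splits CSV text into its header line and its data rows, sorted so that
/// row order no longer matters. Assumes one record per line.
fn header_and_sorted_rows(text: &str) -> (Option<&str>, Vec<&str>) {
    let mut lines = text.lines().filter(|line| !line.trim().is_empty());
    let header = lines.next();
    let mut rows: Vec<&str> = lines.collect();
    rows.sort_unstable();
    (header, rows)
}

/// True when both files have the same header and the same rows, in any order.
fn same_rows_ignoring_order(before: &str, after: &str) -> bool {
    header_and_sorted_rows(before) == header_and_sorted_rows(after)
}

const BEFORE: &str = "\
alloc_id,namespace,job,group,task,node
350ce486-27f2-51d2-2f6c-b6ce6e6dbd89,default,vector,vector,vector,pop-os
350ce486-27f2-51d2-2f6c-b6ce6e6dbd89,default,vector,vector,vector_reloader,pop-os
350ce486-27f2-51d2-2f6c-b6ce6e6dbd89,default,vector,vector,events_csv_generator,pop-os
5014a7ca-87b1-f40a-cff1-c479f0c77f2f,default,nginx,nginx,proxy,pop-os
";

const AFTER: &str = "\
alloc_id,namespace,job,group,task,node
350ce486-27f2-51d2-2f6c-b6ce6e6dbd89,default,vector,vector,vector,pop-os
350ce486-27f2-51d2-2f6c-b6ce6e6dbd89,default,vector,vector,events_csv_generator,pop-os
350ce486-27f2-51d2-2f6c-b6ce6e6dbd89,default,vector,vector,vector_reloader,pop-os
5014a7ca-87b1-f40a-cff1-c479f0c77f2f,default,nginx,nginx,proxy,pop-os
";

fn main() {
    println!("identical text: {}", BEFORE == AFTER); // false
    println!("same rows ignoring order: {}", same_rows_ignoring_order(BEFORE, AFTER)); // true
}
```

Because the data set is unchanged, this comparison suggests the panic is triggered by the reload itself rather than by anything in the new file's contents.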