grafana / alloy

OpenTelemetry Collector distribution with programmable pipelines
https://grafana.com/oss/alloy
Apache License 2.0

kafka-exporter bug with tls ca #280

Open max107 opened 5 months ago

max107 commented 5 months ago

What's wrong?

It is impossible to use TLS with only a CA certificate (without a client cert/key pair).

Steps to reproduce

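A docker-compose example that reproduces the problem (the agent service fails to start, while the standalone exporter works with the same CA file):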

version: '3.9'
services:
  agent: # dead
    image: docker.io/grafana/agent:main-6e4a9b9
    environment:
      AGENT_MODE: flow
      CONFIG_FILE_PATH: /config.river
    command:
      - run
      - /config.river
      - --storage.path=/tmp/agent
      - --server.http.listen-addr=0.0.0.0:80
      - --server.http.ui-path-prefix=/
      - --disable-reporting
      - --cluster.enabled=false
    ports:
      - "8080:80"
    volumes:
      - ./example.river:/config.river:ro
      - ./ca.pem:/pki/ca.pem:ro
  exporter: # alive
    image: dparrott/kafka-exporter:latest
    command:
      - --sasl.enabled
      - --sasl.handshake
      - --sasl.username=example
      - --sasl.password=example
      - --sasl.mechanism=plain
      - --tls.enabled
      - --tls.ca-file=/pki/ca.pem
      - --kafka.server=example-broker.com
    ports:
      - "9308:9308"
    volumes:
      - ./ca.pem:/pki/ca.pem:ro

System information

No response

Software version

docker.io/grafana/agent:main-6e4a9b9 (https://github.com/grafana/agent/issues/6044)

Configuration

logging {
    level  = "debug"
    format = "logfmt"
}

prometheus.exporter.kafka "example" {
    kafka_uris = ["example-broker.com"]
    ca_file = "/pki/ca.pem"
    use_tls = true
    instance = "example"
    use_sasl = true
    use_sasl_handshake = true
    sasl_mechanism = "plain"
    sasl_username = "example"
    sasl_password = "example"
}

prometheus.scrape "example" {
    targets    = prometheus.exporter.kafka.example.targets
    forward_to = [prometheus.remote_write.default.receiver]
}

### Logs

```text
↪ docker compose up agent
[+] Running 1/0
 ✔ Container ha-agent-1  Created                                                                                                                                       0.0s
Attaching to agent-1
agent-1  | ts=2024-01-04T15:41:51.71027843Z level=info "boringcrypto enabled"=false
agent-1  | ts=2024-01-04T15:41:51.711410456Z level=info msg="starting complete graph evaluation" controller_id="" trace_id=25843d6229f77cc08cf3d378255624bc
agent-1  | ts=2024-01-04T15:41:51.71150907Z level=error msg="failed to evaluate config" controller_id="" trace_id=25843d6229f77cc08cf3d378255624bc node=prometheus.exporter.kafka.example err="building component: tls is enabled but key pair was not provided"
agent-1  | ts=2024-01-04T15:41:51.711523841Z level=info msg="finished node evaluation" controller_id="" trace_id=25843d6229f77cc08cf3d378255624bc node_id=prometheus.exporter.kafka.example duration=73.034µs
agent-1  | ts=2024-01-04T15:41:51.711538304Z level=info msg="finished node evaluation" controller_id="" trace_id=25843d6229f77cc08cf3d378255624bc node_id=labelstore duration=5.547µs
agent-1  | ts=2024-01-04T15:41:51.71154596Z level=info msg="finished node evaluation" controller_id="" trace_id=25843d6229f77cc08cf3d378255624bc node_id=otel duration=1.802µs
agent-1  | ts=2024-01-04T15:41:51.713949464Z level=info msg="replaying WAL, this may take a while" component=prometheus.remote_write.default subcomponent=wal dir=/tmp/agent/prometheus.remote_write.default/wal
agent-1  | ts=2024-01-04T15:41:51.714095538Z level=info msg="WAL segment loaded" component=prometheus.remote_write.default subcomponent=wal segment=0 maxSegment=10
agent-1  | ts=2024-01-04T15:41:51.714146276Z level=info msg="WAL segment loaded" component=prometheus.remote_write.default subcomponent=wal segment=1 maxSegment=10
agent-1  | ts=2024-01-04T15:41:51.714214294Z level=info msg="WAL segment loaded" component=prometheus.remote_write.default subcomponent=wal segment=2 maxSegment=10
agent-1  | ts=2024-01-04T15:41:51.714328776Z level=info msg="WAL segment loaded" component=prometheus.remote_write.default subcomponent=wal segment=3 maxSegment=10
agent-1  | ts=2024-01-04T15:41:51.714437978Z level=info msg="WAL segment loaded" component=prometheus.remote_write.default subcomponent=wal segment=4 maxSegment=10
agent-1  | ts=2024-01-04T15:41:51.714574961Z level=info msg="WAL segment loaded" component=prometheus.remote_write.default subcomponent=wal segment=5 maxSegment=10
agent-1  | ts=2024-01-04T15:41:51.714746304Z level=info msg="WAL segment loaded" component=prometheus.remote_write.default subcomponent=wal segment=6 maxSegment=10
agent-1  | ts=2024-01-04T15:41:51.714822261Z level=info msg="WAL segment loaded" component=prometheus.remote_write.default subcomponent=wal segment=7 maxSegment=10
agent-1  | ts=2024-01-04T15:41:51.715059696Z level=info msg="WAL segment loaded" component=prometheus.remote_write.default subcomponent=wal segment=8 maxSegment=10
agent-1  | ts=2024-01-04T15:41:51.71523854Z level=info msg="WAL segment loaded" component=prometheus.remote_write.default subcomponent=wal segment=9 maxSegment=10
agent-1  | ts=2024-01-04T15:41:51.715596032Z level=info msg="WAL segment loaded" component=prometheus.remote_write.default subcomponent=wal segment=10 maxSegment=10
agent-1  | ts=2024-01-04T15:41:51.716605305Z level=info msg="Starting WAL watcher" component=prometheus.remote_write.default subcomponent=rw remote_name=cc654b url=http://<edit>:80/api/v1/push queue=cc654b
agent-1  | ts=2024-01-04T15:41:51.716663546Z level=info msg="Starting scraped metadata watcher" component=prometheus.remote_write.default subcomponent=rw remote_name=cc654b url=http://<edit>:80/api/v1/push
agent-1  | ts=2024-01-04T15:41:51.716703315Z level=info msg="finished node evaluation" controller_id="" trace_id=25843d6229f77cc08cf3d378255624bc node_id=prometheus.remote_write.default duration=5.150952ms
agent-1  | ts=2024-01-04T15:41:51.717002821Z level=info msg="Replaying WAL" component=prometheus.remote_write.default subcomponent=rw remote_name=cc654b url=http://<edit>:80/api/v1/push queue=cc654b
agent-1  | ts=2024-01-04T15:41:51.718305054Z level=info msg="finished node evaluation" controller_id="" trace_id=25843d6229f77cc08cf3d378255624bc node_id=prometheus.scrape.example duration=1.586333ms
agent-1  | ts=2024-01-04T15:41:51.718384609Z level=info msg="finished node evaluation" controller_id="" trace_id=25843d6229f77cc08cf3d378255624bc node_id=tracing duration=15.347µs
agent-1  | ts=2024-01-04T15:41:51.718435121Z level=info msg="finished node evaluation" controller_id="" trace_id=25843d6229f77cc08cf3d378255624bc node_id=logging duration=35.74µs
agent-1  | ts=2024-01-04T15:41:51.718454044Z level=info msg="applying non-TLS config to HTTP server" service=http
agent-1  | ts=2024-01-04T15:41:51.718458876Z level=info msg="finished node evaluation" controller_id="" trace_id=25843d6229f77cc08cf3d378255624bc node_id=http duration=15.661µs
agent-1  | ts=2024-01-04T15:41:51.718472947Z level=info msg="finished node evaluation" controller_id="" trace_id=25843d6229f77cc08cf3d378255624bc node_id=cluster duration=6.338µs
agent-1  | ts=2024-01-04T15:41:51.718485843Z level=info msg="finished node evaluation" controller_id="" trace_id=25843d6229f77cc08cf3d378255624bc node_id=ui duration=6.619µs
agent-1  | ts=2024-01-04T15:41:51.718506077Z level=info msg="finished complete graph evaluation" controller_id="" trace_id=25843d6229f77cc08cf3d378255624bc duration=7.145143ms
agent-1  | Error: /config.river:6:1: Failed to build component: building component: tls is enabled but key pair was not provided
agent-1  |
agent-1  |  5 |
agent-1  |  6 |   prometheus.exporter.kafka "example" {
agent-1  |    |  _^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
```

max107 commented 5 months ago

@rfratto I'm using the latest Docker image, main-6e4a9b9, from issue https://github.com/grafana/agent/issues/6044.

rfratto commented 5 months ago

@max107 I'm not sure this is a bug; you need to configure both cert_file and key_file when use_tls is enabled, but I don't see either set on your config.

Our documentation doesn't mention this, so I'm going to relabel this as a docs issue.

max107 commented 5 months ago

@rfratto we don't have a cert_file and key_file; we only have a CA (root certificate). Please see the docker-compose example above: the original kafka exporter works without any problem with a CA only.

max107 commented 5 months ago

> Our documentation doesn't mention this, so I'm going to relabel this as a docs issue.

The documentation is fine; the problem is in the configuration validator. Using only a CA is absolutely legitimate.

max107 commented 5 months ago

The problem is here: https://github.com/grafana/agent/blob/cce5b03b141d8bf43ca4ab473c8f07d6d9136b3d/pkg/integrations/kafka_exporter/kafka_exporter.go#L134-L136. Where is the validation for the CA field?

Eve832 commented 5 months ago

@rfratto can we confirm if this issue really is a docs issue or a code issue?

rfratto commented 5 months ago

@Eve832 The documentation doesn't currently reflect the requirements in the code. There's a discussion to be had around changing those requirements, but I personally think the documentation should still be updated to reflect the state of the code today.

max107 commented 5 months ago

Let's look at the original TLS check here.

> There's a discussion to be had around changing those requirements

In my opinion, the first step is to remove the custom conditions from the grafana-agent exporter source code, because the original exporter already performs the same checks.
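
To make the proposal concrete, here is a minimal sketch of a relaxed check. It is not the agent's actual code and not the patch from the linked PR: the `tlsFiles` type and `validateTLS` function are hypothetical names used only for illustration. The idea is to reject only a half-specified client key pair and to accept TLS with a CA alone.

```go
package main

import (
	"errors"
	"fmt"
)

// tlsFiles is a hypothetical stand-in for the exporter's TLS-related options.
type tlsFiles struct {
	UseTLS   bool
	CAFile   string
	CertFile string
	KeyFile  string
}

var errIncompleteKeyPair = errors.New("cert_file and key_file must be set together")

// validateTLS accepts CA-only TLS and rejects only the case where exactly one
// of cert_file and key_file is set.
func validateTLS(c tlsFiles) error {
	if !c.UseTLS {
		return nil
	}
	if (c.CertFile == "") != (c.KeyFile == "") {
		return errIncompleteKeyPair
	}
	return nil
}

func main() {
	// The configuration from this issue: TLS enabled, CA only.
	caOnly := tlsFiles{UseTLS: true, CAFile: "/pki/ca.pem"}
	fmt.Println("CA only:", validateTLS(caOnly))

	// A half-specified key pair is still an error.
	certOnly := tlsFiles{UseTLS: true, CertFile: "/pki/client.pem"}
	fmt.Println("cert without key:", validateTLS(certOnly))
}
```

With a check like this, the configuration from the issue would pass validation, and a certificate without a key (or the reverse) would still be reported.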

7840vz commented 5 months ago

I had the same issue: I need to provide a CA without any client keys, which is a completely valid configuration.

I had to apply the same patch as suggested in grafana/agent#6049 to work around it.

I think the suggested PR should be merged.

Eve832 commented 5 months ago

assigning @clayton-cornell so this doesn't get lost in the shuffle

senax commented 4 months ago

We still run ‘real’ Kafka exporters because of this bug. Kafka uses TLS without client certs, just for session encryption. Would love to only have grafana-agent running to monitor.
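
For reference, this is roughly what "TLS without client certs" looks like with Go's standard library: the client trusts the broker's CA and presents no certificate of its own. This is only a sketch under assumptions. The helper name `caOnlyTLSConfig` is hypothetical, the CA path is the one from the compose file above, and the `MinVersion` setting is my own choice, not something taken from the exporter.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"os"
)

// caOnlyTLSConfig builds a client-side TLS config that verifies the server
// against the given CA bundle and carries no client certificate.
func caOnlyTLSConfig(caPEM []byte) (*tls.Config, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no valid certificates found in CA bundle")
	}
	return &tls.Config{
		RootCAs:    pool,             // trust anchor for the broker certificate
		MinVersion: tls.VersionTLS12, // assumption: not taken from the exporter
	}, nil
}

func main() {
	caPEM, err := os.ReadFile("/pki/ca.pem")
	if err != nil {
		fmt.Println("reading CA file:", err)
		return
	}
	cfg, err := caOnlyTLSConfig(caPEM)
	if err != nil {
		fmt.Println("building TLS config:", err)
		return
	}
	// Certificates stays empty: the session is encrypted and the server is
	// verified, but the client does not authenticate with a certificate.
	fmt.Println("client certificates configured:", len(cfg.Certificates))
}
```

Authentication in this setup is left to SASL, as in the configuration at the top of the issue.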