grepplabs / kafka-proxy

Proxy connections to Kafka cluster. Connect through SOCKS Proxy, HTTP Proxy or to cluster running in Kubernetes.
Apache License 2.0

Can't get proxy-listener tls to work #170

Open genebean opened 5 months ago

genebean commented 5 months ago

I'm trying to use the same cert & key for the listener side and the sending side of kafka-proxy, and also in telegraf. These are the same certs you made work in #168. Should the setup below work, or am I missing something?

telegraf with tls enabled --> proxy with proxy-listener-tls-enable & tls-enable --> kafka

I feel like I am missing something.

everesio commented 5 months ago

Please provide:

genebean commented 5 months ago

Certs

The cert, key, and CA file were generated for me by someone else from an internal CA. If there is anything I can extract from them that would be helpful, let me know. Here is part of the cert for reference, as shown by openssl x509 -text -noout -in ~/.ssh/user-cert.pem (some values are redacted to generic info, but no fields have been added or removed):

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 15336 (0x3be8)
        Signature Algorithm: sha512WithRSAEncryption
        Issuer: C=US, ST=SomeState, O=Company Name, CN=Company Name Intermediate CA
        Validity
            Not Before: Aug 16 13:56:12 2023 GMT
            Not After : Aug 11 13:56:12 2043 GMT
        Subject: C=US, ST=SomeState, O=Company Name, CN=Gene Liverman, CN=USER:gene.liverman, emailAddress=gene.liverman@example.com
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (2048 bit)

And here is the same info from the ca cert:

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 4096 (0x1000)
        Signature Algorithm: sha512WithRSAEncryption
        Issuer: C=US, ST=SomeState, L=Foo, CN=Company Name Root CA, O=Company Name, emailAddress=sys.ca@example.com
        Validity
            Not Before: Aug 12 02:48:25 2015 GMT
            Not After : Aug  7 02:48:25 2035 GMT
        Subject: C=US, ST=SomeState, O=Company Name, CN=Company Name Intermediate CA
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (2048 bit)

Not working parts

Errors on kafka-proxy output:

INFO[2024-06-05T11:32:55-04:00] Dial address changed from 127.0.0.1:19003 to was-dc2-kafka-1.example.net:9091
INFO[2024-06-05T11:32:55-04:00] couldn't connect to was-dc2-kafka-1.example.net:9091(127.0.0.1:19003): dial tcp x.x.x.x:9091: connect: connection refused
INFO[2024-06-05T11:32:55-04:00] New connection for 127.0.0.1:19013
INFO[2024-06-05T11:32:55-04:00] Dial address changed from 127.0.0.1:19013 to chi-kafka-2.example.net:9091
INFO[2024-06-05T11:32:55-04:00] Reading data from local connection on 127.0.0.1:30013 from 127.0.0.1:61810 (127.0.0.1:19013) had error: remote error: tls: bad certificate
INFO[2024-06-05T11:32:55-04:00] New connection for 127.0.0.1:19004
INFO[2024-06-05T11:32:55-04:00] Dial address changed from 127.0.0.1:19004 to nyc-kafka-1.example.net:9091
INFO[2024-06-05T11:32:55-04:00] Reading data from local connection on 127.0.0.1:30004 from 127.0.0.1:61812 (127.0.0.1:19004) had error: remote error: tls: bad certificate
INFO[2024-06-05T11:32:55-04:00] New connection for 127.0.0.1:19009
INFO[2024-06-05T11:32:55-04:00] Dial address changed from 127.0.0.1:19009 to dfw-kafka-1.example.net:9091
INFO[2024-06-05T11:32:55-04:00] couldn't connect to dfw-kafka-1.example.net:9091(127.0.0.1:19009): dial tcp x.x.x.x:9091: connect: connection refused
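
The proxy output above seems to mix two different kinds of failure, so a rough way to separate them may help. The sketch below groups a few of the log lines by the mapped address shown in parentheses. The function names and the two labels are made up for illustration and are not part of kafka-proxy. It assumes that "connection refused" means the proxy could not reach the broker at the TCP level, and that "remote error: tls: bad certificate" on the local connection is an alert sent by the client (telegraf) after it rejected the certificate presented by the proxy listener.

```python
import re
from collections import defaultdict

# Sample lines copied from the kafka-proxy output above.
LOG = """\
INFO[2024-06-05T11:32:55-04:00] couldn't connect to was-dc2-kafka-1.example.net:9091(127.0.0.1:19003): dial tcp x.x.x.x:9091: connect: connection refused
INFO[2024-06-05T11:32:55-04:00] Reading data from local connection on 127.0.0.1:30013 from 127.0.0.1:61810 (127.0.0.1:19013) had error: remote error: tls: bad certificate
INFO[2024-06-05T11:32:55-04:00] Reading data from local connection on 127.0.0.1:30004 from 127.0.0.1:61812 (127.0.0.1:19004) had error: remote error: tls: bad certificate
INFO[2024-06-05T11:32:55-04:00] couldn't connect to dfw-kafka-1.example.net:9091(127.0.0.1:19009): dial tcp x.x.x.x:9091: connect: connection refused
"""

# Both message types carry the mapped bootstrap address in parentheses.
MAPPED = re.compile(r"\((127\.0\.0\.1:\d+)\)")


def classify(line):
    """Return a hypothetical label for a log line, or None if it is not a failure."""
    if "connection refused" in line:
        return "upstream refused"      # proxy -> Kafka leg failed before TLS
    if "remote error: tls: bad certificate" in line:
        return "client rejected cert"  # client -> proxy leg failed during TLS
    return None


def group_failures(log):
    """Map each failure label to the list of mapped addresses it was seen on."""
    groups = defaultdict(list)
    for line in log.splitlines():
        kind = classify(line)
        match = MAPPED.search(line)
        if kind and match:
            groups[kind].append(match.group(1))
    return dict(groups)


if __name__ == "__main__":
    for kind, addresses in group_failures(LOG).items():
        print(kind, addresses)
```

Read this way, the bad certificate lines point at the telegraf-to-proxy leg, while the refused connections are a separate problem on the proxy-to-Kafka leg.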

Errors on telegraf output:

2024-06-05T15:32:56Z D! [sarama] Connected to broker at 127.0.0.1:30008 (unregistered)
2024-06-05T15:32:56Z D! [sarama] Error while sending ApiVersionsRequest to broker 127.0.0.1:30008: tls: failed to verify certificate: x509: cannot validate certificate for 127.0.0.1 because it doesn't contain any IP SANs
2024-06-05T15:32:56Z D! [sarama] client/metadata got error from broker -1 while fetching metadata: tls: failed to verify certificate: x509: cannot validate certificate for 127.0.0.1 because it doesn't contain any IP SANs
2024-06-05T15:32:56Z D! [sarama] Closed connection to broker 127.0.0.1:30008
2024-06-05T15:32:56Z D! [sarama] client/metadata fetching metadata for all topics from broker 127.0.0.1:30003
2024-06-05T15:32:56Z D! [sarama] Connected to broker at 127.0.0.1:30003 (unregistered)
2024-06-05T15:32:56Z D! [sarama] Error while sending ApiVersionsRequest to broker 127.0.0.1:30003: read tcp 127.0.0.1:61835->127.0.0.1:30003: read: connection reset by peer
2024-06-05T15:32:56Z D! [sarama] client/metadata got error from broker -1 while fetching metadata: read tcp 127.0.0.1:61835->127.0.0.1:30003: read: connection reset by peer
2024-06-05T15:32:56Z D! [sarama] Closed connection to broker 127.0.0.1:30003
2024-06-05T15:32:56Z D! [sarama] client/metadata fetching metadata for all topics from broker 127.0.0.1:30013
2024-06-05T15:32:56Z D! [sarama] Connected to broker at 127.0.0.1:30013 (unregistered)
2024-06-05T15:32:56Z D! [sarama] Error while sending ApiVersionsRequest to broker 127.0.0.1:30013: tls: failed to verify certificate: x509: cannot validate certificate for 127.0.0.1 because it doesn't contain any IP SANs
2024-06-05T15:32:56Z D! [sarama] client/metadata got error from broker -1 while fetching metadata: tls: failed to verify certificate: x509: cannot validate certificate for 127.0.0.1 because it doesn't contain any IP SANs
2024-06-05T15:32:56Z D! [sarama] Closed connection to broker 127.0.0.1:30013
2024-06-05T15:32:56Z D! [sarama] client/metadata no available broker to send metadata request to
2024-06-05T15:32:56Z D! [sarama] client/brokers resurrecting 13 dead seed brokers
2024-06-05T15:32:56Z D! [sarama] Closing Client
2024-06-05T15:32:56Z E! [agent] Failed to connect to [outputs.kafka], retrying in 15s, error was "kafka: client has run out of available brokers to talk to: 13 errors occurred:\n\t* tls: failed to verify certificate: x509: cannot validate certificate for 127.0.0.1 because it doesn't contain any IP SANs\n\t* read tcp 127.0.0.1:61814->127.0.0.1:30009: read: connection reset by peer\n\t* tls: failed to verify certificate: x509: cannot validate certificate for 127.0.0.1 because it doesn't contain any IP SANs\n\t* tls: failed to verify certificate: x509: cannot validate certificate for 127.0.0.1 because it doesn't contain any IP SANs\n\t* read tcp 127.0.0.1:61820->127.0.0.1:30006: read: connection reset by peer\n\t* tls: failed to verify certificate: x509: cannot validate certificate for 127.0.0.1 because it doesn't contain any IP SANs\n\t* tls: failed to verify certificate: x509: cannot validate certificate for 127.0.0.1 because it doesn't contain any IP SANs\n\t* tls: failed to verify certificate: x509: cannot validate certificate for 127.0.0.1 because it doesn't contain any IP SANs\n\t* tls: failed to verify certificate: x509: cannot validate certificate for 127.0.0.1 because it doesn't contain any IP SANs\n\t* tls: failed to verify certificate: x509: cannot validate certificate for 127.0.0.1 because it doesn't contain any IP SANs\n\t* tls: failed to verify certificate: x509: cannot validate certificate for 127.0.0.1 because it doesn't contain any IP SANs\n\t* read tcp 127.0.0.1:61835->127.0.0.1:30003: read: connection reset by peer\n\t* tls: failed to verify certificate: x509: cannot validate certificate for 127.0.0.1 because it doesn't contain any IP SANs\n"
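
For context on the "doesn't contain any IP SANs" message: sarama is a Go client, and Go's TLS verification compares the address the client dialed (here 127.0.0.1) against the subject alternative names in the certificate the server presents. The sketch below is a simplified model of that check, not the real implementation. The function name and arguments are hypothetical, and it ignores wildcards and chain validation. It assumes the certificate served by the proxy listener has no IP SANs, which is what the error text says.

```python
import ipaddress


def check_server_name(dialed_host, dns_sans=(), ip_sans=()):
    """Return None if the dialed host matches a SAN, otherwise an error string.

    Simplified model of hostname verification; illustrative only.
    """
    try:
        ip = ipaddress.ip_address(dialed_host)
    except ValueError:
        ip = None  # not an IP literal, treat as a DNS name

    if ip is not None:
        # An IP literal is only compared against IP SANs,
        # never against DNS SANs or the subject CN.
        if not ip_sans:
            return (f"cannot validate certificate for {dialed_host} "
                    "because it doesn't contain any IP SANs")
        if any(ipaddress.ip_address(san) == ip for san in ip_sans):
            return None
        return f"certificate is not valid for {dialed_host}"

    if dialed_host.lower() in (name.lower() for name in dns_sans):
        return None
    return f"certificate is not valid for {dialed_host}"


if __name__ == "__main__":
    # A certificate without IP SANs, dialed by IP: reproduces the error text.
    print(check_server_name("127.0.0.1"))
    # The same dial against a certificate that lists the IP passes.
    print(check_server_name("127.0.0.1", ip_sans=["127.0.0.1"]))
```

Under this model, a certificate that identifies a user in its subject and carries no SAN for 127.0.0.1 fails when a client connects to the proxy listener by IP, regardless of whether the CA is trusted.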

Proxy config when using tls for proxy client:

./kafka-proxy server \
  --bootstrap-server-mapping "127.0.0.1:19001,0.0.0.0:30001" \
  --bootstrap-server-mapping "127.0.0.1:19002,0.0.0.0:30002" \
  --bootstrap-server-mapping "127.0.0.1:19003,0.0.0.0:30003" \
  --bootstrap-server-mapping "127.0.0.1:19004,0.0.0.0:30004" \
  --bootstrap-server-mapping "127.0.0.1:19005,0.0.0.0:30005" \
  --bootstrap-server-mapping "127.0.0.1:19006,0.0.0.0:30006" \
  --bootstrap-server-mapping "127.0.0.1:19007,0.0.0.0:30007" \
  --bootstrap-server-mapping "127.0.0.1:19008,0.0.0.0:30008" \
  --bootstrap-server-mapping "127.0.0.1:19009,0.0.0.0:30009" \
  --bootstrap-server-mapping "127.0.0.1:19010,0.0.0.0:30010" \
  --bootstrap-server-mapping "127.0.0.1:19011,0.0.0.0:30011" \
  --bootstrap-server-mapping "127.0.0.1:19012,0.0.0.0:30012" \
  --bootstrap-server-mapping "127.0.0.1:19013,0.0.0.0:30013" \
  --dial-address-mapping "127.0.0.1:19001,was-dc2-kafka-1.example.net:9091" \
  --dial-address-mapping "127.0.0.1:19002,was-dc2-kafka-2.example.net:9091" \
  --dial-address-mapping "127.0.0.1:19003,was-dc2-kafka-1.example.net:9091" \
  --dial-address-mapping "127.0.0.1:19004,nyc-kafka-1.example.net:9091" \
  --dial-address-mapping "127.0.0.1:19005,nyc-kafka-2.example.net:9091" \
  --dial-address-mapping "127.0.0.1:19006,nyc-kafka-1.example.net:9091" \
  --dial-address-mapping "127.0.0.1:19007,dfw-kafka-1.example.net:9091" \
  --dial-address-mapping "127.0.0.1:19008,dfw-kafka-2.example.net:9091" \
  --dial-address-mapping "127.0.0.1:19009,dfw-kafka-1.example.net:9091" \
  --dial-address-mapping "127.0.0.1:19010,atl-at2-kafka-1.example.net:9091" \
  --dial-address-mapping "127.0.0.1:19011,atl-at2-kafka-2.example.net:9091" \
  --dial-address-mapping "127.0.0.1:19012,chi-kafka-1.example.net:9091" \
  --dial-address-mapping "127.0.0.1:19013,chi-kafka-2.example.net:9091" \
  --proxy-listener-tls-enable \
  --proxy-listener-ca-chain-cert-file /Users/gene.liverman/.ssh/ca-certs.pem \
  --proxy-listener-cert-file /Users/gene.liverman/.ssh/user-cert.pem \
  --proxy-listener-key-file /Users/gene.liverman/.ssh/user-key.pem \
  --proxy-listener-key-password $FLOW_CERT_PW \
  --tls-enable \
  --tls-ca-chain-cert-file /Users/gene.liverman/.ssh/ca-certs.pem \
  --tls-client-cert-file /Users/gene.liverman/.ssh/user-cert.pem \
  --tls-client-key-file /Users/gene.liverman/.ssh/user-key.pem \
  --tls-client-key-password $FLOW_CERT_PW \
  --debug-enable
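
Since the mapping flags above follow a fixed pattern (local port 19001 upward, listener port 30001 upward), they could be generated rather than typed by hand. This is a minimal sketch under that assumption. The helper names are hypothetical, and the host list is copied from the command above in the same order. It also reports hosts that appear more than once as a dial target, which happens three times in the command; whether that is intentional or a side effect of redaction is not clear from the post.

```python
from collections import Counter

# Dial targets in the order they appear in the command above.
HOSTS = [
    "was-dc2-kafka-1.example.net", "was-dc2-kafka-2.example.net",
    "was-dc2-kafka-1.example.net", "nyc-kafka-1.example.net",
    "nyc-kafka-2.example.net", "nyc-kafka-1.example.net",
    "dfw-kafka-1.example.net", "dfw-kafka-2.example.net",
    "dfw-kafka-1.example.net", "atl-at2-kafka-1.example.net",
    "atl-at2-kafka-2.example.net", "chi-kafka-1.example.net",
    "chi-kafka-2.example.net",
]


def mapping_flags(hosts, upstream_port=9091, dial_base=19001, listen_base=30001):
    """Build the --bootstrap-server-mapping and --dial-address-mapping flags."""
    flags = []
    for i in range(len(hosts)):
        flags.append(
            f'--bootstrap-server-mapping "127.0.0.1:{dial_base + i},0.0.0.0:{listen_base + i}"'
        )
    for i, host in enumerate(hosts):
        flags.append(
            f'--dial-address-mapping "127.0.0.1:{dial_base + i},{host}:{upstream_port}"'
        )
    return flags


def repeated_targets(hosts):
    """Return hosts that are used as a dial target more than once."""
    return sorted(host for host, count in Counter(hosts).items() if count > 1)


if __name__ == "__main__":
    print(" \\\n  ".join(mapping_flags(HOSTS)))
    print("repeated dial targets:", repeated_targets(HOSTS))
```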

telegraf config:

[global_tags]
[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = "0s"
  debug = true
  logtarget = "stderr"
  hostname = ""
  omit_hostname = false
[[outputs.kafka]]
  brokers = [
    "127.0.0.1:30001",
    "127.0.0.1:30002",
    "127.0.0.1:30003",
    "127.0.0.1:30004",
    "127.0.0.1:30005",
    "127.0.0.1:30006",
    "127.0.0.1:30007",
    "127.0.0.1:30008",
    "127.0.0.1:30009",
    "127.0.0.1:30010",
    "127.0.0.1:30011",
    "127.0.0.1:30012",
    "127.0.0.1:30013"
  ]
  topic = "private.test.gene.metrics"
  version = "3.7.0"
  routing_tag = "host"
  compression_codec = 2
  insecure_skip_verify = false
  tls_ca = "/Users/gene.liverman/.ssh/ca-certs.pem"
  tls_cert = "/Users/gene.liverman/.ssh/user-cert.pem"
  tls_key = "/Users/gene.liverman/.ssh/user-key.pem"
  tls_key_pwd = "uncross-dimity-smutch-dual"
[[inputs.prometheus]]
  urls = ["http://127.0.0.1:9100/metrics"]

Working parts

The setup that works without any errors is the one where the client does not use TLS to connect to the proxy. That proxy config is below:

./kafka-proxy server \
  --bootstrap-server-mapping "127.0.0.1:19001,0.0.0.0:30001" \
  --bootstrap-server-mapping "127.0.0.1:19002,0.0.0.0:30002" \
  --bootstrap-server-mapping "127.0.0.1:19003,0.0.0.0:30003" \
  --bootstrap-server-mapping "127.0.0.1:19004,0.0.0.0:30004" \
  --bootstrap-server-mapping "127.0.0.1:19005,0.0.0.0:30005" \
  --bootstrap-server-mapping "127.0.0.1:19006,0.0.0.0:30006" \
  --bootstrap-server-mapping "127.0.0.1:19007,0.0.0.0:30007" \
  --bootstrap-server-mapping "127.0.0.1:19008,0.0.0.0:30008" \
  --bootstrap-server-mapping "127.0.0.1:19009,0.0.0.0:30009" \
  --bootstrap-server-mapping "127.0.0.1:19010,0.0.0.0:30010" \
  --bootstrap-server-mapping "127.0.0.1:19011,0.0.0.0:30011" \
  --bootstrap-server-mapping "127.0.0.1:19012,0.0.0.0:30012" \
  --bootstrap-server-mapping "127.0.0.1:19013,0.0.0.0:30013" \
  --dial-address-mapping "127.0.0.1:19001,was-dc2-kafka-1.example.net:9091" \
  --dial-address-mapping "127.0.0.1:19002,was-dc2-kafka-2.example.net:9091" \
  --dial-address-mapping "127.0.0.1:19003,was-dc2-kafka-1.example.net:9091" \
  --dial-address-mapping "127.0.0.1:19004,nyc-kafka-1.example.net:9091" \
  --dial-address-mapping "127.0.0.1:19005,nyc-kafka-2.example.net:9091" \
  --dial-address-mapping "127.0.0.1:19006,nyc-kafka-1.example.net:9091" \
  --dial-address-mapping "127.0.0.1:19007,dfw-kafka-1.example.net:9091" \
  --dial-address-mapping "127.0.0.1:19008,dfw-kafka-2.example.net:9091" \
  --dial-address-mapping "127.0.0.1:19009,dfw-kafka-1.example.net:9091" \
  --dial-address-mapping "127.0.0.1:19010,atl-at2-kafka-1.example.net:9091" \
  --dial-address-mapping "127.0.0.1:19011,atl-at2-kafka-2.example.net:9091" \
  --dial-address-mapping "127.0.0.1:19012,chi-kafka-1.example.net:9091" \
  --dial-address-mapping "127.0.0.1:19013,chi-kafka-2.example.net:9091" \
  --tls-enable \
  --tls-ca-chain-cert-file /Users/gene.liverman/.ssh/ca-certs.pem \
  --tls-client-cert-file /Users/gene.liverman/.ssh/user-cert.pem \
  --tls-client-key-file /Users/gene.liverman/.ssh/user-key.pem \
  --tls-client-key-password $FLOW_CERT_PW \
  --debug-enable

In this scenario, the telegraf config is:

[global_tags]
[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = "0s"
  debug = true
  logtarget = "stderr"
  hostname = ""
  omit_hostname = false
[[outputs.kafka]]
  brokers = [
    "127.0.0.1:30001",
    "127.0.0.1:30002",
    "127.0.0.1:30003",
    "127.0.0.1:30004",
    "127.0.0.1:30005",
    "127.0.0.1:30006",
    "127.0.0.1:30007",
    "127.0.0.1:30008",
    "127.0.0.1:30009",
    "127.0.0.1:30010",
    "127.0.0.1:30011",
    "127.0.0.1:30012",
    "127.0.0.1:30013"
  ]
  topic = "private.test.gene.metrics"
  version = "3.7.0"
  routing_tag = "host"
  compression_codec = 2
[[inputs.prometheus]]
  urls = ["http://127.0.0.1:9100/metrics"]