redpanda-data / redpanda

Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
https://redpanda.com
9.55k stars 582 forks source link

RdKafka consumer can't connect to redpanda through sasl/tls #7449

Open 3g0r opened 1 year ago

3g0r commented 1 year ago

Version & Environment

Redpanda version: (use rpk version): v22.2.7 (rev 96ed1cc)

Please also give versions of other components:

What went wrong?

Hi, evryone. I test redpanda k8s cluster with the intention for migrate from kafka. In the beginning I tested it in docker-compose without traffic encryption, only with sasl authentication, and it had worked like a charm. But when I had migrated my configuration into k8s env and had enabled tls for one public listener (without mTls authentication), my rdkafka clients couldn't connect to redpanda cluster. I can't figure out why. My attempts to investigate have been unsuccessful.

I tried to take and decrypt traffic with tshark, and I think authentication passed success, but in the next step my clients fallen with '[thrd:GroupCoordinator]: GroupCoordinator/0: my-public-domain-name.:443: SSL handshake failed: Disconnected: connecting to a PLAINTEXT broker listener? (after 86ms in state SSL_HANDSHAKE, 1 identical error(s) suppressed) (_TRANSPORT): identical to last error'

And another wired details: I can connect to redpanda cluster from my pc through rpk cli tool

rpk --brokers=redpanda-0.my-public-domain-name:443 --user test-user-name --password test-password --sasl-mechanism SCRAM-SHA-256 topic  consume test-topic-name --tls-enabled

What should have happened instead?

I was hoping it should work without any problems.

How to reproduce the issue?

redpanda.yaml

redpanda:
    data_directory: /var/lib/redpanda/data
    node_id: 0
    seed_servers: []
    rpc_server:
        address: 0.0.0.0
        port: 33145
    kafka_api:
        - address: 0.0.0.0
          port: 9093
          name: internal
          authentication_method: none
        - address: 0.0.0.0
          port: 9094
          name: default
          authentication_method: sasl
    kafka_api_tls:
        - name: default
          key_file: /etc/tls/certs/default/tls.key
          cert_file: /etc/tls/certs/default/tls.crt
          enabled: true
    admin:
        - address: 0.0.0.0
          port: 9644
          name: admin
    advertised_rpc_api:
        address: redpanda-0.k8s-internal-domain.local.
        port: 33145
    advertised_kafka_api:
        - address: redpanda-0.k8s-internal-domain.local.
          port: 9093
          name: internal
        - address: redpanda-0.my-public-domain-name.
          port: 443
          name: default
rpk:
    enable_usage_stats: true
pandaproxy:
    pandaproxy_api:
        - address: 0.0.0.0
          port: 8082
          name: internal
        - address: 0.0.0.0
          port: 8083
          name: default
    advertised_pandaproxy_api:
        - address: redpanda-0.k8s-internal-domain.local.
          port: 8082
          name: internal
        - address: redpanda-0.k8s-internal-domain.local.
          port: 30082
          name: default
schema_registry: {}

helm installation hook

rpk acl user create admin -p "$REDPANDA_USER_admin_PASSWORD" --api-urls redpanda-0.k8s-internal-domain.local.:9644  
rpk acl user create user-1 -p "$REDPANDA_USER_user-1_PASSWORD" --api-urls redpanda-0.k8s-internal-domain.local.:9644 --user admin --password $REDPANDA_USER_admin_PASSWORD --sasl-mechanism SCRAM-SHA-256   
rpk acl user create user-2 -p "$REDPANDA_USER_user-2_PASSWORD" --api-urls redpanda-0.k8s-internal-domain.local.:9644 --user admin --password $REDPANDA_USER_admin_PASSWORD --sasl-mechanism SCRAM-SHA-256   
rpk acl user create user-3 -p "$REDPANDA_USER_user-3_PASSWORD" --api-urls redpanda-0.k8s-internal-domain.local.:9644 --user admin --password $REDPANDA_USER_admin_PASSWORD --sasl-mechanism SCRAM-SHA-256   
rpk acl user create user-4 -p "$REDPANDA_USER_user-4_PASSWORD" --api-urls redpanda-0.k8s-internal-domain.local.:9644 --user admin --password $REDPANDA_USER_admin_PASSWORD --sasl-mechanism SCRAM-SHA-256   

helm update hook

rpk cluster config import -f /tmp/base-config/bootstrap.yaml --api-urls redpanda-0.k8s-internal-domain.local.:9644 --user admin --password $REDPANDA_USER_admin_PASSWORD --sasl-mechanism SCRAM-SHA-256
rpk cluster config set auto_create_topics_enabled false --api-urls redpanda-0.k8s-internal-domain.local.:9644 --user admin --password $REDPANDA_USER_admin_PASSWORD --sasl-mechanism SCRAM-SHA-256
rpk cluster config set storage_min_free_bytes 1073741824 --api-urls redpanda-0.k8s-internal-domain.local.:9644 --user admin --password $REDPANDA_USER_admin_PASSWORD --sasl-mechanism SCRAM-SHA-256

node-rdkafka: "^2.14.0" settings

const Kafka = require('node-rdkafka');
const consumer = new Kafka.KafkaConsumer({
  'group.id': 'test',
  'metadata.broker.list': 'redpanda-0.my-public-domain-name',
  'sasl.username': 'user-1',
  'sasl.password': 'password',
  'sasl.mechanism': 'SCRAM-SHA-256',
  'security.protocol': 'sasl_ssl',
  'debug': 'security,broker,consumer,interceptor',
}, {});

consumer.connect();

consumer
  .on('ready', function() {
    consumer.subscribe(config.topics);
    consumer.consume();
  })
  .on('data', function(data) {
    console.log(data.value.toString());
  })
  .on('event.error', console.error)
  .on('event.stats', console.error)
  .on('event.log', console.error)
  .on('event.event', console.error);

Additional information

Please attach any relevant logs, backtraces, or metric charts.

Not any wired logs for panda cluster in standard log level. rdkafka client:

[thrd:app]: Selected provider SCRAM (builtin) for SASL mechanism SCRAM-SHA-256'
[thrd:app]: Using OpenSSL version OpenSSL 1.1.1k+quic  25 Mar 2021 (0x101010bf, librdkafka built with 0x1010104f)'
[thrd:app]: GroupCoordinator: Added new broker with NodeId -1'
[thrd:GroupCoordinator]: GroupCoordinator: Enter main broker thread'
[thrd:app]: sasl_ssl://redpanda-0.my-public-domain-name:443/bootstrap: Added new broker with NodeId -1'
[thrd:app]: librdkafka v1.9.2 (0x10902ff) rdkafka#consumer-1 initialized (builtin.features gzip,snappy,ssl,sasl,regex,lz4,sasl_plain,sasl_scram,plugins,sasl_oauthbearer,http,oidc, GCC GXX PKGCONFIG INSTALL GNULD LDS C11THREADS LIBDL PLUGINS ZLIB SSL CURL HDRHISTOGRAM SYSLOG SNAPPY SOCKEM SASL_SCRAM SASL_OAUTHBEARER OAUTHBEARER_OIDC CRC32C_HW, debug 0x202)'
[thrd::0/internal]: :0/internal: Enter main broker thread'
[thrd:sasl_ssl://redpanda-0.my-public-domain-name:443/bo]: sasl_ssl://redpanda-0.my-public-domain-name:443/bootstrap: Enter main broker thread'
[thrd:app]: Not selecting any broker for cluster connection: still suppressed for 49ms: application metadata request'
[thrd:main]: sasl_ssl://redpanda-0.my-public-domain-name:443/bootstrap: Selected for cluster connection: coordinator query (broker has 0 connection attempt(s))'
[thrd:sasl_ssl://redpanda-0.my-public-domain-name:443/bo]: sasl_ssl://redpanda-0.my-public-domain-name:443/bootstrap: Received CONNECT op'
[thrd:sasl_ssl://redpanda-0.my-public-domain-name:443/bo]: sasl_ssl://redpanda-0.my-public-domain-name:443/bootstrap: Broker changed state INIT -> TRY_CONNECT'
[thrd:sasl_ssl://redpanda-0.my-public-domain-name:443/bo]: sasl_ssl://redpanda-0.my-public-domain-name:443/bootstrap: broker in state TRY_CONNECT connecting'
[thrd:sasl_ssl://redpanda-0.my-public-domain-name:443/bo]: sasl_ssl://redpanda-0.my-public-domain-name:443/bootstrap: Broker changed state TRY_CONNECT -> CONNECT'
[thrd:app]: Not selecting any broker for cluster connection: still suppressed for 49ms: application metadata request'
[thrd:app]: Not selecting any broker for cluster connection: still suppressed for 48ms: application metadata request'
[thrd:sasl_ssl://redpanda-0.my-public-domain-name:443/bo]: sasl_ssl://redpanda-0.my-public-domain-name:443/bootstrap: Connecting to ipv4#185.187.91.228:443 (sasl_ssl) with socket 23'
[thrd:sasl_ssl://redpanda-0.my-public-domain-name:443/bo]: sasl_ssl://redpanda-0.my-public-domain-name:443/bootstrap: Connected to ipv4#185.187.91.228:443'
[thrd:sasl_ssl://redpanda-0.my-public-domain-name:443/bo]: sasl_ssl://redpanda-0.my-public-domain-name:443/bootstrap: Broker changed state CONNECT -> SSL_HANDSHAKE'
[thrd:app]: Cluster connection already in progress: application metadata request'
[thrd:sasl_ssl://redpanda-0.my-public-domain-name:443/bo]: sasl_ssl://redpanda-0.my-public-domain-name:443/bootstrap: Connected (#1)'
[thrd:sasl_ssl://redpanda-0.my-public-domain-name:443/bo]: sasl_ssl://redpanda-0.my-public-domain-name:443/bootstrap: Updated enabled protocol features +ApiVersion to ApiVersion'
[thrd:sasl_ssl://redpanda-0.my-public-domain-name:443/bo]: sasl_ssl://redpanda-0.my-public-domain-name:443/bootstrap: Broker changed state SSL_HANDSHAKE -> APIVERSION_QUERY'
[thrd:app]: Cluster connection already in progress: application metadata request'
[thrd:sasl_ssl://redpanda-0.my-public-domain-name:443/bo]: sasl_ssl://redpanda-0.my-public-domain-name:443/bootstrap: Updated enabled protocol features to ApiVersion,BrokerBalancedConsumer,Sasl,SaslHandshake,BrokerGroupCoordinator,LZ4,OffsetTime,MsgVer2,IdempotentProducer,ZSTD,SaslAuthReq'
[thrd:sasl_ssl://redpanda-0.my-public-domain-name:443/bo]: sasl_ssl://redpanda-0.my-public-domain-name:443/bootstrap: Auth in state APIVERSION_QUERY (handshake supported)'
[thrd:sasl_ssl://redpanda-0.my-public-domain-name:443/bo]: sasl_ssl://redpanda-0.my-public-domain-name:443/bootstrap: Broker changed state APIVERSION_QUERY -> AUTH_HANDSHAKE'
[thrd:app]: Cluster connection already in progress: application metadata request'
[thrd:sasl_ssl://redpanda-0.my-public-domain-name:443/bo]: sasl_ssl://redpanda-0.my-public-domain-name:443/bootstrap: Broker supported SASL mechanisms: SCRAM-SHA-256,SCRAM-SHA-512'
[thrd:sasl_ssl://redpanda-0.my-public-domain-name:443/bo]: sasl_ssl://redpanda-0.my-public-domain-name:443/bootstrap: Auth in state AUTH_HANDSHAKE (handshake supported)'
[thrd:sasl_ssl://redpanda-0.my-public-domain-name:443/bo]: sasl_ssl://redpanda-0.my-public-domain-name:443/bootstrap: Broker changed state AUTH_HANDSHAKE -> AUTH_REQ'
[thrd:sasl_ssl://redpanda-0.my-public-domain-name:443/bo]: sasl_ssl://redpanda-0.my-public-domain-name:443/bootstrap: Initializing SASL client: service name kafka, hostname redpanda-0.my-public-domain-name, mechanisms SCRAM-SHA-256, provider SCRAM (builtin)'
[thrd:sasl_ssl://redpanda-0.my-public-domain-name:443/bo]: sasl_ssl://redpanda-0.my-public-domain-name:443/bootstrap: SASL SCRAM client in state client-first-message'
[thrd:sasl_ssl://redpanda-0.my-public-domain-name:443/bo]: sasl_ssl://redpanda-0.my-public-domain-name:443/bootstrap: Send SASL Kafka frame to broker (66 bytes)'
[thrd:app]: Cluster connection already in progress: application metadata request'
[thrd:sasl_ssl://redpanda-0.my-public-domain-name:443/bo]: sasl_ssl://redpanda-0.my-public-domain-name:443/bootstrap: Received SASL frame from broker (350 bytes)'
[thrd:sasl_ssl://redpanda-0.my-public-domain-name:443/bo]: sasl_ssl://redpanda-0.my-public-domain-name:443/bootstrap: SASL SCRAM client in state server-first-message'
[thrd:sasl_ssl://redpanda-0.my-public-domain-name:443/bo]: sasl_ssl://redpanda-0.my-public-domain-name:443/bootstrap: Send SASL Kafka frame to broker (250 bytes)'
[thrd:sasl_ssl://redpanda-0.my-public-domain-name:443/bo]: sasl_ssl://redpanda-0.my-public-domain-name:443/bootstrap: Received SASL frame from broker (46 bytes)'
[thrd:sasl_ssl://redpanda-0.my-public-domain-name:443/bo]: sasl_ssl://redpanda-0.my-public-domain-name:443/bootstrap: SASL SCRAM client in state client-final-message'
[thrd:sasl_ssl://redpanda-0.my-public-domain-name:443/bo]: sasl_ssl://redpanda-0.my-public-domain-name:443/bootstrap: SASL SCRAM authentication successful on server: verifying ServerSignature'
[thrd:sasl_ssl://redpanda-0.my-public-domain-name:443/bo]: sasl_ssl://redpanda-0.my-public-domain-name:443/bootstrap: Authenticated as cluster-selectel-ru-9-test using SCRAM-SHA-256'
[thrd:sasl_ssl://redpanda-0.my-public-domain-name:443/bo]: sasl_ssl://redpanda-0.my-public-domain-name:443/bootstrap: Broker changed state AUTH_REQ -> UP'
[thrd:main]: sasl_ssl://redpanda-2.my-public-domain-name.:443/2: Added new broker with NodeId 2'
[thrd:sasl_ssl://redpanda-2.my-public-domain-name.:443/2]: sasl_ssl://redpanda-2.my-public-domain-name.:443/2: Enter main broker thread'
[thrd:main]: sasl_ssl://redpanda-1.my-public-domain-name.:443/0: Added new broker with NodeId 0'
[thrd:main]: sasl_ssl://redpanda-0.my-public-domain-name.:443/1: Added new broker with NodeId 1'
[thrd:sasl_ssl://redpanda-1.my-public-domain-name.:443/0]: sasl_ssl://redpanda-1.my-public-domain-name.:443/0: Enter main broker thread'
[thrd:sasl_ssl://redpanda-0.my-public-domain-name.:443/1]: sasl_ssl://redpanda-0.my-public-domain-name.:443/1: Enter main broker thread'
[thrd:main]: sasl_ssl://redpanda-0.my-public-domain-name:443/bootstrap: ClusterId update "" -> "redpanda.3b906abb-9db9-4ba0-aa93-7a822bb4a93e"'
[thrd:main]: sasl_ssl://redpanda-0.my-public-domain-name:443/bootstrap: ControllerId update -1 -> 1'
[thrd:main]: GroupCoordinator: Broker nodename changed from "" to "redpanda-1.my-public-domain-name.:443"'
[thrd:main]: GroupCoordinator: Broker nodeid changed from -1 to 0'
[thrd:GroupCoordinator]: GroupCoordinator/0: Received CONNECT op'
[thrd:GroupCoordinator]: GroupCoordinator: Broker changed state INIT -> TRY_CONNECT'
[thrd:GroupCoordinator]: GroupCoordinator/0: broker in state TRY_CONNECT connecting'
[thrd:GroupCoordinator]: GroupCoordinator: Broker changed state TRY_CONNECT -> CONNECT'
[thrd:GroupCoordinator]: GroupCoordinator/0: Connecting to ipv4#185.187.91.228:443 (sasl_ssl) with socket 31'
[thrd:GroupCoordinator]: GroupCoordinator/0: Connected to ipv4#185.187.91.228:443'
[thrd:GroupCoordinator]: GroupCoordinator: Broker changed state CONNECT -> SSL_HANDSHAKE'
[thrd:GroupCoordinator]: GroupCoordinator/0: redpanda-1.my-public-domain-name.:443: SSL handshake failed: Disconnected: connecting to a PLAINTEXT broker listener? (after 102ms in state SSL_HANDSHAKE) (_TRANSPORT)'
[thrd:GroupCoordinator]: GroupCoordinator: redpanda-1.my-public-domain-name.:443: SSL handshake failed: Disconnected: connecting to a PLAINTEXT broker listener? (after 102ms in state SSL_HANDSHAKE)'
[thrd:GroupCoordinator]: GroupCoordinator: Broker changed state SSL_HANDSHAKE -> DOWN'
[thrd:GroupCoordinator]: GroupCoordinator: Broker changed state DOWN -> INIT'
[thrd:GroupCoordinator]: GroupCoordinator: Broker changed state INIT -> TRY_CONNECT'
[thrd:GroupCoordinator]: GroupCoordinator/0: Delaying next reconnect by 145ms'
[LibrdKafkaError: Local: Broker transport failure]
  origin: 'local',
  message: 'broker transport failure',
  code: -1,
  errno: -1
[thrd:GroupCoordinator]: GroupCoordinator/0: broker in state TRY_CONNECT connecting'
[thrd:GroupCoordinator]: GroupCoordinator: Broker changed state TRY_CONNECT -> CONNECT'
[thrd:GroupCoordinator]: GroupCoordinator/0: Connecting to ipv4#185.187.91.228:443 (sasl_ssl) with socket 31'
[thrd:GroupCoordinator]: GroupCoordinator/0: Connected to ipv4#185.187.91.228:443'
[thrd:GroupCoordinator]: GroupCoordinator: Broker changed state CONNECT -> SSL_HANDSHAKE'
[thrd:GroupCoordinator]: GroupCoordinator/0: redpanda-1.my-public-domain-name.:443: SSL handshake failed: Disconnected: connecting to a PLAINTEXT broker listener? (after 86ms in state SSL_HANDSHAKE, 1 identical error(s) suppressed) (_TRANSPORT): identical to last error'
[thrd:GroupCoordinator]: GroupCoordinator: redpanda-1.my-public-domain-name.:443: SSL handshake failed: Disconnected: connecting to a PLAINTEXT broker listener? (after 86ms in state SSL_HANDSHAKE, 1 identical error(s) suppressed)'

JIRA Link: CORE-1096

dotnwat commented 1 year ago

thanks for the report @3g0r. i would have expected that in both the rpk and rdkafka cases that the client configuration would be pointing at some certificates for the client to use when connecting. are those missing or are they being picked up some some how?

also, fyi, github issues is generally used for reporting bugs. we have a slack https://redpanda.com/slack for troubleshooting stuff like this.

3g0r commented 1 year ago

Hi dotnwat! Thank you for answer. Before we continue, I should move the issue/discussions into redpanda.com/slack?

some certificates for the client to use when connecting

No, not missing - I don't use any explicit certificates for client. In server side it is standard DNS+TLS configuration like for https setup. That approach works with kafka cluster + rdkafka client, and also it works with redpanda cluster

Capturing on 'eth0'
    1 0.000000000   172.20.0.3 ? 192.168.65.5 DNS 105 Standard query 0x96bd A redpanda-0.my-public-domain-name
    2 0.003063216 192.168.65.5 ? 172.20.0.3   DNS 222 Standard query response 0x96bd A redpanda-0.my-public-domain-name CNAME 185.187.91.228.nip.io A 185.187.91.228
    3 0.004262950   172.20.0.3 ? 185.187.91.228 TCP 74 34926 ? 443 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1 TSval=2051805249 TSecr=0 WS=128
    4 0.069407040 185.187.91.228 ? 172.20.0.3   TCP 62 443 ? 34926 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=4
    5 0.069461825   172.20.0.3 ? 185.187.91.228 TCP 54 34926 ? 443 [ACK] Seq=1 Ack=1 Win=64256 Len=0
    6 0.069782439   172.20.0.3 ? 185.187.91.228 TLSv1 391 Client Hello
    7 0.070067825 185.187.91.228 ? 172.20.0.3   TCP 54 443 ? 34926 [ACK] Seq=1 Ack=338 Win=262140 Len=0
    8 0.152257087 185.187.91.228 ? 172.20.0.3   TLSv1.3 187 Server Hello, Change Cipher Spec
    9 0.152282328   172.20.0.3 ? 185.187.91.228 TCP 54 34926 ? 443 [ACK] Seq=338 Ack=134 Win=64128 Len=0
   10 0.153618448 185.187.91.228 ? 172.20.0.3   TLSv1.3 1414 Encrypted Extensions
   11 0.153655108   172.20.0.3 ? 185.187.91.228 TCP 54 34926 ? 443 [ACK] Seq=338 Ack=1494 Win=64128 Len=0
   12 0.154074133 185.187.91.228 ? 172.20.0.3   TCP 1414 443 ? 34926 [PSH, ACK] Seq=1494 Ack=338 Win=262140 Len=1360 [TCP segment of a reassembled PDU]
   13 0.154099777   172.20.0.3 ? 185.187.91.228 TCP 54 34926 ? 443 [ACK] Seq=338 Ack=2854 Win=64128 Len=0
   14 0.155696226 185.187.91.228 ? 172.20.0.3   TLSv1.3 1514 Certificate [TCP segment of a reassembled PDU]
   15 0.155714216   172.20.0.3 ? 185.187.91.228 TCP 54 34926 ? 443 [ACK] Seq=338 Ack=4314 Win=64128 Len=0
   16 0.155763560 185.187.91.228 ? 172.20.0.3   TLSv1.3 355 Certificate Verify, Finished
   17 0.155803984   172.20.0.3 ? 185.187.91.228 TCP 54 34926 ? 443 [ACK] Seq=338 Ack=4615 Win=63872 Len=0
   18 0.157066246   172.20.0.3 ? 185.187.91.228 TLSv1.3 134 Change Cipher Spec, Finished
jcsp commented 1 year ago

@3g0r is this issue still happening, or did you get resolution elsewhere?

The log makes me a bit suspicious that something is going wrong with whatever is forwarding from the public :443 port to redpanda's :9094 port: we can see the client successfully speaking TLS to its seed broker initially, but then when it tries opening a connection to another broker, it's somehow hitting a non-tls port.