strimzi / strimzi-kafka-operator

Apache Kafka® running on Kubernetes
https://strimzi.io/
Apache License 2.0

[Question] Kafka Ingress fails with Bad Gateway #4082

Closed: oana-s closed this issue 3 years ago

oana-s commented 3 years ago

This is my Kafka cluster config

      - name: external
        port: 9094
        type: ingress
        tls: true
        authentication:
          type: scram-sha-512
        configuration:
          bootstrap:
            host: my-cluster-kafka-external-bootstrap
          brokers:
          - broker: 0
            host: my-cluster-kafka-0

I create the NGINX controller with:

helm install nginx stable/nginx-ingress --set rbac.create=true --set controller.service.type=LoadBalancer --set controller.extraArgs.annotations-prefix=nginx.ingress.kubernetes.io --set controller.extraArgs.enable-ssl-passthrough=""

When I try to curl, I get the following error:

curl -k -s https://20.76.0.53:443 -H "Host: my-cluster-kafka-0" -v

* Rebuilt URL to: https://20.76.0.53:443/
*   Trying 20.76.0.53...
* TCP_NODELAY set
* Connected to 20.76.0.53 (20.76.0.53) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Unknown (8):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Client hello (1):
* TLSv1.3 (OUT), TLS Unknown, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: O=Acme Co; CN=Kubernetes Ingress Controller Fake Certificate
*  start date: Dec 10 07:46:37 2020 GMT
*  expire date: Dec 10 07:46:37 2021 GMT
*  issuer: O=Acme Co; CN=Kubernetes Ingress Controller Fake Certificate
*  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
* Using Stream ID: 1 (easy handle 0x7fffd90097e0)
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
> GET / HTTP/2
> Host: my-cluster-kafka-0
> User-Agent: curl/7.58.0
> Accept: */*
>
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
< HTTP/2 502
< server: nginx/1.19.1
< date: Thu, 10 Dec 2020 08:47:32 GMT
< content-type: text/html
< content-length: 157
< strict-transport-security: max-age=15724800; includeSubDomains
<
<html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.19.1</center>
</body>
</html>
* Connection #0 to host 20.76.0.53 left intact

Logs from broker:

2020-12-10 07:57:58,420 WARN [SocketServer brokerId=0] Unexpected error from 10-244-0-14.nginx-nginx-ingress-controller.default.svc.cluster.local/10.244.0.14; closing connection (org.apache.kafka.common.network.Selector) [data-plane-kafka-network-thread-0-ListenerName(EXTERNAL-9094)-SASL_SSL-8]
org.apache.kafka.common.network.InvalidReceiveException: Invalid receive (size = 1195725856 larger than 524288)
        at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:105)
        at org.apache.kafka.common.security.authenticator.SaslServerAuthenticator.authenticate(SaslServerAuthenticator.java:246)
        at org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:176)
        at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:547)
        at org.apache.kafka.common.network.Selector.poll(Selector.java:485)
        at kafka.network.Processor.poll(SocketServer.scala:913)
        at kafka.network.Processor.run(SocketServer.scala:816)
        at java.base/java.lang.Thread.run(Thread.java:834)
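
(The size in this exception is itself diagnostic: Kafka reads the first four bytes of every request as a big-endian message length, and 1195725856 is exactly the ASCII bytes "GET " interpreted as a 32-bit integer, i.e. an HTTP request arriving on the Kafka port. A minimal Python sketch to confirm the decoding; the script itself is illustrative, not from the thread:)

```python
import struct

# Kafka frames every request with a 4-byte big-endian length prefix.
# If an HTTP client connects, the broker reads the first 4 bytes of
# the request line ("GET / ...") as that length prefix instead.
size = 1195725856  # the value from the broker log above

raw = struct.pack(">i", size)  # re-encode the bogus "length" as bytes
print(raw)                     # b'GET ' -> the start of an HTTP request
```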

Logs from NGINX

2020/12/10 07:57:58 [error] 111#111: *4370 upstream prematurely closed connection while reading response header from upstream, client: 127.0.0.1, server: my-cluster-kafka-0, request: "GET / HTTP/2.0", upstream: "https://10.244.0.16:9094/", host: "my-cluster-kafka-0"
2020/12/10 07:57:58 [error] 111#111: *4370 upstream prematurely closed connection while reading response header from upstream, client: 127.0.0.1, server: my-cluster-kafka-0, request: "GET / HTTP/2.0", upstream: "https://10.244.0.16:9094/", host: "my-cluster-kafka-0"
2020/12/10 07:57:58 [error] 111#111: *4370 upstream prematurely closed connection while reading response header from upstream, client: 127.0.0.1, server: my-cluster-kafka-0, request: "GET / HTTP/2.0", upstream: "https://10.244.0.16:9094/", host: "my-cluster-kafka-0"
127.0.0.1 - - [10/Dec/2020:07:57:58 +0000] "GET / HTTP/2.0" 502 157 "-" "curl/7.58.0" 33 0.044 [default-my-cluster-kafka-0-9094] [] 10.244.0.16:9094, 10.244.0.16:9094, 10.244.0.16:9094 0, 0, 0 0.012, 0.016, 0.016 502, 502, 502 f364483c267cffdb46f63a7f95b19015

Can you please help me out what i am doing wrong? thank you!

scholzj commented 3 years ago

I'm not sure I would expect curl to work. Kafka does not speak HTTP (you can use the HTTP Bridge for that - https://strimzi.io/docs/operators/latest/full/deploying.html#kafka-bridge-str), so you cannot talk HTTP to Kafka directly. There is also no Host header involved; the TLS passthrough needs to work on TLS SNI. So in theory, you should be able to do something like curl -v https://my-cluster-kafka-0 and it should get to the broker. But because of the HTTP, it will never get any nice response.

A better way to test the setup is to run openssl s_client -connect my-cluster-kafka-0:443 -servername my-cluster-kafka-0 -showcerts => if it gives you the certificates from the broker, then the ingress setup is OK and you can connect with Kafka clients.

oana-s commented 3 years ago

I have been able to retrieve the certificates using kubectl get secret my-cluster-cluster-ca-cert -o jsonpath='{.data.ca\.crt}' | base64 -d > ca.crt

which I am using, unsuccessfully, along with the following config on client side:

                BootstrapServers = "20.76.0.53:443",
                SecurityProtocol = SecurityProtocol.SaslSsl,
                SaslMechanism = SaslMechanism.ScramSha512,
                SaslUsername = user,
                SaslPassword = pwd,
                SslCaLocation = "ca.crt"

Worth mentioning: before changing to Ingress, I was using a LoadBalancer external listener and the whole flow was functional. As mentioned, on the client side I get a generic Confluent .NET error and, oddly enough, I cannot see anything in the broker or NGINX logs, as if the request never reaches them.

I'm not sure whether this is relevant, but when I try to describe the ingress, I get

osamf@SPS-NB244:~/allfiles$ kubectl describe ingress.extensions/my-cluster-kafka-bootstrap
Name:             my-cluster-kafka-bootstrap
Namespace:        default
Address:          10.240.0.4
Default backend:  default-http-backend:80 (<error: endpoints "default-http-backend" not found>)
TLS:
  SNI routes my-cluster-kafka-external-bootstrap
Rules:
  Host                                 Path  Backends
  ----                                 ----  --------
  my-cluster-kafka-external-bootstrap
                                       /   my-cluster-kafka-external-bootstrap:9094 (10.244.0.16:9094)
Annotations:                           ingress.kubernetes.io/ssl-passthrough: true
                                       kubernetes.io/ingress.class: nginx
                                       nginx.ingress.kubernetes.io/backend-protocol: HTTPS
                                       nginx.ingress.kubernetes.io/ssl-passthrough: true
Events:                                <none>

When I query the services kubectl get svc i get

NAME                                  TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)                      AGE
kubernetes                            ClusterIP      10.0.0.1       <none>        443/TCP                      4h55m
my-cluster-kafka-0                    ClusterIP      10.0.218.224   <none>        9094/TCP                     4h2m
my-cluster-kafka-bootstrap            ClusterIP      10.0.93.18     <none>        9091/TCP,9092/TCP            4h2m
my-cluster-kafka-brokers              ClusterIP      None           <none>        9091/TCP,9092/TCP            4h2m
my-cluster-kafka-external-bootstrap   ClusterIP      10.0.255.22    <none>        9094/TCP                     4h2m
my-cluster-zookeeper-client           ClusterIP      10.0.215.162   <none>        2181/TCP                     4h4m
my-cluster-zookeeper-nodes            ClusterIP      None           <none>        2181/TCP,2888/TCP,3888/TCP   4h4m
nginx-nginx-ingress-controller        LoadBalancer   10.0.21.148    20.76.0.53    80:31180/TCP,443:32575/TCP   4h5m
nginx-nginx-ingress-default-backend   ClusterIP      10.0.141.215   <none>        80/TCP                       4h5m
scholzj commented 3 years ago

So, the way the Ingress works:

- The NGINX controller routes each connection based on its TLS SNI name, so the hostnames configured in the listener (my-cluster-kafka-external-bootstrap and my-cluster-kafka-0) must resolve to the address of the ingress controller's load balancer (20.76.0.53 in your case).
- The client then has to connect using those hostnames on port 443, not the raw IP address, so that the SNI routing can match the right backend.

So you have to double-check these two steps ... make sure the routing works (I'm not sure what environment you use, but for example the basic mechanism that works on any Linux or macOS machine is to add the hostnames to /etc/hosts; or if you have some DNS server, you could of course use that as well). And once the routing works, configure the right address in the client.
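
(A concrete sketch of those two steps, using the IP and hostnames from the listings above; adapt to your environment:)

```
# /etc/hosts entries pointing the listener hostnames at the ingress LB
20.76.0.53  my-cluster-kafka-external-bootstrap
20.76.0.53  my-cluster-kafka-0
```

and then the client bootstraps against the hostname instead of the IP, e.g. BootstrapServers = "my-cluster-kafka-external-bootstrap:443" in the Confluent .NET config shown earlier.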

oana-s commented 3 years ago

Perfect! Thank you very much for your guidance! For some reason, the routing only worked with my-cluster-kafka-0... maybe I missed something. Nevertheless, thank you again! I will close the ticket now!