strimzi / strimzi-kafka-operator

Apache Kafka® running on Kubernetes
https://strimzi.io/
Apache License 2.0

Unable to connect to zookeeper from other software #1337

Closed alexismichelFreelance closed 5 years ago

alexismichelFreelance commented 5 years ago

I am trying to install kafka-manager, but it asks for a ZooKeeper host (ZK_HOST). Giving it the ZooKeeper service on port 2181 ends in a socket timeout, probably because it is hitting the TLS sidecar instead, which is probably not configured to give it access. The same goes for trifecta, kafka-topics-ui, kafka-rest, etc.: basically everything that requires a ZK_HOST to start connecting to the cluster. I was wondering if there is anything I could do to make them able to connect to ZooKeeper. I also tried creating a new service pointing to 21811 (the port the ZooKeeper container listens on) to bypass the TLS sidecar entirely, but the result is the same: the connection times out.

Do you have any ideas of what else I could try? Here are my logs from kafka-manager, but they are pretty much self-explanatory:

[info] o.a.z.ClientCnxn - Opening socket connection to server kafka-zookeeper-direct.kafka.svc.cluster.local/10.3.197.87:2181. Will not attempt to authenticate using SASL (unknown error)
[warn] o.a.z.ClientCnxn - Client session timed out, have not heard from server in 60059ms for sessionid 0x0
[info] o.a.z.ClientCnxn - Client session timed out, have not heard from server in 60059ms for sessionid 0x0, closing socket connection and attempting reconnect
[info] o.a.z.ClientCnxn - Opening socket connection to server kafka-zookeeper-direct.kafka.svc.cluster.local/10.3.197.87:2181. Will not attempt to authenticate using SASL (unknown error)
[warn] o.a.c.ConnectionState - Connection attempt unsuccessful after 120324 (greater than max timeout of 60000). Resetting connection and trying again with a new connection.
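
A quick way to tell "nothing answers on this port" apart from "something answers, but not as plain ZooKeeper" is to send ZooKeeper's four-letter `ruok` command over a raw TCP socket. The following is a minimal sketch, not part of Strimzi or kafka-manager; the function name `probe_zookeeper` is made up for illustration, and on ZooKeeper 3.5+ the `ruok` command may need to be allowed through `4lw.commands.whitelist` before the server answers `imok`.

```python
import socket


def probe_zookeeper(host: str, port: int, timeout: float = 5.0) -> str:
    """Send the ZooKeeper 'ruok' command over plain TCP and classify the result.

    Hypothetical helper for illustration; it only distinguishes a plain-text
    ZooKeeper reply from a timeout, a socket error, or an unexpected answer.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            sock.sendall(b"ruok")
            reply = sock.recv(16)
    except socket.timeout:
        # Connection or reply took longer than `timeout` seconds.
        return "timeout"
    except OSError as exc:
        # Refused connection, DNS failure, reset by peer, and so on.
        return f"error: {exc}"
    if reply == b"imok":
        return "ok"
    if not reply:
        return "closed without reply"
    return f"unexpected reply: {reply!r}"
```

Calling `probe_zookeeper("kafka-zookeeper-direct.kafka.svc.cluster.local", 2181)` from a pod in the same namespace would be one way to use it. A `timeout` result is consistent with the client log above, but it does not by itself say whether a TLS endpoint or a network rule is responsible.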

scholzj commented 5 years ago

This is intentional. We do not want third-party applications to use ZooKeeper, because that could have a negative impact on Kafka cluster availability and because ZooKeeper is quite hard to secure. If you really need a workaround, you can use this deployment, which can proxy ZooKeeper. It expects your Kafka cluster to be named my-cluster; if you use a different name, change it in the fields where my-cluster is used. Afterwards, you should be able to connect to zoo-entrance:2181.

alexismichelFreelance commented 5 years ago

Thank you, just what I needed. I will keep your warnings about destabilizing ZooKeeper in mind and only use it for read-only/debugging purposes. Thanks!

alexismichelFreelance commented 5 years ago

Unfortunately, I think there are still some problems with my ZooKeeper install. The zoo-entrance also gets timeouts:

Starting Stunnel with configuration:
pid = /usr/local/var/run/stunnel.pid
foreground = yes
debug = notice
[zookeeper-2181]
client = yes
CAfile = /tmp/cluster-ca.crt
cert = /etc/tls-sidecar/eo-certs/entity-operator.crt
key = /etc/tls-sidecar/eo-certs/entity-operator.key
accept = 0.0.0.0:2181
connect = kafka-cluster-zookeeper-client.kafka.svc.cluster.local:2181
delay = yes
verify = 2

2019.02.15 13:56:49 LOG5[1:140128029001792]: stunnel 4.56 on x86_64-redhat-linux-gnu platform
2019.02.15 13:56:49 LOG5[1:140128029001792]: Compiled/running with OpenSSL 1.0.1e-fips 11 Feb 2013
2019.02.15 13:56:49 LOG5[1:140128029001792]: Threading:PTHREAD Sockets:POLL,IPv6 SSL:ENGINE,OCSP,FIPS Auth:LIBWRAP
2019.02.15 13:56:49 LOG5[1:140128029001792]: Reading configuration from file /tmp/stunnel.conf
2019.02.15 13:56:49 LOG5[1:140128029001792]: FIPS mode is enabled
2019.02.15 13:56:49 LOG5[1:140128029001792]: Configuration successful
2019.02.15 13:57:24 LOG5[1:140128028997376]: Service [zookeeper-2181] accepted connection from 10.2.2.97:41106
2019.02.15 13:57:34 LOG3[1:140128028997376]: connect_blocking: s_poll_wait 10.3.39.80:2181: TIMEOUTconnect exceeded
2019.02.15 13:57:34 LOG5[1:140128028997376]: Connection reset: 0 byte(s) sent to SSL, 0 byte(s) sent to socket
2019.02.15 13:57:35 LOG5[1:140128028997376]: Service [zookeeper-2181] accepted connection from 10.2.2.97:41162
2019.02.15 13:57:36 LOG5[1:140128028858112]: Service [zookeeper-2181] accepted connection from 10.2.3.86:51350
2019.02.15 13:57:36 LOG5[1:140128028927744]: Service [zookeeper-2181] accepted connection from 10.2.0.96:51850
2019.02.15 13:57:45 LOG3[1:140128028997376]: connect_blocking: s_poll_wait 10.3.39.80:2181: TIMEOUTconnect exceeded
2019.02.15 13:57:45 LOG5[1:140128028997376]: Connection reset: 0 byte(s) sent to SSL, 0 byte(s) sent to socket
2019.02.15 13:57:46 LOG3[1:140128028927744]: connect_blocking: s_poll_wait 10.3.39.80:2181: TIMEOUTconnect exceeded
2019.02.15 13:57:46 LOG5[1:140128028927744]: Connection reset: 0 byte(s) sent to SSL, 0 byte(s) sent to socket
2019.02.15 13:57:46 LOG3[1:140128028858112]: connect_blocking: s_poll_wait 10.3.39.80:2181: TIMEOUTconnect exceeded
2019.02.15 13:57:46 LOG5[1:140128028858112]: Connection reset: 0 byte(s) sent to SSL, 0 byte(s) sent to socket
2019.02.15 13:57:47 LOG5[1:140128028858112]: Service [zookeeper-2181] accepted connection from 10.2.2.97:41220
2019.02.15 13:57:48 LOG5[1:140128028927744]: Service [zookeeper-2181] accepted connection from 10.2.3.86:51390
2019.02.15 13:57:48 LOG5[1:140128028997376]: Service [zookeeper-2181] accepted connection from 10.2.0.96:51908
2019.02.15 13:57:57 LOG3[1:140128028858112]: connect_blocking: s_poll_wait 10.3.39.80:2181: TIMEOUTconnect exceeded
2019.02.15 13:57:57 LOG5[1:140128028858112]: Connection reset: 0 byte(s) sent to SSL, 0 byte(s) sent to socket
2019.02.15 13:57:58 LOG3[1:140128028927744]: connect_blocking: s_poll_wait 10.3.39.80:2181: TIMEOUTconnect exceeded
2019.02.15 13:57:58 LOG5[1:140128028927744]: Connection reset: 0 byte(s) sent to SSL, 0 byte(s) sent to socket
2019.02.15 13:57:58 LOG3[1:140128028997376]: connect_blocking: s_poll_wait 10.3.39.80:2181: TIMEOUTconnect exceeded
2019.02.15 13:57:58 LOG5[1:140128028997376]: Connection reset: 0 byte(s) sent to SSL, 0 byte(s) sent to socket
2019.02.15 13:57:59 LOG5[1:140128028997376]: Service [zookeeper-2181] accepted connection from 10.2.2.97:41274
2019.02.15 13:57:59 LOG5[1:140128028927744]: Service [zookeeper-2181] accepted connection from 10.2.0.96:51968
2019.02.15 13:58:00 LOG5[1:140128028858112]: Service [zookeeper-2181] accepted connection from 10.2.3.86:51424
2019.02.15 13:58:09 LOG3[1:140128028997376]: connect_blocking: s_poll_wait 10.3.39.80:2181: TIMEOUTconnect exceeded
2019.02.15 13:58:09 LOG5[1:140128028997376]: Connection reset: 0 byte(s) sent to SSL, 0 byte(s) sent to socket
2019.02.15 13:58:09 LOG3[1:140128028927744]: connect_blocking: s_poll_wait 10.3.39.80:2181: TIMEOUTconnect exceeded
2019.02.15 13:58:09 LOG5[1:140128028927744]: Connection reset: 0 byte(s) sent to SSL, 0 byte(s) sent to socket
2019.02.15 13:58:10 LOG3[1:140128028858112]: connect_blocking: s_poll_wait 10.3.39.80:2181: TIMEOUTconnect exceeded
2019.02.15 13:58:10 LOG5[1:140128028858112]: Connection reset: 0 byte(s) sent to SSL, 0 byte(s) sent to socket
2019.02.15 13:58:10 LOG5[1:140128028858112]: Service [zookeeper-2181] accepted connection from 10.2.2.97:41320
2019.02.15 13:58:10 LOG5[1:140128028927744]: Service [zookeeper-2181] accepted connection from 10.2.0.96:52022
2019.02.15 13:58:12 LOG5[1:140128028997376]: Service [zookeeper-2181] accepted connection from 10.2.3.86:51470
2019.02.15 13:58:20 LOG3[1:140128028858112]: connect_blocking: s_poll_wait 10.3.39.80:2181: TIMEOUTconnect exceeded
2019.02.15 13:58:20 LOG5[1:140128028858112]: Connection reset: 0 byte(s) sent to SSL, 0 byte(s) sent to socket
2019.02.15 13:58:20 LOG3[1:140128028927744]: connect_blocking: s_poll_wait 10.3.39.80:2181: TIMEOUTconnect exceeded
2019.02.15 13:58:20 LOG5[1:140128028927744]: Connection reset: 0 byte(s) sent to SSL, 0 byte(s) sent to socket
2019.02.15 13:58:21 LOG5[1:140128028927744]: Service [zookeeper-2181] accepted connection from 10.2.2.97:41380
2019.02.15 13:58:22 LOG3[1:140128028997376]: connect_blocking: s_poll_wait 10.3.39.80:2181: TIMEOUTconnect exceeded
2019.02.15 13:58:22 LOG5[1:140128028997376]: Connection reset: 0 byte(s) sent to SSL, 0 byte(s) sent to socket
2019.02.15 13:58:22 LOG5[1:140128028997376]: Service [zookeeper-2181] accepted connection from 10.2.0.96:52080
2019.02.15 13:58:23 LOG5[1:140128028858112]: Service [zookeeper-2181] accepted connection from 10.2.3.86:51502
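
The log above repeats one pattern: stunnel accepts a client connection, then its own connection to 10.3.39.80:2181 times out. With longer logs, a small script can tally both sides so that the pattern is easy to confirm. This is a rough sketch that assumes the stunnel 4.56 message wording shown in this thread; `summarize_stunnel_log` and the two regular expressions are illustrative names, not part of stunnel.

```python
import re
from collections import Counter

# Patterns match the message wording seen in the stunnel log in this thread.
ACCEPTED = re.compile(
    r"Service \[(?P<service>[^\]]+)\] accepted connection from (?P<peer>[\d.]+):\d+"
)
TIMED_OUT = re.compile(
    r"s_poll_wait (?P<target>[\d.]+:\d+): TIMEOUTconnect exceeded"
)


def summarize_stunnel_log(lines):
    """Count accepted client connections per peer IP and connect timeouts per target."""
    accepted = Counter()
    timeouts = Counter()
    for line in lines:
        if (match := ACCEPTED.search(line)):
            accepted[match["peer"]] += 1
        elif (match := TIMED_OUT.search(line)):
            timeouts[match["target"]] += 1
    return accepted, timeouts
```

If every accepted connection ends in a timeout against the same target, the clients are reaching the proxy and the problem sits between the proxy and ZooKeeper.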

scholzj commented 5 years ago

Right. Where are you running it: Kubernetes or OpenShift? Which version? I think I wrote it in an environment where I had NetworkPolicies disabled. If your network has NetworkPolicies enabled, we will need to add one.

scholzj commented 5 years ago

FYI: This should be the YAML:

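# networking.k8s.io/v1 replaces extensions/v1beta1, which stopped being served for NetworkPolicy in Kubernetes 1.16
apiVersion: networking.k8s.io/v1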
kind: NetworkPolicy
metadata:
  labels:
    app: zoo-entrance
  name: zoo-entrance
spec:
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: zoo-entrance
    ports:
    - port: 2181
      protocol: TCP
  podSelector:
    matchLabels:
      strimzi.io/name: my-cluster-zookeeper
  policyTypes:
  - Ingress
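
Because the policy hard-codes my-cluster, it has to be adapted for clusters with another name (the stunnel configuration earlier in the thread points at kafka-cluster-zookeeper-client, which suggests a cluster named kafka-cluster). One way to avoid hand-editing is to generate the manifest. The sketch below mirrors the fields of the YAML above; the function names are hypothetical, and it uses the `networking.k8s.io/v1` API group, which current Kubernetes releases serve for NetworkPolicy.

```python
import json


def zoo_entrance_network_policy(cluster_name: str, client_app: str = "zoo-entrance") -> dict:
    """Build a NetworkPolicy letting pods labelled app=<client_app> reach ZooKeeper on 2181.

    Assumes the ZooKeeper pods carry the label strimzi.io/name=<cluster_name>-zookeeper,
    as in the YAML posted above.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": client_app, "labels": {"app": client_app}},
        "spec": {
            "podSelector": {
                "matchLabels": {"strimzi.io/name": f"{cluster_name}-zookeeper"}
            },
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [{"podSelector": {"matchLabels": {"app": client_app}}}],
                    "ports": [{"port": 2181, "protocol": "TCP"}],
                }
            ],
        },
    }


def render_policy(cluster_name: str) -> str:
    """Serialize the policy as JSON, which kubectl accepts in place of YAML."""
    return json.dumps(zoo_entrance_network_policy(cluster_name), indent=2)
```

Printing `render_policy("kafka-cluster")` and piping the result to `kubectl apply -f -` in the Kafka namespace would apply it.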

alexismichelFreelance commented 5 years ago

Running on Kubernetes in the cloud (OVH cloud provider). I am also investigating something on their side at the moment; it may be their config that is at fault. I will check whether this is a problem with network policies.

alexismichelFreelance commented 5 years ago

The network policy works fine. I didn't think of looking at that. Thank you.

val1715 commented 5 years ago

I ran into this problem yesterday as well. I agree with you about security, but in my case it would be better to use the Strimzi ZooKeeper for external resources (HDFS components).

However, the suggested solution of deploying a proxy service does not fit my situation.

I tried to change the tls-sidecar config to use stunnel as a transparent proxy and open another port for unencrypted traffic. Unfortunately, it is not possible to open an extra port on a resource that is managed by the operator (doing so would also break the main idea of Kubernetes operators, I agree).

So, thanks to the code being open source, I made some changes to the sidecar containers to turn them into a simple proxy without TLS.

Example:

Replace stunnel with nginx, changing as little else as possible:

FROM strimzi/zookeeper-stunnel:0.10.0

# MAINTAINER is deprecated in Dockerfiles; LABEL carries the same information
LABEL maintainer="ValeriiOlefir <val1715@gmail.com>"

USER root
COPY stunnel_config_generator.sh stunnel_run.sh /opt/stunnel/
RUN chown root:root /opt/stunnel/* && chmod 775 /opt/stunnel/* \
    && yum update -y \
    && yum install epel-release -y \
    && yum install nginx -y

CMD ["/opt/stunnel/stunnel_run.sh"]

stunnel_config_generator.sh

...
    cat <<-EOF
    [listener-$port]
    client = no
    CAfile = ${CA_CERTS}
    cert = ${NODE_CERTS_KEYS}/${CURRENT}.crt
    key = ${NODE_CERTS_KEYS}/${CURRENT}.key
    accept = $port
    connect = 127.0.0.1:$CONNECTOR_PORT
    verify = 2

    EOF

    cat >> /etc/nginx/nginx.conf <<-EOF
    upstream listener-$port {
    server 127.0.0.1:$CONNECTOR_PORT;
    }
    server {
    listen $port;
    proxy_pass listener-$port;
    error_log /var/log/nginx/error_stream.log;
    }
    EOF
...

stunnel_run.sh

...
nginx -t
nginx

cat /etc/nginx/nginx.conf
sleep 5
netstat -ntl
tail -f /var/log/nginx/error_stream.log
# exec /usr/bin/stunnel /tmp/stunnel.conf

I don't start stunnel; the run script is left running only to tail the nginx logs.
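
To make clear what the nginx replacement does on the wire, here is a minimal sketch of a TLS-free TCP forwarder: it accepts plain connections on one port and copies bytes in both directions to a local backend port. It is written with Python's asyncio purely as an illustration and is not the code used in the images above; `start_tcp_proxy` and `_pipe` are made-up names.

```python
import asyncio


async def _pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass  # The other side went away; nothing left to forward.
    finally:
        writer.close()


async def start_tcp_proxy(listen_host: str, listen_port: int,
                          target_host: str, target_port: int) -> asyncio.AbstractServer:
    """Listen on (listen_host, listen_port) and forward each connection to the target."""

    async def handle(client_reader, client_writer):
        try:
            upstream_reader, upstream_writer = await asyncio.open_connection(
                target_host, target_port
            )
        except OSError:
            client_writer.close()  # Backend unreachable: drop the client.
            return
        # Forward both directions concurrently until each side reaches EOF.
        await asyncio.gather(
            _pipe(client_reader, upstream_writer),
            _pipe(upstream_reader, client_writer),
        )

    return await asyncio.start_server(handle, listen_host, listen_port)
```

As with the nginx setup, nothing in such a forwarder authenticates or encrypts traffic, so anything that can reach the listening port can reach ZooKeeper.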

As a result, I changed the images for the tls-sidecar containers in the operator deployment config:

            - name: STRIMZI_DEFAULT_TLS_SIDECAR_ZOOKEEPER_IMAGE
              # value: "strimzi/zookeeper-stunnel:0.10.0"
              value: "val1715/zookeeper-stunnel:0.10.0_v45"
            - name: STRIMZI_DEFAULT_TLS_SIDECAR_KAFKA_IMAGE
              # value: "strimzi/kafka-stunnel:0.10.0"
              value: "val1715/kafka-stunnel:0.10.0_v45"
            - name: STRIMZI_DEFAULT_TLS_SIDECAR_ENTITY_OPERATOR_IMAGE
              # value: "strimzi/entity-operator-stunnel:0.10.0"
              value: "val1715/entity-operator-stunnel:0.10.0_v45"

Anyone can also pull my images, listed in the box above, and use or inspect them, or build their own custom ones.

For a simple solution, just replace the images used in your copy of https://github.com/strimzi/strimzi-kafka-operator/blob/master/install/cluster-operator/050-Deployment-strimzi-cluster-operator.yaml as in the last box.
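
If the Deployment manifest is already loaded as a dictionary (for example with a YAML library, which is not shown here), the image swap can be scripted instead of edited by hand. The following is a sketch under that assumption; `override_sidecar_images` is a hypothetical helper, and only the three environment variable names come from the box above.

```python
import copy

# Environment variables of the cluster operator that select the TLS sidecar images.
SIDECAR_IMAGE_VARS = (
    "STRIMZI_DEFAULT_TLS_SIDECAR_ZOOKEEPER_IMAGE",
    "STRIMZI_DEFAULT_TLS_SIDECAR_KAFKA_IMAGE",
    "STRIMZI_DEFAULT_TLS_SIDECAR_ENTITY_OPERATOR_IMAGE",
)


def override_sidecar_images(deployment: dict, images: dict) -> dict:
    """Return a copy of a Deployment manifest with the given sidecar image variables replaced.

    `images` maps variable names from SIDECAR_IMAGE_VARS to image references.
    The input manifest is left unchanged.
    """
    unknown = set(images) - set(SIDECAR_IMAGE_VARS)
    if unknown:
        raise ValueError(f"not a TLS sidecar image variable: {sorted(unknown)}")
    patched = copy.deepcopy(deployment)
    for container in patched["spec"]["template"]["spec"]["containers"]:
        for env in container.get("env", []):
            if env.get("name") in images:
                env["value"] = images[env["name"]]
    return patched
```

Returning a patched copy keeps the original manifest available for comparison or rollback.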

f1yegor commented 4 years ago

This is intentional. We do not want third party applications use the Zookeeper because it could have negative impact on Kafka cluster availability and because Zookeeper is quite hard to secure. If you really need a workaround, you can use this deployment which can proxy Zookeeper (it expects your Kafka cluster to be named my-cluster - if you use different name you should change it in the fields where my-cluster is used). Afterwards you should be just able to connect to zoo-entrance:2181.

For the sake of implementation, could @scholzj provide the Dockerfile for scholzj/zoo-entrance-stunnel?

scholzj commented 4 years ago

FYI: The Zoo entrance now lives in https://github.com/scholzj/zoo-entrance (including the Dockerfile).

AmjadHussainSyed commented 4 years ago

@scholzj works perfectly ☝

rtsisyk commented 4 years ago

I need this feature too. Running two ZooKeeper instances for one project usually just doesn't make sense.

scholzj commented 4 years ago

@rtsisyk You can use the tool from the repo above. We do not plan to implement this in Strimzi. Also, be careful since Zookeeper will soon be removed from Kafka as Kafka moves to replace it with its own RAFT protocol implementation.

rtsisyk commented 4 years ago

You can use the tool from the repo above.

Yeah, I'm trying this workaround. Still, adding an extra layer to unwrap SSL doesn't look like a reasonable solution to me. It would probably be better to add an option to disable SSL and the network policy in the Strimzi operator itself.

Also, be careful since Zookeeper will soon be removed from Kafka as Kafka moves to replace it with its own RAFT protocol implementation.

Good point. But this work is not finished yet, and for now we have to deal with both Kafka and ZooKeeper.

Are there any good ZooKeeper operators you can suggest? What about adding an option to use an external ZooKeeper with Strimzi? I'm trying to run ClickHouse, which needs both ZooKeeper and Kafka. Adding an extra ZooKeeper instance in this case doesn't make sense.

scholzj commented 4 years ago

As I said, I do not think we plan either of those anymore. Sorry. One of the reasons we never opened up ZooKeeper was that we never really intended to support it beyond Kafka.

I do not have any experience with other ZooKeeper operators, I'm afraid.

sweetib commented 4 years ago

I am trying to use the zoo-entrance deployment to connect to ZooKeeper, but I end up getting the errors below:

2020.10.14 05:38:13 LOG3[1:139861579093760]: SSL_connect: 1408F10B: error:1408F10B:SSL routines:SSL3_GET_RECORD:wrong version number
2020.10.14 05:38:13 LOG5[1:139861579093760]: Connection reset: 0 byte(s) sent to SSL, 0 byte(s) sent to socket

Because of the above error, I am not able to bring up the User Operator successfully. Errors in the User Operator:

2020-10-14 05:40:20 INFO  ClientCnxn:1112 - Opening socket connection to server zoo-entrance/10.8.3.138:2181. Will not attempt to authenticate using SASL (unknown error)
2020-10-14 05:40:20 INFO  ClientCnxn:959 - Socket connection established, initiating session, client: /10.4.15.16:41740, server: zoo-entrance/10.8.3.138:2181
2020-10-14 05:40:20 WARN  ClientCnxn:1246 - Session 0x0 for server zoo-entrance/10.8.3.138:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:?]
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:?]
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:276) ~[?:?]
    at sun.nio.ch.IOUtil.read(IOUtil.java:233) ~[?:?]
    at sun.nio.ch.IOUtil.read(IOUtil.java:223) ~[?:?]
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:358) ~[?:?]
    at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:75) ~[org.apache.zookeeper.zookeeper-3.5.8.jar:3.5.8]
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:363) ~[org.apache.zookeeper.zookeeper-3.5.8.jar:3.5.8]
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223) [org.apache.zookeeper.zookeeper-3.5.8.jar:3.5.8]

Can someone help me understand what I might be doing wrong here?
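
One way to narrow down a `wrong version number` error is to check, independently of stunnel, whether the target port completes a TLS handshake at all and which protocol version it negotiates. The sketch below uses Python's standard `ssl` module; `check_tls` is a hypothetical helper, certificate verification is switched off on purpose because only the handshake is of interest, and a server that requires client certificates may still reject the connection with a different error.

```python
import socket
import ssl


def check_tls(host: str, port: int, timeout: float = 5.0):
    """Attempt a TLS handshake and return (succeeded, detail).

    On success, detail is the negotiated protocol version (for example "TLSv1.3").
    On failure, detail describes the TLS or socket error.
    """
    context = ssl.create_default_context()
    # Diagnostic only: skip certificate checks so the handshake itself is what gets tested.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return True, tls.version()
    except ssl.SSLError as exc:
        return False, f"TLS error: {exc.reason or exc}"
    except OSError as exc:
        return False, f"socket error: {exc}"
```

A failure whose detail mentions the version number suggests that the two ends disagree on the protocol, or that the port does not speak TLS at all. A certificate-related alert points at the client certificate setup instead.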

scholzj commented 4 years ago

Please make sure you use the latest version of the Zoo entrance. Also, keep in mind that it is not part of Strimzi, so any issues should be raised in the Zoo entrance repo and not in Strimzi.