linkedin / brooklin

An extensible distributed system for reliable nearline data streaming at scale
BSD 2-Clause "Simplified" License

brooklin can't connect to zk when starting up #970

Open zblcourage opened 1 year ago

zblcourage commented 1 year ago

Subject of the issue

Brooklin 5.x cannot connect to zk when starting, but version 4.x can connect.

Your environment

Steps to reproduce

1. Start the service: ./bin/brooklin-server-start.sh config/server.properties

It reports an error:

[2023-11-01 18:25:30,609] INFO Initiating client connection, connectString=localhost:2181 sessionTimeout=30000 watcher=com.linkedin.datastream.common.zk.ZkClient@20322d26 (org.apache.zookeeper.ZooKeeper)
[2023-11-01 18:25:30,611] INFO Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation (org.apache.zookeeper.common.X509Util)
[2023-11-01 18:25:30,614] INFO jute.maxbuffer value is 1048575 Bytes (org.apache.zookeeper.ClientCnxnSocket)
[2023-11-01 18:25:30,618] INFO zookeeper.request.timeout value is 0. feature enabled=false (org.apache.zookeeper.ClientCnxn)
[2023-11-01 18:25:30,622] INFO Opening socket connection to server localhost/127.0.0.1:2181. (org.apache.zookeeper.ClientCnxn)
[2023-11-01 18:25:30,622] INFO SASL config status: Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2023-11-01 18:25:30,627] INFO Socket connection established, initiating session, client: /127.0.0.1:61573, server: localhost/127.0.0.1:2181 (org.apache.zookeeper.ClientCnxn)
[2023-11-01 18:25:30,631] INFO Session establishment complete on server localhost/127.0.0.1:2181, session id = 0x100152bc54b0005, negotiated timeout = 30000 (org.apache.zookeeper.ClientCnxn)
[2023-11-01 18:25:30,632] INFO zkclient 0, zookeeper state changed ( SyncConnected ) (org.apache.helix.zookeeper.zkclient.ZkClient)
[2023-11-01 18:25:30,638] INFO zkclient 0, sycnOnNewSession with sessionID 100152bc54b0005 async return code: OK and proceeds (org.apache.helix.zookeeper.zkclient.ZkClient)
[2023-11-01 18:25:30,638] INFO Pagination config zk.getChildren.pagination.disabled=false, method to be invoked: getAllChildrenPaginated (org.apache.helix.zookeeper.zkclient.ZkConnection)
[2023-11-01 18:25:30,641] WARN Session 0x100152bc54b0005 for sever localhost/127.0.0.1:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException. (org.apache.zookeeper.ClientCnxn)
EndOfStreamException: Unable to read additional data from server sessionid 0x100152bc54b0005, likely server has closed socket
    at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1290)
[2023-11-01 18:25:30,643] WARN Paginated getChildren is unimplemented in ZK server! Falling back to non-paginated getChildren (org.apache.helix.zookeeper.zkclient.ZkConnection)
Exception in thread "main" org.apache.helix.zookeeper.zkclient.exception.ZkTimeoutException: Operation cannot be retried because of retry timeout (-1 milli seconds). Retry was caused by CONNECTIONLOSS
    at org.apache.helix.zookeeper.zkclient.ZkClient.retryUntilConnected(ZkClient.java:1700)
    at org.apache.helix.zookeeper.zkclient.ZkClient.getChildren(ZkClient.java:1037)
    at com.linkedin.datastream.common.zk.ZkClient.getChildren(ZkClient.java:96)
    at com.linkedin.datastream.server.CachedDatastreamReader.fetchAllDatastreamNamesFromZk(CachedDatastreamReader.java:190)
    at com.linkedin.datastream.server.CachedDatastreamReader.<init>(CachedDatastreamReader.java:59)
    at com.linkedin.datastream.server.DatastreamServer.<init>(DatastreamServer.java:159)
    at com.linkedin.datastream.server.DatastreamServer.main(DatastreamServer.java:441)

2. ZooKeeper error log:

2023-11-01 18:25:30,639 [myid:] - WARN [RequestThrottler:ZooKeeperServer@1145] - Received packet at server of unknown type 71
2023-11-01 18:26:07,095 [myid:] - INFO [SessionTracker:ZooKeeperServer@610] - Expiring session 0x100152bc54b0005, timeout of 30000ms exceeded

Expected behaviour

Brooklin should be able to connect to ZooKeeper normally.

Actual behaviour

Brooklin and ZooKeeper both report errors, and the connection fails.

1arrow commented 1 year ago

I'm facing the same issue. What is the recommended ZooKeeper version for Brooklin?

1arrow commented 1 year ago

@thomaslaw what is the compatible zk version?

cloud-66 commented 10 months ago

Same here. I tried many different ZooKeeper versions, and Brooklin 5.4.3 doesn't work with any of them. Error in the ZooKeeper log:

- WARN  [RequestThrottler:o.a.z.s.ZooKeeperServer@1187] - Received packet at server of unknown type 71
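
Read together with the client-side warning "Paginated getChildren is unimplemented in ZK server" in the original report, this line suggests the Brooklin client is sending a request type that this ZooKeeper server does not recognise. To check whether a given server log shows the same symptom, a small shell helper along these lines may help. It is only a sketch: the function name and the log path are hypothetical, and it matches nothing more than the message text quoted above.

```shell
# Print each distinct request type that the ZooKeeper server rejected as unknown.
# $1 is the path to a ZooKeeper server log (hypothetical; adjust to your setup).
unknown_request_types() {
    sed -n 's/.*Received packet at server of unknown type \([0-9][0-9]*\).*/\1/p' "$1" | sort -un
}

# Example (assumed log location):
#   unknown_request_types /var/log/zookeeper/zookeeper.log
```

If the helper prints nothing, the log has no such rejection and the connection problem probably has a different cause.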

thomaslaw commented 10 months ago

Can you try zookeeper 3.6.3?

cloud-66 commented 10 months ago

Tried it. Brooklin 5.4.3 doesn't work with ZooKeeper 3.6.3.

cypres commented 4 months ago

You need to use LinkedIn's fork of ZooKeeper (linkedin/zookeeper).

Here is a Dockerfile for building it, if you want:

# Use maven base
ARG MVN_VERSION=3.8.4
ARG JDK_VERSION=17
FROM maven:${MVN_VERSION}-openjdk-${JDK_VERSION}-slim AS MAVEN_TOOL_CHAIN_CACHE
WORKDIR /build

# Download LinkedIn's fork of Zookeeper
# The Maven image does not include wget, so mount an image that does
RUN --mount=from=gcr.io/distroless/base-debian12:debug,src=/busybox/,dst=/busybox/ \
 ["/busybox/sh", "-c", "/busybox/wget -O - https://github.com/linkedin/zookeeper/archive/refs/tags/3.6.3-28.tar.gz | /busybox/tar zxf -"]

# Resolve deps and build as needed
RUN cd /build/zookeeper-* && mvn package -DskipTests --no-transfer-progress

# Slim-ish image for running
FROM openjdk:${JDK_VERSION}-slim-bullseye
COPY --from=MAVEN_TOOL_CHAIN_CACHE /build/zookeeper-3.6.3-28 /app
ENTRYPOINT ["/app/bin/zkServer.sh"]
CMD ["start-foreground"]

It just downloads the tarball and then runs mvn package -DskipTests or mvn install -DskipTests.
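
For anyone who would rather build on the host than in Docker, the same steps can be sketched as a plain shell script. It assumes wget, tar, a JDK and Maven are already on the PATH. The function names and the ZK_TAG variable are made up for this example, while the tag and URL are the ones used in the Dockerfile above.

```shell
#!/bin/sh
# Sketch: build LinkedIn's ZooKeeper fork on the host instead of in Docker.
# Assumes wget, tar, a JDK and Maven are already installed.
set -eu

# Tag taken from the Dockerfile above; override with ZK_TAG=... if needed.
ZK_TAG="${ZK_TAG:-3.6.3-28}"

# Print the GitHub source tarball URL for the tag given as $1.
zk_tarball_url() {
    printf 'https://github.com/linkedin/zookeeper/archive/refs/tags/%s.tar.gz\n' "$1"
}

# Download, unpack and build the fork inside the directory given as $1.
build_zk() {
    mkdir -p "$1"
    cd "$1"
    wget -O - "$(zk_tarball_url "$ZK_TAG")" | tar zxf -
    cd "zookeeper-$ZK_TAG"
    mvn package -DskipTests --no-transfer-progress
}

# Only build when a target directory is passed, e.g.: sh build-zk.sh ./build
if [ "$#" -ge 1 ]; then
    build_zk "$1"
fi
```

After a successful build, the server can be started the same way the image's entrypoint does, by running bin/zkServer.sh start-foreground from the unpacked zookeeper-3.6.3-28 directory.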