hazelcast / hazelcast-kubernetes

Kubernetes Discovery for Hazelcast
Apache License 2.0

Only one pod joins the cluster #377

Open SamueleAlpino opened 2 years ago

SamueleAlpino commented 2 years ago

Hi, I recently tried to upgrade Hazelcast from:

    com.hazelcast:hazelcast-spring:4.0.6
    com.hazelcast:hazelcast:4.0.6
    com.hazelcast:hazelcast-kubernetes:2.2.3

to:

    com.hazelcast:hazelcast-spring:5.1.1
    com.hazelcast:hazelcast:5.1.1

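I have a Kubernetes cluster running several applications with these dependencies. The last time I upgraded them, all I did was kill the pods running the old version, and no problems occurred. Following the documentation, I have a bean that configures the service DNS:

    import com.hazelcast.config.Config;
    import com.hazelcast.kubernetes.KubernetesProperties;
    import org.springframework.context.annotation.Bean;
    import org.springframework.core.env.Environment;

    @Bean
    public Config hazelcastConfig(Environment environment) {
        Config config = new Config();
        // Disable multicast discovery
        config.getNetworkConfig()
                .getJoin()
                .getMulticastConfig()
                .setEnabled(false);

        // Enable Kubernetes discovery in DNS lookup mode.
        // serviceName holds the service DNS name; it is defined elsewhere in the class (not shown).
        config.getNetworkConfig()
                .getJoin()
                .getKubernetesConfig()
                .setEnabled(true)
                .setProperty(KubernetesProperties.SERVICE_DNS.key(), serviceName);

        return config;
    }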
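One way to check what DNS lookup mode has to work with is to resolve the service DNS name from inside a pod. If the service is headless (clusterIP: None), the name resolves to the individual pod IPs. The sketch below is a hypothetical, stdlib-only helper (ServiceDnsCheck and resolve are made-up names, not Hazelcast API); it only illustrates the name-resolution step, is not how the discovery plugin is implemented, and assumes members listen on the default port 5701.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnostic helper: lists the addresses a service DNS name resolves to. */
public class ServiceDnsCheck {

    // Assumed member port: Hazelcast's default 5701, as seen in the logs in this thread.
    static final int DEFAULT_PORT = 5701;

    /** Resolves every address the resolver returns for the name and appends the port. */
    static List<String> resolve(String serviceDns, int port) throws UnknownHostException {
        List<String> members = new ArrayList<>();
        for (InetAddress addr : InetAddress.getAllByName(serviceDns)) {
            String host = addr.getHostAddress();
            // Bracket IPv6 literals so the ":port" suffix stays unambiguous.
            if (addr instanceof Inet6Address) {
                host = "[" + host + "]";
            }
            members.add(host + ":" + port);
        }
        return members;
    }

    public static void main(String[] args) throws UnknownHostException {
        // Pass the same value as serviceName; "localhost" is only a placeholder default.
        String serviceDns = args.length > 0 ? args[0] : "localhost";
        for (String member : resolve(serviceDns, DEFAULT_PORT)) {
            System.out.println(member);
        }
    }
}
```

If both pod IPs show up while both pods are running, the members can at least find each other, and the problem more likely lies in the connection itself, which is where the later comments in this thread point.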

I also have:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: apps-role
      namespace: ${cluster_env}
    rules:

The problem is that one pod reaches 2/2 without problems, while the other keeps crashing and printing:

    java.util.concurrent.TimeoutException: JoinMastershipClaimOp failed to complete within 9999991351 NANOSECONDS.
    My claim to be master is rejected!
    Setting master address to null
    NOT sending master question to blacklisted endpoints
    Sending master question to [...]:5701
    Connection to: [...]:5701 streamId:-1 is already in progress

Am I missing any configuration?

kkrol89 commented 2 years ago

The same problem occurs with my embedded Hazelcast setup on version 5.1.2. The application is deployed on a k8s cluster with Istio sidecar injection enabled.

First node starts without any problem.

When the second node starts, a connection between them is established:

    [5.1.2] Established socket connection between /10.2.12.114:40435 and /10.2.10.135:5701

But when the newly started node asks which node is the master, the question remains unresolved:

    Sending master question to [10.2.10.135]:5701

In between, I see a couple of these messages:

    Connection to: [...]:5701 streamId:-1 is already in progress

Then the two nodes disagree about the master:

    [5.1.2] My claim to be master is rejected! Voting endpoints: [[10.2.10.135]:5701]

and I see the same exception that @SamueleAlpino reported:

    Caused by: java.util.concurrent.TimeoutException: JoinMastershipClaimOp failed to complete within 9999992250 NANOSECONDS.

@SamueleAlpino, have you found the reason? Any workaround?

calnighters commented 1 year ago

I've been seeing the exact same issue...

I am trying to run a Spring Boot service with embedded Hazelcast using Kubernetes API discovery mode, in a Kubernetes cluster with an Istio service mesh.

The debug logs seem to show that it gets a connection, but then it doesn't seem able to join the cluster.

It works fine in a namespace without Istio enabled, so maybe it has something to do with sitting behind the sidecar proxy?

@kkrol89 @SamueleAlpino have you found any workarounds yet?

EnricoDamini commented 10 months ago

I resolved it by adding appProtocol to the port in the Service definition, like this:

    apiVersion: v1
    kind: Service
    metadata:
      name: {{ printf "%s-%s" .Values.projectName "service-headless" }}
      labels:
        app: {{ printf "%s-%s" .Values.projectName "serviceapp" }}
    spec:
      type: ClusterIP
      clusterIP: None
      publishNotReadyAddresses: true
      selector:
        app: {{ printf "%s-%s" .Values.projectName "app" }}
      ports:
        - name: "hazelcast"
          port: 5701
          protocol: TCP
          appProtocol: tcp

I found the solution in this thread https://github.com/hazelcast/hazelcast/issues/22256#issuecomment-1357440679
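A plausible reading of why this change helps: Istio decides how its sidecar treats a Service port from the port's appProtocol field or, failing that, from a `<protocol>[-<suffix>]` port name, and falls back to automatic protocol detection when neither matches. A port named just "hazelcast" matches neither, whereas appProtocol: tcp tells Istio to proxy it as plain TCP. The sketch below is a simplified model of that selection rule for illustration only: IstioProtocolSketch and selectProtocol are hypothetical names, the protocol list is a subset, and details such as UDP ports and well-known-port defaults are ignored. It is not Istio's code.

```java
import java.util.Locale;
import java.util.Set;

/** Simplified, hypothetical model of Istio's per-port protocol selection. */
public class IstioProtocolSketch {

    // Subset of the protocol names Istio accepts for explicit protocol selection.
    static final Set<String> KNOWN =
            Set.of("http", "http2", "https", "grpc", "tcp", "tls", "mongo", "mysql", "redis");

    /**
     * appProtocol, when present, is used instead of the port name. The part before
     * the first '-' is then matched, e.g. "tcp-hazelcast" -> "tcp".
     */
    static String selectProtocol(String portName, String appProtocol) {
        String candidate = appProtocol != null ? appProtocol : portName;
        if (candidate == null) {
            return "auto-detect";
        }
        candidate = candidate.toLowerCase(Locale.ROOT);
        int dash = candidate.indexOf('-');
        String prefix = dash >= 0 ? candidate.substring(0, dash) : candidate;
        // Unrecognized names fall back to automatic protocol detection.
        return KNOWN.contains(prefix) ? prefix : "auto-detect";
    }

    public static void main(String[] args) {
        System.out.println(selectProtocol("hazelcast", null));     // prints auto-detect (original port)
        System.out.println(selectProtocol("hazelcast", "tcp"));    // prints tcp (with appProtocol)
        System.out.println(selectProtocol("tcp-hazelcast", null)); // prints tcp (name prefix)
    }
}
```

Under the same rule, renaming the port to something like tcp-hazelcast should have a similar effect, though only the appProtocol change is confirmed in this thread.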