Kubernetes Discovery for Hazelcast
Unhelpful error when multiple pods/services found #277

Open jerrinot opened 3 years ago

jerrinot commented 3 years ago

This is what my cluster looks like:

14:24 $ kubectl get all
NAME                                          READY   STATUS    RESTARTS   AGE
pod/nfs-client-provisioner-7fc8dd7d88-j2pqb   1/1     Running   1          12d
pod/hazelcast-0                               1/1     Running   0          35m
pod/hazelcast-1                               1/1     Running   0          35m
pod/hazelcast-2                               1/1     Running   0          35m

NAME                  TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
service/kubernetes    ClusterIP     <none>        443/TCP          16d
service/hazelcast-0   LoadBalancer    5701:31289/TCP   34m
service/hazelcast-1   LoadBalancer    5701:30110/TCP   34m
service/hazelcast-2   LoadBalancer    5701:30426/TCP   34m

NAME                                     READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nfs-client-provisioner   1/1     1            1           12d

NAME                                                DESIRED   CURRENT   READY   AGE
replicaset.apps/nfs-client-provisioner-7fc8dd7d88   1         1         1       12d

NAME                         READY   AGE
statefulset.apps/hazelcast   3/3     35m

I have 3 pods running Hazelcast, and each pod has a load balancer assigned. However, I also have an unrelated deployment running an NFS PV provisioner. The cluster is running fine, but a client fails to translate the discovered pods to public addresses because of this check.

Perhaps it's OK to be conservative and skip the translation altogether. But it would be really helpful to print a hint about what's going on. Something like "I found public IPs for 3 pods, but failed to resolve this and that. It's possible the pod is not from Hazelcast and you should do this."
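To make the suggestion concrete, here is a rough sketch of how such a hint could be assembled. Everything in it is an assumption for illustration: the `PublicAddressHint` class, its `build` method, and the input map are hypothetical and are not part of the plugin's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not the plugin's real API.
final class PublicAddressHint {
    private PublicAddressHint() {
    }

    /**
     * Builds a hint from the discovered pods.
     * publicAddressByPod maps each discovered pod name to its public address,
     * or to null when no public address could be resolved for that pod.
     * Returns an empty string when every pod was resolved.
     */
    static String build(Map<String, String> publicAddressByPod) {
        List<String> resolved = new ArrayList<>();
        List<String> unresolved = new ArrayList<>();
        for (Map.Entry<String, String> entry : publicAddressByPod.entrySet()) {
            if (entry.getValue() != null) {
                resolved.add(entry.getKey());
            } else {
                unresolved.add(entry.getKey());
            }
        }
        if (unresolved.isEmpty()) {
            return "";
        }
        return String.format(
                "Found public addresses for %d pod(s) %s, but could not resolve a public address "
                        + "for %d pod(s) %s. Those pods may not belong to the Hazelcast cluster. "
                        + "Skipping public address translation; consider narrowing discovery "
                        + "to the Hazelcast pods only.",
                resolved.size(), resolved, unresolved.size(), unresolved);
    }
}
```

With the cluster above, the map would hold the three `hazelcast-*` pods with their load balancer addresses and the `nfs-client-provisioner` pod with `null`, so the message would name the NFS pod as the one that could not be resolved.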

hasancelik commented 3 years ago

@jerrinot, makes sense. Giving the user a starting point would make the process smoother.

leszko commented 3 years ago

Yeah, agreed. I think the behavior is correct, because you have somehow misconfigured the Hazelcast cluster. But yeah, we could give the user some info.