hydro-project / cluster

Cluster management tools for the Hydro stack
Apache License 2.0
18 stars 34 forks

Setting up hydro cluster in local kubernetes cluster (using kubeadm) #29

Open mak-azad opened 3 years ago

mak-azad commented 3 years ago

I am trying to run this project (https://github.com/hydro-project/cluster/blob/master/docs/getting-started-aws.md) in my local cluster environment. I have configured the cluster locally with kubeadm on two of my lab machines instead of using kops on AWS.

My reasoning is that if the services can run in VMs on AWS, they should also run in a local Kubernetes cluster. To meet the app's resource requirements, I lowered the limits in the YAML files so the pods fit on my cluster nodes. I also removed the AWS and kops references from the code, since I had already set up the cluster manually with kubeadm.

More details:

In the cloud setup, a cluster-creation script creates the cluster and, once it finishes, prints the URLs of two AWS ELBs, which clients use to interact with the two services exposed by the cloud load balancer.

My problem: I modified the (https://github.com/hydro-project/cluster/blob/master/hydro/cluster/create_cluster.py) script to remove the references to kops and AWS. In this local Kubernetes cluster I use MetalLB (https://metallb.universe.tf/installation/) as the load balancer for the services. I created the hydro cluster with the following command:

$ python3 -m hydro.cluster.create_cluster -m 2 -r 2 -f 2 -s 2

At the end of the script, instead of the two AWS ELB addresses that a cloud deployment prints for clients to interface with the system services, I get the two public IPs of the physical nodes, xxx.xxx.80.72 and xxx.xxx.12.58, assigned to the two services by MetalLB from its pre-configured address pool.
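One detail worth checking in the modified script: AWS ELBs report their address in `status.loadBalancer.ingress[0].hostname`, while MetalLB populates `ingress[0].ip` instead, so any helper that reads only the hostname field will come back empty on a MetalLB cluster. A minimal sketch (hypothetical helper, not part of create_cluster.py) that handles both, parsing `kubectl get svc -o json` output:

```python
import json

def external_addresses(kubectl_json: str) -> dict:
    """Map service name -> external address from `kubectl get svc -o json`.

    AWS ELBs expose a hostname, MetalLB exposes an IP; read whichever
    field is present. (Hypothetical helper, not part of create_cluster.py.)
    """
    addrs = {}
    for item in json.loads(kubectl_json)["items"]:
        ingress = item["status"].get("loadBalancer", {}).get("ingress") or []
        if ingress:
            addrs[item["metadata"]["name"]] = (
                ingress[0].get("ip") or ingress[0].get("hostname")
            )
    return addrs
```

If the script stores an empty address because it only looked at the hostname field, every later connect would silently go to the wrong endpoint.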

As expected, all the pods are in the Running state.

$ kubectl get all -o wide

NAME                        READY   STATUS    RESTARTS   AGE   IP               NODE   NOMINATED NODE   READINESS GATES

pod/benchmark-nodes-qccl6   4/4     Running   0          22h   xxx.xxx.12.58    srl1   <none>           <none>

pod/benchmark-nodes-s2rqj   4/4     Running   0          22h   xxx.xxx.80.72    srl2   <none>           <none>

pod/function-nodes-ct7jm    4/4     Running   17         22h   xxx.xxx.12.58    srl1   <none>           <none>

pod/function-nodes-d5r6w    4/4     Running   7          22h   xxx.xxx.80.72    srl2   <none>           <none>

pod/management-pod          1/1     Running   0          22h   192.168.120.66   srl1   <none>           <none>

pod/memory-nodes-7dhsv      1/1     Running   1          22h   xxx.xxx.80.72    srl2   <none>           <none>

pod/memory-nodes-v8s2c      1/1     Running   1          22h   xxx.xxx.12.58    srl1   <none>           <none>

pod/monitoring-pod          1/1     Running   1          22h   192.168.120.84   srl1   <none>           <none>

pod/routing-nodes-lc62q     1/1     Running   1          22h   xxx.xxx.80.72    srl2   <none>           <none>

pod/routing-nodes-xm8n2     1/1     Running   1          22h   xxx.xxx.12.58    srl1   <none>           <none>

pod/scheduler-nodes-495kj   1/1     Running   0          22h   xxx.xxx.80.72    srl2   <none>           <none>

pod/scheduler-nodes-pjb9w   1/1     Running   0          22h   xxx.xxx.12.58    srl1   <none>           <none>
$ kubectl get svc -A

NAME                       TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)                                                                                                    AGE   SELECTOR

service/function-service   LoadBalancer   10.108.79.97    xxx.xxx.12.58   5000:32427/TCP,5001:30516/TCP,5002:30830/TCP,5003:31430/TCP,5004:32448/TCP,5005:30177/TCP,5006:30892/TCP   22h   role=scheduler

service/kubernetes         ClusterIP      10.96.0.1       <none>          443/TCP                                                                                                    20d   <none>

service/routing-service    LoadBalancer   10.107.63.188   xxx.xxx.80.72   6450:31251/TCP,6451:31374/TCP,6452:30037/TCP,6453:32030/TCP                                                22h   role=routing
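Before debugging at the ZeroMQ level, it may help to confirm that the MetalLB external IPs actually accept TCP connections on the service ports from the client machine. A minimal probe with a plain socket (the addresses below are placeholders for the masked IPs in the table above):

```python
import socket

def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Placeholders: substitute the real external IPs and service ports above.
    for host, port in [("xxx.xxx.12.58", 5000), ("xxx.xxx.80.72", 6450)]:
        status = "open" if tcp_port_open(host, port) else "closed/unreachable"
        print(f"{host}:{port} {status}")
```

If these ports are closed from the client, the problem is at the MetalLB/network layer (for example, the address pool is not announced on the client's subnet) rather than in the Hydro code.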

However, when I try to connect to the services from a client in the cluster, the connection fails. Execution always falls into the exception branch of the if-else in the _connect method (code given below).

At this point, can someone please give me some pointers on what might be wrong with connecting to these services in my bare-metal 2-node cluster?


def _connect(self):
    sckt = self.context.socket(zmq.REQ)
    sckt.setsockopt(zmq.RCVTIMEO, 1000)
    sckt.connect(self.service_addr % CONNECT_PORT)

    sckt.send_string('')

    try:
        result = sckt.recv_string()
        return result
    except zmq.ZMQError as e:
        if e.errno == zmq.EAGAIN:
            return None
        else:
            raise e
leehm00 commented 1 year ago

Hello, I am also trying to run the Anna cluster in my local environment, and I have run into many problems. Could you share your modified create_cluster.py script? Thanks very much!