stackhpc / slurm-k8s-cluster

A Slurm cluster for Kubernetes
MIT License

Issue with slurmdbd communicating with mysql, and ssh login #40

Closed: hicotton02 closed this issue 2 weeks ago

hicotton02 commented 2 months ago

I'm not sure whether this is being maintained, but I am trying to install it onto my Kubernetes cluster. I am really new to all of this, so bear with me.

NAME                        READY   STATUS    RESTARTS   AGE
login-7d49fd7fb8-ndhc8      1/1     Running   0          10h
mysql-5c684b9fcf-8ndtf      1/1     Running   0          10h
slurmctld-0                 1/1     Running   0          10h
slurmd-0                    1/1     Running   0          10h
slurmd-1                    1/1     Running   0          10h
slurmd-2                    1/1     Running   0          10h
slurmd-3                    1/1     Running   0          10h
slurmdbd-54c8d976b9-zmzb8   1/1     Running   0          9h

The slurmdbd pod can't communicate with the MySQL pod. I am also missing some step needed to SSH into the login pod. I have confirmed that the MySQL pod is up and responding to requests, and that the slurmdbd pod can ping the MySQL pod and resolve its DNS name. However, the MySQL pod logs show no sign of slurmdbd attempting to log in.
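A successful ping only shows ICMP reachability and working DNS. slurmdbd talks to MySQL over TCP (port 3306 by default), so a more telling check is whether a TCP connection to that port opens from inside the slurmdbd pod. The sketch below assumes a Python interpreter is available in the container (it may not be) and that you pass in the host and port from your slurmdbd.conf; the function name `can_connect` is hypothetical.

```python
import socket


def can_connect(host: str, port: int = 3306, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Unlike ping (ICMP), this exercises the same TCP path slurmdbd uses
    to reach MySQL. The default port 3306 is MySQL's standard port and
    is an assumption about this deployment.
    """
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures (socket.gaierror), refused connections,
        # and timeouts; all are subclasses of OSError.
        return False
```

For example, running `can_connect("mysql")` from the slurmdbd pod (the host name `mysql` is a placeholder for your StorageHost value) and getting `False` while ping succeeds would suggest the problem lies in routing or network policy rather than DNS.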

If this is still being maintained and you need more info, please let me know.

badreddine2 commented 2 months ago

Hello, I had the same issue before. There are two possible solutions:
1- Deploy a single pod that contains both the mysql and slurmdbd containers, and use the pod IP in the slurmdbd configuration.
2- Create a ClusterIP service that exposes the MySQL pod, then use the service's DNS name in slurmdbd.conf (make sure you set the correct MySQL user/password).
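Option 2 can be sketched as follows. The Service name `mysql`, namespace `default`, pod label `app: mysql`, and the `slurm`/`CHANGE_ME` credentials are hypothetical placeholders, not values taken from the slurm-k8s-cluster repository. The sketch emits the Service manifest as JSON (which `kubectl apply -f` accepts as well as YAML) and the matching slurmdbd.conf storage settings.

```python
import json


def mysql_service(name: str = "mysql", namespace: str = "default",
                  app_label: str = "mysql", port: int = 3306) -> dict:
    """Build a ClusterIP Service manifest that exposes the MySQL pod.

    The selector must match the MySQL pod's labels; app=mysql is an
    assumption, so check it with `kubectl get pods --show-labels`.
    """
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "ClusterIP",
            "selector": {"app": app_label},
            "ports": [{"name": "mysql", "port": port,
                       "targetPort": port, "protocol": "TCP"}],
        },
    }


def service_dns(name: str, namespace: str, domain: str = "cluster.local") -> str:
    """Fully qualified in-cluster DNS name of a Service."""
    return f"{name}.{namespace}.svc.{domain}"


def slurmdbd_storage_lines(host: str, user: str, password: str,
                           port: int = 3306) -> str:
    """slurmdbd.conf storage settings pointing at the Service DNS name."""
    return "\n".join([
        "StorageType=accounting_storage/mysql",
        f"StorageHost={host}",
        f"StoragePort={port}",
        f"StorageUser={user}",
        f"StoragePass={password}",
    ])


# Save the JSON to a file and apply it with `kubectl apply -f <file>`.
print(json.dumps(mysql_service(), indent=2))
# Placeholder credentials: use the user/password actually configured in MySQL.
print(slurmdbd_storage_lines(service_dns("mysql", "default"), "slurm", "CHANGE_ME"))
```

Within the same namespace the short Service name usually resolves as well, so `StorageHost=mysql` would typically work too; the fully qualified name just removes any dependence on the pod's DNS search path.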

hicotton02 commented 2 weeks ago

It turned out to be an odd network issue: the pods were not communicating on the correct network. I was able to resolve it.