vultr / slik

Slurm in Kubernetes
https://vultr.com
Apache License 2.0

sliks stuck in Pending State #17

Open brarj413 opened 5 months ago

brarj413 commented 5 months ago

The full-slurmabler pods are up, but nothing happens. Logs from the slik operator:

2024-07-04T08:23:01.645Z INFO slurm/create_slurmabler.go:102 github.com/vultr/slik/pkg/slurm.buildSlurmablerDaemonSet node lacking labels... {"host": ".........", "hostname": "slik-operator-7df995d79f-6m7b5", "pid": 1}

odellem commented 4 months ago

I have a similar issue. All the pods are running:

NAME                             READY   STATUS    RESTARTS   AGE
slik-operator-6d6c7fc44c-ws7vz   1/1     Running   0          101s
test-slurmabler-49nlp            1/1     Running   0          97s
test-slurmabler-4dpx6            1/1     Running   0          97s
test-slurmabler-4k8zg            1/1     Running   0          97s
test-slurmabler-6qb7s            1/1     Running   0          97s
test-slurmabler-s95th            1/1     Running   0          97s
test-slurmabler-xdwwv            1/1     Running   0          98s

However, the slurmabler pods are in an error state (host removed by me):

2024-08-01T16:01:04.981Z INFO ./main.go:78 main.main sleeping forever... {"host": "REMOVED", "hostname": "test-slurmabler-4dpx6", "pid": 1}

So now the slik object is stuck in pending:

NAME   STATE     AGE
test   PENDING   7m12s

piersharding commented 2 weeks ago

> full-slurmabler pods are up but nothing happens logs from slik operator :-
>
> 2024-07-04T08:23:01.645Z INFO slurm/create_slurmabler.go:102 github.com/vultr/slik/pkg/slurm.buildSlurmablerDaemonSet node lacking labels... {"host": ".........", "hostname": "slik-operator-7df995d79f-6m7b5", "pid": 1}

Same issue:

2024-11-14T13:35:41.400Z INFO slurm/create_slurmabler.go:102 github.com/vultr/slik/pkg/slurm.buildSlurmablerDaemonSet node lacking labels... {"host": "10.10.35.20", "hostname": "slik-operator-666d5f4bf7-vq7nd", "pid": 1}

Nodes do get labels set, but the operator doesn't seem to move on to the next phase after that.
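
For anyone wanting to double-check the "nodes do get labels set" observation, here is a minimal sketch that lists nodes missing a given label key. It assumes the input is the parsed JSON from `kubectl get nodes -o json`. The label key and the sample data are hypothetical placeholders: this thread does not say which labels slik expects, so substitute the key you actually see on your nodes.

```python
import json


def nodes_missing_label(node_list: dict, label_key: str) -> list[str]:
    """Return the names of nodes whose metadata.labels lacks label_key."""
    missing = []
    for node in node_list.get("items", []):
        meta = node.get("metadata", {})
        # "labels" can be absent or null on a node object, so fall back to {}.
        labels = meta.get("labels") or {}
        if label_key not in labels:
            missing.append(meta.get("name", "<unnamed>"))
    return missing


# Hypothetical label key, for illustration only (not taken from slik).
HYPOTHETICAL_KEY = "example.com/hypothetical-label"

# Hypothetical sample shaped like `kubectl get nodes -o json` output.
SAMPLE = json.loads("""
{
  "items": [
    {"metadata": {"name": "node-a",
                  "labels": {"example.com/hypothetical-label": "true"}}},
    {"metadata": {"name": "node-b",
                  "labels": {"kubernetes.io/os": "linux"}}},
    {"metadata": {"name": "node-c",
                  "labels": {"example.com/hypothetical-label": "true",
                             "kubernetes.io/os": "linux"}}}
  ]
}
""")

print(nodes_missing_label(SAMPLE, HYPOTHETICAL_KEY))
```

If this returns an empty list for the real label key while the operator still logs "node lacking labels...", that would suggest the check in the operator and the labels on the nodes disagree, which may help narrow down where it stalls.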