Open brarj413 opened 5 months ago
I have a similar issue. All the pods are running:
NAME                             READY   STATUS    RESTARTS   AGE
slik-operator-6d6c7fc44c-ws7vz   1/1     Running   0          101s
test-slurmabler-49nlp            1/1     Running   0          97s
test-slurmabler-4dpx6            1/1     Running   0          97s
test-slurmabler-4k8zg            1/1     Running   0          97s
test-slurmabler-6qb7s            1/1     Running   0          97s
test-slurmabler-s95th            1/1     Running   0          97s
test-slurmabler-xdwwv            1/1     Running   0          98s
However, the slurmabler pods are in an error state (host removed by me):
2024-08-01T16:01:04.981Z INFO ./main.go:78 main.main sleeping forever... {"host": "REMOVED", "hostname": "test-slurmabler-4dpx6", "pid": 1}
So now the slik object is stuck in pending:
NAME   STATE     AGE
test   PENDING   7m12s
The full-slurmabler pods are up, but nothing happens. Logs from the slik operator:
2024-07-04T08:23:01.645Z INFO slurm/create_slurmabler.go:102 github.com/vultr/slik/pkg/slurm.buildSlurmablerDaemonSet node lacking labels... {"host": ".........", "hostname": "slik-operator-7df995d79f-6m7b5", "pid": 1}
Same issue:
2024-11-14T13:35:41.400Z INFO slurm/create_slurmabler.go:102 github.com/vultr/slik/pkg/slurm.buildSlurmablerDaemonSet node lacking labels... {"host": "10.10.35.20", "hostname": "slik-operator-666d5f4bf7-vq7nd", "pid": 1}
Nodes do get labels set, but it doesn't seem to move on to the next phase after that.
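In case it helps with triage, here is a rough Python sketch for checking whether the operator ever logs anything other than "node lacking labels..." once the labels are set. It only assumes the log layout visible in the lines quoted in this thread (timestamp, level, file:line, function, message, JSON fields); the names `parse_line` and `count_messages` are made up for illustration and are not part of slik.

```python
import json
import re
from collections import Counter

# Layout assumed from the log lines quoted in this thread:
# <timestamp> <level> <file:line> <function> <message> <json fields>
LOG_LINE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<caller>\S+)\s+(?P<func>\S+)\s+"
    r"(?P<msg>.*?)\s+(?P<fields>\{.*\})\s*$"
)


def parse_line(line):
    """Return a dict for one operator log line, or None if it does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    try:
        # The trailing part of each line is a JSON object (host, hostname, pid).
        entry["fields"] = json.loads(entry["fields"])
    except json.JSONDecodeError:
        return None
    return entry


def count_messages(lines):
    """Count how often each (function name, message) pair appears."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            # Keep only the short function name after the last dot.
            short_func = entry["func"].rsplit(".", 1)[-1]
            counts[(short_func, entry["msg"])] += 1
    return counts
```

Feeding it saved operator logs, for example `count_messages(open("operator.log"))` with `operator.log` being a hypothetical dump of the operator pod's output, should show whether `buildSlurmablerDaemonSet` is the only function that ever logs, which would match the "stuck after labeling" behavior described above.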