reanahub / reana-job-controller

REANA Job Controller
http://reana-job-controller.readthedocs.io/
MIT License

slurmcern: job failures #272

Closed by tiborsimko 4 years ago

tiborsimko commented 4 years ago

The slurmcern compute backend is not working after refactoring... The jobs fail rapidly, and the job-controller log fills with numerous lines like:

$ kubectl logs reana-run-batch-f29bf59b-7abd-4367-8aa8-5bf244da9417-c5fml job-controller
...
2020-09-04 17:02:05,565 | root | slurm_job_monitor | ERROR | No slurm jobs
2020-09-04 17:02:05,565 | root | slurm_job_monitor | ERROR | No slurm jobs
2020-09-04 17:02:05,565 | root | slurm_job_monitor | ERROR | No slurm jobs
2020-09-04 17:02:05,565 | root | slurm_job_monitor | ERROR | No slurm jobs
2020-09-04 17:02:05,565 | root | slurm_job_monitor | ERROR | No slurm jobs
2020-09-04 17:02:05,565 | root | slurm_job_monitor | ERROR | No slurm jobs
2020-09-04 17:02:05,565 | root | slurm_job_monitor | ERROR | No slurm jobs
2020-09-04 17:02:05,565 | root | slurm_job_monitor | ERROR | No slurm jobs
2020-09-04 17:02:05,565 | root | slurm_job_monitor | ERROR | No slurm jobs
2020-09-04 17:02:05,565 | root | slurm_job_monitor | ERROR | No slurm jobs
2020-09-04 17:02:05,565 | root | slurm_job_monitor | ERROR | No slurm jobs
2020-09-04 17:02:05,565 | root | slurm_job_monitor | ERROR | No slurm jobs
2020-09-04 17:02:05,565 | root | slurm_job_monitor | ERROR | No slurm jobs
2020-09-04 17:02:05,566 | root | slurm_job_monitor | ERROR | No slurm jobs
2020-09-04 17:02:05,566 | root | slurm_job_monitor | ERROR | No slurm jobs
2020-09-04 17:02:05,566 | root | slurm_job_monitor | ERROR | No slurm jobs
2020-09-04 17:02:05,566 | root | slurm_job_monitor | ERROR | No slurm jobs
2020-09-04 17:02:05,566 | root | slurm_job_monitor | ERROR | No slurm jobs
2020-09-04 17:02:05,566 | root | slurm_job_monitor | ERROR | No slurm jobs
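
To quantify a burst like the one above, the following is a minimal sketch that groups identical records in the pipe-separated format shown in the log excerpt. It is not part of reana-job-controller; the function name `summarize_log_lines` and the assumption that every record has exactly five ` | `-separated fields are illustrative only.

```python
from collections import Counter


def summarize_log_lines(lines):
    """Count identical (timestamp, module, level, message) log records.

    Assumes the five-field layout seen in the excerpt:
    timestamp | logger | module | level | message
    """
    counts = Counter()
    for line in lines:
        # Split into at most five fields so a message containing " | " stays whole.
        parts = [p.strip() for p in line.split(" | ", 4)]
        if len(parts) != 5:
            continue  # skip lines such as "..." that are not log records
        timestamp, _logger, module, level, message = parts
        counts[(timestamp, module, level, message)] += 1
    return counts


# Sample lines copied from the log excerpt in this issue.
sample = [
    "2020-09-04 17:02:05,565 | root | slurm_job_monitor | ERROR | No slurm jobs",
    "2020-09-04 17:02:05,565 | root | slurm_job_monitor | ERROR | No slurm jobs",
    "2020-09-04 17:02:05,566 | root | slurm_job_monitor | ERROR | No slurm jobs",
]

for (timestamp, module, level, message), n in summarize_log_lines(sample).items():
    print(f"{n}x {timestamp} {module} {level}: {message}")
```

Many identical records sharing a single millisecond timestamp, as in the excerpt, would show up here as one entry with a high count, which makes the burst easier to report than pasting every line.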

tiborsimko commented 4 years ago

Perhaps this appeared after the monitoring refactoring, i.e. after https://github.com/reanahub/reana-job-controller/issues/249?

tiborsimko commented 4 years ago

Merged into maint-0.7.