Open ShaoFuLiu opened 2 years ago
Here is the website I found: https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration
I just used `scancel` on some of the pending jobs, and the number of pending jobs decreased from 10000 to 8822, but submitting still does not seem to work.
We solved the problem. The cause was that aronton had submitted a job that kept submitting other jobs, so the total number of jobs stayed over the maximum of 10000. aronton has now cancelled it, and submitting works again!
I got an error when I tried to submit jobs. The following picture shows what I typed and what I got:![image](https://user-images.githubusercontent.com/53428020/171426616-e54788ed-99aa-45aa-9eef-ca0810e615bc.png)
Then I found one of the probable causes of the error:![image](https://user-images.githubusercontent.com/53428020/171426505-48cc6ab8-ff2c-46f8-978b-d64eaaec38a8.png)
I checked the current number of running and pending jobs. Their sum is 9987, which is close to the default maximum of 10000.![image](https://user-images.githubusercontent.com/53428020/171428488-c3840e0e-d34b-4358-b1cb-f15bb8df4755.png)
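For anyone who wants to repeat this check from the command line, here is a rough sketch. The helper name `count_jobs` is made up for this example, and the result is only an approximation of what the controller counts against its limit: for example, it may still hold records of recently finished jobs, and `squeue` may hide other users' jobs depending on the site configuration.

```shell
# Count non-empty lines read from stdin (one line per job).
count_jobs() {
    awk 'NF { n++ } END { print n + 0 }'
}

# Only query Slurm when squeue is actually available on this machine.
if command -v squeue >/dev/null 2>&1; then
    # -h drops the header, -r prints one line per job array element,
    # -t restricts the listing to pending and running jobs.
    squeue -h -r -t PENDING,RUNNING | count_jobs
fi
```

If the printed number sits near 10000, new submissions are likely to be rejected until some jobs finish or are cancelled.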
So I think we should either set a higher maximum job count in slurm.conf or just cancel some jobs.![image](https://user-images.githubusercontent.com/53428020/171426505-48cc6ab8-ff2c-46f8-978b-d64eaaec38a8.png)
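As a sketch of the first option, the relevant slurm.conf parameter should be `MaxJobCount`. The value 20000 below is only an example, not a recommendation for this cluster, and as far as I know the change only takes effect after `slurmctld` is restarted, so the admins would need to confirm.

```
# slurm.conf (fragment)
# Maximum number of jobs slurmctld keeps in its records at one time.
# The default is 10000; 20000 here is an arbitrary example value.
MaxJobCount=20000
```

A larger limit also means more memory used by the controller, so cancelling unneeded jobs may still be the simpler fix.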