pmenstrom opened this issue 8 months ago
This will be part of the new Slurm page; it can be published before the rest of the page or along with it.
As both Delta and Hydro start to get more user load, jobs will take longer to start, and so we'll get more and more of the tickets that we on the Blue Waters team used to internally refer to as "why my job not run?" tickets.
So yeah, we'll definitely want to put this up so that we can refer to it. "I believe that NCSA has a policy not to reveal the specifics of the job scheduling algorithms so that the users don't try to game the system." I think maybe "policy" is too strong of a word here, but broadly yes, we work really hard to not tell users the exact parameters of the job system. First, as you say, mildly knowledgeable users would probably use that knowledge to game the job system. But in addition to that, it would be very difficult to keep such documentation up to date: scheduler parameters can sometimes be tweaked on a very short timescale to react to user job behavior (or lack thereof).
Since I have feelings about scheduling and schedulers and stuff, I've added myself as an assignee on this. I will do my absolute best to really contribute to this documentation, and good intentions and all that, but I definitely want to keep track of this.
I think it was a policy a couple decades ago :-) I haven't dealt with fairshare since Moab.
Just having some text we can point users to when they feel their jobs are unfairly being delayed will be a big help. (Why my job not start)
May also want to include a description of reservations ("scontrol show reservations") and node state ("sinfo -N -p <partition>").
Right, @pmenstrom , good point. Having either a link to a page with job system commands that tell you what your status is and why your job is likely waiting, or this being that page, would be good.
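Something like the following cheat sheet could be a starting point for that page — these are all standard Slurm commands, with <jobid> and <partition> as placeholders, and we'd adjust the exact flags per system:

```shell
# Why is my job still pending? Show state, reason code, and estimated start time.
squeue -u $USER -o "%.10i %.9P %.8T %.20r %.20S"

# Ask Slurm for its current start-time estimate (this shifts as other jobs finish early).
squeue -u $USER --start

# Break down a pending job's priority (age, fairshare, job size, QOS, ...).
sprio -j <jobid>

# List active and upcoming reservations (e.g., maintenance windows) holding nodes.
scontrol show reservations

# Per-node state (idle, mixed, allocated, drained) for a partition.
sinfo -N -p <partition>
```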
Also, again, either a link to a page with (or else this being that page) recommendations on how to structure a job request so that it's more likely to run and give you what you want. I typed out a version of this in a ticket yesterday. Things like: "Only request resources you're actually likely to use. If your code runs between 4 and 5 hours, then request (say) 6 hours, not 24, because your job will get scheduled sooner, other jobs will get scheduled sooner, and if your code malfunctions and runs away but doesn't end the job, you won't have wasted nearly as much allocation."
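As a strawman for that recommendation, a batch script that requests a modest overestimate of the measured runtime — the account, partition, and resource numbers here are placeholders, not recommendations for any specific system:

```shell
#!/bin/bash
#SBATCH --job-name=my_analysis
#SBATCH --account=<my_allocation>   # placeholder account/allocation name
#SBATCH --partition=<partition>     # placeholder partition
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=32
# Code reliably finishes in 4-5 hours, so ask for 6 hours of wall time, not 24.
#SBATCH --time=06:00:00

srun ./my_code
```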
Which in turn might raise the question: how do I know how long my job runs? Well, let me link you to our page (or section of a page) on benchmarking and scaling up your jobs.
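For that, something along these lines could show users how to compare elapsed time against the requested limit for their recent jobs (the start date and format fields are just examples):

```shell
# One line per job (-X collapses job steps), elapsed vs. requested time limit.
sacct -X -u $USER --starttime=2024-01-01 \
      --format=JobID,JobName%20,Partition,Elapsed,Timelimit,State
```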
Hmm. Ok, I may assign creating skeletons of those pages to myself.
Sounds good, let me know when you have it sketched out and I might throw a few more things into it. We may end up with things that should be on separate, related pages... We definitely run into questions when there is a system reservation ahead of a PM and long jobs aren't starting even though enough nodes are idle to run them.
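For that maintenance-reservation case, a sketch of what we could ask users to check — whether the job's requested wall time runs past the start of the next reservation (<jobid> is a placeholder):

```shell
# When does the next reservation (e.g., a maintenance window) begin?
scontrol show reservations | grep -E "ReservationName|StartTime"

# What time limit did my pending job request, and why is it waiting?
# If the requested time limit would overlap the reservation, the job sits until after it.
squeue -j <jobid> -o "%.10i %.10l %.20S %.20r"
```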
This has come up again in another user ticket. I will search Jira and see what answers we have given in the past.
@pmenstrom can you please add the number of the current ticket? I'll track it to help inform an initial writeup. Thank you!
It is an old Campus Cluster ticket (ICCPADM-4810). They were asking for a bump in job priority and also asking for fairshare details.
I'm going to work on getting the Slurm pages (including a fairshare discussion) user-presentable in the near term and we can continue to refine them once they're published.
The Campus Cluster user documentation currently only mentions that the secondary queue uses fairshare, but Weddie thinks it is active on all of the queues. The Delta documentation doesn't mention fairshare at all.
The Slurm "sshare -l" command shows output on CC, Nightingale, and Delta, but I am not sure whether all three systems actually use it or whether Slurm just collects the data regardless.
A general description of fairshare would be a good candidate for cross-cluster content. It should probably just be a generic discussion of how fairshare works in Slurm, and maybe mention what the "sshare" command output means for the user. I believe that NCSA has a policy not to reveal the specifics of the job scheduling algorithms so that the users don't try to game the system.
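For the generic part, something like this could anchor the discussion without revealing site-specific priority weights — the column notes below are the stock Slurm meanings, not a statement about how any particular cluster tunes its multifactor priority plugin:

```shell
# Show my fairshare standing; -l adds the long listing with the usage columns.
#   RawUsage   - decaying sum of the resources my user/account has consumed
#   NormShares - the fraction of the machine my account is nominally entitled to
#   FairShare  - score between 0 and 1; higher means recent usage is below the
#                entitled share, so pending jobs get a larger fairshare boost
sshare -l -u $USER
```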