Open navidcy opened 4 years ago
Maybe a better idea is to use `pbsnodes` and parse its output to get this number?
Whilst that information is available using `pbsnodes`:
```
$ pbsnodes -F json gadi-cpu-clx-0470
{
    "timestamp":1579661565,
    "pbs_version":"19.2.4.20190830141245",
    "pbs_server":"gadi-pbs-01",
    "nodes":{
        "gadi-cpu-clx-0470":{
            "Mom":"gadi-cpu-clx-0470.gadi.nci.org.au",
            "ntype":"PBS",
            "state":"job-busy",
            "pcpus":96,
            "jobs":[
                "1079829.gadi-pbs"
            ],
            "resources_available":{
                "arch":"linux",
                "host":"gadi-cpu-clx-0470",
                "jobfs":"429496729600b",
                "mem":"213647360kb",
                "ncpus":48,
                "ngpus":0,
                "topology":"rack-23-ib2,rack-23,group-1,cpu-clx",
                "vmem":"213647360kb",
                "vnode":"gadi-cpu-clx-0470"
            },
            "resources_assigned":{
                "jobfs":"102400kb",
                "mem":"41943040kb",
                "ncpus":48
            },
            "comment":"offlined by hook 'begin_checknode' due to hook error",
            "resv_enable":"True",
            "sharing":"default_shared",
            "license":"l",
            "last_state_change_time":1579659651,
            "last_used_time":1579659544
        }
    }
}
```
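Pulling the node size out of that output is straightforward once you have the JSON; a minimal sketch in Python (the sample string below is trimmed from the listing above, and `node_resources` is a hypothetical helper name, not payu API):

```python
import json

# Sample `pbsnodes -F json` output, trimmed to the fields we need
# (taken from the listing above).
SAMPLE = """{
    "nodes": {
        "gadi-cpu-clx-0470": {
            "resources_available": {"ncpus": 48, "mem": "213647360kb"}
        }
    }
}"""

def node_resources(pbsnodes_json, node_name):
    # Parse the JSON that `pbsnodes -F json <node>` prints and
    # return that node's resources_available block.
    data = json.loads(pbsnodes_json)
    return data["nodes"][node_name]["resources_available"]

res = node_resources(SAMPLE, "gadi-cpu-clx-0470")
print(res["ncpus"])  # 48
```

In practice the JSON would come from running `pbsnodes -F json <node>` via `subprocess`, but as noted below, the open question is knowing which node to query in the first place.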
I've had a look but it isn't clear to me how we would know which node to query for that information a priori, i.e. a mapping of queue name to nodes.
I'm wondering about a system config file with some of this information, something like:
```yaml
gadi.nci.org.au:
    normal:
        ncpus: 48
        mem: 256GB
    normalbw:
        ncpus: 28
        mem: 128GB
```
that could live somewhere in the `payu` directory. Maybe a `platform` directory?
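In code, the lookup that file enables could be as simple as the sketch below (`PLATFORMS` and `queue_defaults` are hypothetical names for illustration, not payu API; the queue names are real Gadi queues but the numbers are copied from the illustrative sketch above):

```python
# Hypothetical platform table payu could ship, mirroring the YAML
# sketch above once loaded; the numbers are illustrative only.
PLATFORMS = {
    "gadi.nci.org.au": {
        "normal":   {"ncpus": 48, "mem": "256GB"},
        "normalbw": {"ncpus": 28, "mem": "128GB"},
    },
}

def queue_defaults(host, queue):
    # The user names only the queue; payu fills in the per-node
    # defaults from the platform table.
    return PLATFORMS[host][queue]

print(queue_defaults("gadi.nci.org.au", "normalbw")["ncpus"])  # 28
```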
And when the user selects a queue the rest is determined automatically by payu? This sounds nice. Include also `express` and `expressbw` in that case.
(Btw, isn’t it 190GB of RAM per node that Gadi has on `normal`?)
Yeah the idea is it would be automatic. I was just putting in a couple of examples to show the idea, we would have all the available queues in there if the idea was adopted. Similarly the numbers were just for illustration.
On raijin the default node size was, I believe, 16, and when we wanted to use the `normalbw` queue we had to add it in `config.yaml`. Now with gadi, should we change the default node size to 48?
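For context, the kind of per-queue override that was needed could look something like this in `config.yaml` (a hypothetical fragment shown only to illustrate the idea; key names are not confirmed payu settings):

```yaml
# Hypothetical config.yaml fragment: pick the Broadwell queue and
# state the node size explicitly, since it differs from the default.
queue: normalbw
ncpus: 28
```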