Closed by mzmssg 5 years ago.
Command-line tools: add three commands to node_maintain.py:
python node_maintain.py dedicated-vc get -m {master ip}
python node_maintain.py dedicated-vc add -m {master ip} -v {vc name} -n {dedicated nodes}
python node_maintain.py dedicated-vc remove -m {master ip} -v {vc name}
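All three commands share one shape: a dedicated-vc group with get, add, and remove actions. The sketch below is a hypothetical argparse layout for that shape, not the actual node_maintain.py code. The destination names (master_ip, vc_name, nodes) and the comma-separated format for -n are assumptions.

```python
import argparse
from typing import List


def _node_list(value: str) -> List[str]:
    # Assumption: dedicated nodes are passed as "node1,node2,...".
    return [node for node in value.split(",") if node]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the dedicated-vc command shape."""
    parser = argparse.ArgumentParser(prog="node_maintain.py")
    commands = parser.add_subparsers(dest="command", required=True)

    # "dedicated-vc" groups the three actions: get, add, remove.
    dedicated = commands.add_parser("dedicated-vc")
    actions = dedicated.add_subparsers(dest="action", required=True)

    get_cmd = actions.add_parser("get")
    get_cmd.add_argument("-m", dest="master_ip", required=True)

    add_cmd = actions.add_parser("add")
    add_cmd.add_argument("-m", dest="master_ip", required=True)
    add_cmd.add_argument("-v", dest="vc_name", required=True)
    add_cmd.add_argument("-n", dest="nodes", required=True, type=_node_list)

    remove_cmd = actions.add_parser("remove")
    remove_cmd.add_argument("-m", dest="master_ip", required=True)
    remove_cmd.add_argument("-v", dest="vc_name", required=True)
    return parser


if __name__ == "__main__":
    print(vars(build_parser().parse_args()))
```

Nesting the actions under one dedicated-vc subparser keeps room for other command groups in the same script.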
REST API: add the dedicated and total resource fields to the virtual cluster info:
{
  // percentage of the entire cluster's capacity this virtual cluster can use
  "capacity": 50,
  // maximum percentage of the entire cluster's capacity this virtual cluster can use
  "maxCapacity": 100,
  // percentage of the entire cluster's capacity this virtual cluster currently uses
  "usedCapacity": 0,
  "numActiveJobs": 0,
  "numJobs": 0,
  "numPendingJobs": 0,
  "resourcesUsed": {
    "memory": 0,
    "vCores": 0,
    "GPUs": 0
  },
  "resourcesTotal": {
    "memory": 0,
    "vCores": 0,
    "GPUs": 0
  },
  "dedicated": true/false,
  // RUNNING: the VC is enabled.
  // STOPPED: the VC is disabled; it accepts no new jobs and has no running jobs.
  // DRAINING: intermediate state between RUNNING and STOPPED, waiting for existing jobs to finish.
  "status": "RUNNING"/"STOPPED"/"DRAINING",
  "nodeList": ["node1", "node2"]
}
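A client reading this response can derive two things from it. The sketch below shows how the status values follow from the state descriptions (a disabled VC drains until its existing jobs finish) and how headroom falls out of resourcesTotal minus resourcesUsed. The helper names, the enabled/existing_jobs inputs, and the sample numbers are hypothetical and only illustrate the field semantics.

```python
from typing import Any, Dict


def vc_status(enabled: bool, existing_jobs: int) -> str:
    """Hypothetical helper: map VC state to RUNNING/DRAINING/STOPPED."""
    if enabled:
        return "RUNNING"
    # A disabled VC stays DRAINING until its existing jobs have finished.
    return "DRAINING" if existing_jobs > 0 else "STOPPED"


def free_resources(vc: Dict[str, Any]) -> Dict[str, int]:
    """Hypothetical helper: per-resource headroom, resourcesTotal - resourcesUsed."""
    total = vc["resourcesTotal"]
    used = vc["resourcesUsed"]
    return {key: total[key] - used.get(key, 0) for key in total}


if __name__ == "__main__":
    # Illustrative values only, not taken from a real cluster.
    sample = {
        "dedicated": True,
        "status": vc_status(enabled=True, existing_jobs=1),
        "resourcesUsed": {"memory": 8192, "vCores": 4, "GPUs": 1},
        "resourcesTotal": {"memory": 32768, "vCores": 16, "GPUs": 4},
        "nodeList": ["node1", "node2"],
    }
    print(sample["status"], free_resources(sample))
    # prints: RUNNING {'memory': 24576, 'vCores': 12, 'GPUs': 3}
```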
Webportal: add a dedicated VC table, show total resources in the existing columns, and add a bonus column.
Exporter & Prometheus: the exporter will take node labels into account when calculating available resources.
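Taking node labels into account means free capacity on labeled (dedicated) nodes should be reported separately instead of being added to the shared pool. A minimal sketch, assuming a hypothetical per-node record with a label field ("" for the default partition) and GPU counts; the real exporter's data model and metric names are not shown here.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Node(NamedTuple):
    """Hypothetical per-node record; field names are illustrative only."""
    name: str
    label: str  # "" for the default partition, else a YARN node label
    total_gpus: int
    used_gpus: int


def available_gpus_by_label(nodes: Iterable[Node]) -> Dict[str, int]:
    """Sum free GPUs per node label, so labeled nodes are not counted
    as available capacity for the default (shared) partition."""
    available: Dict[str, int] = defaultdict(int)
    for node in nodes:
        available[node.label] += max(node.total_gpus - node.used_gpus, 0)
    return dict(available)


if __name__ == "__main__":
    nodes = [
        Node("node1", "", 4, 1),
        Node("node2", "vc1", 4, 0),
        Node("node3", "vc1", 4, 4),
    ]
    print(available_gpus_by_label(nodes))  # prints: {'': 3, 'vc1': 4}
```

Clamping at zero keeps an over-reported used count from producing negative availability.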
Goal: provide a feature to reserve nodes.
Solution: leverage YARN node labels to create exclusive VCs.
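For reference, an exclusive VC on YARN node labels usually involves two parts: registering an exclusive label and attaching it to nodes with yarn rmadmin, then giving the VC's Capacity Scheduler queue access to that label. The sketch below only builds those commands and properties; it assumes node labels are already enabled in yarn-site.xml (yarn.node-labels.enabled and a label store directory), that the VC queue sits directly under root, and it is not necessarily how node_maintain.py implements this.

```python
from typing import Dict, List


def node_label_commands(label: str, nodes: List[str]) -> List[List[str]]:
    """Build yarn rmadmin invocations that register an exclusive label
    and attach it to the given nodes (run them via subprocess if desired)."""
    return [
        ["yarn", "rmadmin", "-addToClusterNodeLabels", f"{label}(exclusive=true)"],
        ["yarn", "rmadmin", "-replaceLabelsOnNode",
         " ".join(f"{node}={label}" for node in nodes)],
    ]


def queue_label_properties(queue: str, label: str) -> Dict[str, str]:
    """Capacity Scheduler properties letting one queue (the VC) own a label.
    Assumption: the queue is a direct child of root."""
    prefix = f"yarn.scheduler.capacity.root.{queue}"
    return {
        # Commonly set so the label's full capacity can be handed to a child.
        f"yarn.scheduler.capacity.root.accessible-node-labels.{label}.capacity": "100",
        f"{prefix}.accessible-node-labels": label,
        f"{prefix}.accessible-node-labels.{label}.capacity": "100",
        # Jobs submitted to this queue run on the labeled nodes by default.
        f"{prefix}.default-node-label-expression": label,
    }


if __name__ == "__main__":
    for cmd in node_label_commands("vc1", ["node1", "node2"]):
        print(" ".join(cmd))
    for key, value in queue_label_properties("vc1", "vc1").items():
        print(f"{key}={value}")
```

Because the label is exclusive, containers from other queues cannot be placed on those nodes, which is what makes the reservation hold.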
Items: the command-line, REST API, webportal, and exporter changes listed above.
Features that might be impacted: resourcesTotal, nodeList.