Open scarlett2018 opened 5 years ago
This information can be obtained from the launcher at :9086/v1/Frameworks. Combined with the information from #2073, we can calculate the resources wasted by jobs that ultimately failed or were killed.
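As a rough illustration of the wasted-resource calculation, here is a minimal sketch. The `JobRecord` shape, the state labels, and the GPU-hours unit are assumptions made for the example; they are not the actual schema returned by the launcher, so real responses would need to be mapped into something like this first.

```python
from dataclasses import dataclass


# Hypothetical, flattened record; NOT the real launcher response schema.
@dataclass
class JobRecord:
    name: str
    state: str          # assumed labels: "SUCCEEDED", "FAILED", "KILLED"
    gpus: int           # GPUs held by the job while it ran
    run_seconds: float  # wall-clock time the job held those GPUs


# States whose consumed resources are counted as wasted (assumption).
WASTED_STATES = {"FAILED", "KILLED"}


def wasted_gpu_hours(jobs):
    """Sum GPU-hours consumed by jobs that ended as failed or killed."""
    gpu_seconds = sum(j.gpus * j.run_seconds for j in jobs if j.state in WASTED_STATES)
    return gpu_seconds / 3600.0


jobs = [
    JobRecord("a", "SUCCEEDED", 4, 7200),
    JobRecord("b", "FAILED", 2, 3600),
    JobRecord("c", "KILLED", 8, 1800),
]
print(wasted_gpu_hours(jobs))  # 6.0
```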
Would you like to merge #2073 with this item? What's the estimate for having both #2127 and #2073 in place? Let's combine them if it makes tracking easier.
No, I think #2073 is relatively easy to implement, but this one requires more effort. Let's track them in separate issues.
Some job dashboard experiments have been done in PowerBI. It's time to revisit whether they work well for v1.x, and whether there are any new needs for better understanding overall job utilization.
Metrics to consider: queuing time, job status summary, job completion without system error, long-running job completion rate, user usage summary, VC usage summary, etc.
Breakdowns: by time, by status, by user.
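To make a couple of these breakdowns concrete, here is a small sketch of a status summary and a per-user mean queuing time. The row layout (user, status, submit time, start time in epoch seconds) is a hypothetical one chosen for the example, not something the launcher or the PowerBI dashboard is known to produce.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical rows: (user, status, submitted_ts, started_ts), times in epoch seconds.
rows = [
    ("alice", "SUCCEEDED", 0, 60),
    ("alice", "FAILED", 100, 400),
    ("bob", "SUCCEEDED", 50, 80),
]


def status_summary(rows):
    """Count jobs per final status."""
    return Counter(status for _, status, _, _ in rows)


def mean_queue_seconds_by_user(rows):
    """Average time between submission and start, grouped by user."""
    waits = defaultdict(list)
    for user, _, submitted, started in rows:
        waits[user].append(started - submitted)
    return {user: mean(vals) for user, vals in waits.items()}


print(status_summary(rows))
print(mean_queue_seconds_by_user(rows))
```

The same grouping pattern extends to VC or time-bucket breakdowns by swapping the grouping key.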
Xinwei from USTC has shared sample scripts offline.
Retries should also be considered. For failed jobs, the failure reasons should also be reported to support ops improvement and the DRI.
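A minimal sketch of how retries and failure reasons might be tallied for such a report. The record layout and the reason strings are hypothetical placeholders, since the thread does not specify how the launcher exposes them.

```python
from collections import Counter

# Hypothetical records: (job, status, retry_count, failure_reason or None).
records = [
    ("j1", "FAILED", 2, "OOM"),
    ("j2", "FAILED", 0, "user error"),
    ("j3", "SUCCEEDED", 1, None),
    ("j4", "FAILED", 1, "OOM"),
]


def failure_reasons(records):
    """Count failed jobs per reported failure reason."""
    return Counter(reason for _, status, _, reason in records if status == "FAILED")


def total_retries(records):
    """Total retry attempts across all jobs, regardless of final status."""
    return sum(retries for _, _, retries, _ in records)


print(failure_reasons(records).most_common())
print(total_retries(records))
```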