stanford-futuredata / gavel

Code for "Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads", which appeared at OSDI 2020
MIT License
Add script to generate in-progress trace from scheduler log #232

Closed santhnm2 closed 4 years ago

deepakn94 commented 4 years ago

Probably want a corresponding _open_trace method as well.

I think we also need to record some other state, such as job completion times. It might make sense to checkpoint all the main state (priorities, job_completion_times, etc.). Not sure, though.
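
A rough sketch of what checkpointing that state could look like, assuming the state can be flattened into a JSON-serializable dict. The helper names `save_checkpoint` and `load_checkpoint` and the layout of the dict are hypothetical and not part of the scheduler; note that JSON turns integer keys into strings, so job IDs would need converting back on load.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Atomically write a JSON-serializable dict of scheduler state to `path`.

    Hypothetical helper: the real scheduler may keep this state in a
    different shape.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then rename it over
    # the target, so a crash mid-write never leaves a truncated checkpoint.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path):
    """Read back a checkpoint written by save_checkpoint."""
    with open(path) as f:
        return json.load(f)
```

Usage would be something like `save_checkpoint("scheduler.ckpt", {"priorities": ..., "job_completion_times": ...})` at the end of each round, with `load_checkpoint` called on restart.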

santhnm2 commented 4 years ago

The trace can be passed to the run_scheduler_with_trace.py script, though we would need to merge it with any jobs that have yet to be dispatched. As for the other state, I think the only thing we care about is the completion times of the jobs that have already finished, which we can get from the log written before the failure.
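
The merge step could look roughly like the sketch below. It assumes each trace line is tab-separated with the arrival time as the last field; that layout and the names `arrival_time` and `merge_traces` are assumptions for illustration, so the parsing would need adjusting to the actual trace format.

```python
def arrival_time(line):
    """Return the arrival time of a trace line.

    Assumes tab-separated fields with the arrival time last (an assumption
    about the trace format, not confirmed here).
    """
    return float(line.rstrip("\n").split("\t")[-1])


def merge_traces(in_progress_lines, pending_lines):
    """Merge in-progress jobs with jobs that were never dispatched.

    Blank lines are dropped. The sort is stable, so for equal arrival times
    in-progress jobs stay ahead of pending ones.
    """
    merged = [
        line.rstrip("\n")
        for line in list(in_progress_lines) + list(pending_lines)
        if line.strip()
    ]
    merged.sort(key=arrival_time)
    return merged
```

The merged lines can then be written to a single file and handed to run_scheduler_with_trace.py as the trace.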