stanford-futuredata
/
gavel
Code for "Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads", which appeared at OSDI 2020
MIT License
124
stars
31
forks
issues
Restrict CPUs assigned to GPUs with NUMA
#197
santhnm2
closed
4 years ago
0
Enable jobs to initialize as soon as previous instance completes
#196
santhnm2
closed
4 years ago
0
Piggyback initial lease on job initialization
#195
santhnm2
closed
4 years ago
0
Clean up metadata handling for distributed jobs
#194
santhnm2
closed
4 years ago
1
Start measuring execution time from when job is initialized
#193
santhnm2
closed
4 years ago
0
Schedule round completion events for jobs with extended leases
#192
santhnm2
closed
4 years ago
0
Account for elapsed time when computing priorities on physical cluster
#191
santhnm2
closed
4 years ago
0
Refactor _schedule_with_rounds and dispatch jobs early
#190
santhnm2
closed
4 years ago
0
Write GavelIterator steps to file instead of printing
#189
santhnm2
closed
4 years ago
0
Add heartbeats to allow scheduler to remotely kill failed jobs
#188
santhnm2
opened
4 years ago
0
Make GavelIterator write to a file instead of printing
#187
santhnm2
closed
4 years ago
0
Extend lease if job placement does not change
#186
santhnm2
closed
4 years ago
0
Try to keep jobs on the same worker if possible
#185
santhnm2
closed
4 years ago
0
Asynchronously compute allocation
#184
santhnm2
closed
4 years ago
0
Extend lease if placement for job has not changed
#183
santhnm2
closed
4 years ago
1
Move policy computation to separate thread
#182
santhnm2
closed
4 years ago
0
Clean up imports for scheduler.py
#181
santhnm2
opened
4 years ago
0
Add ThroughputEstimator class
#180
santhnm2
closed
4 years ago
0
Scale cluster size according to workload, and fix water-filling algorithm so that we can time hierarchical policy
#179
deepakn94
closed
4 years ago
1
Policy runtime sweep fixes
#178
santhnm2
closed
4 years ago
0
Refactor water-filling max-min fairness policy so that packing involves minimal code change
#177
deepakn94
closed
4 years ago
0
Policy runtime sweep fixes
#176
santhnm2
closed
4 years ago
0
Fixes and improvements for throughput estimation
#175
santhnm2
closed
4 years ago
0
Improvements to water filling and integration into scheduler
#174
deepakn94
closed
4 years ago
0
Add argument to output generated jobs to a trace
#173
santhnm2
closed
4 years ago
0
Fix bugs in water-filling algorithm
#172
deepakn94
closed
4 years ago
0
Convert job type throughputs code to use new format, optimize multi-gpu case for job-job throughputs
#171
santhnm2
closed
4 years ago
0
Bugfixes to water-filling algorithm to get hierarchical scheduling working
#170
deepakn94
closed
4 years ago
0
Factor out job generation code
#169
santhnm2
closed
4 years ago
0
Fixes for physical cluster experiments
#168
santhnm2
closed
4 years ago
4
Separate out allocation recomputation timescale from time reset timescale
#167
deepakn94
closed
4 years ago
0
Isolated policy fixes
#166
deepakn94
closed
4 years ago
0
Add simulator option to simulate allocations _exactly_ (instead of using round-based scheduler mechanism)
#165
deepakn94
closed
4 years ago
0
Max-min fairness policy using a water filling algorithm
#164
deepakn94
closed
4 years ago
0
ILP formulation to determine bottleneck jobs
#163
deepakn94
closed
4 years ago
0
Job distribution that seems to give cleaner results
#162
deepakn94
closed
4 years ago
0
Sample job durations and scale factors from Philly job distribution
#161
santhnm2
closed
4 years ago
0
Better job duration distribution
#160
deepakn94
closed
4 years ago
0
Kill jobs with ps -aux instead of nvidia-smi
#159
santhnm2
closed
4 years ago
0
Some fixes to Gandiva policy
#158
deepakn94
closed
4 years ago
0
Shuffle the worker order for non-perf policies
#157
santhnm2
closed
4 years ago
0
Remove throughput scaling
#156
santhnm2
closed
4 years ago
0
Bugfix to max-min fairness policy
#155
deepakn94
closed
4 years ago
0
Miscellaneous bug fixes
#154
santhnm2
closed
4 years ago
0
Make Scheduler and Profiler subclasses of SchedulerMechanism
#153
santhnm2
opened
4 years ago
0
Support new throughputs file format
#152
santhnm2
closed
4 years ago
1
Bugfixes to FIFO policy: don't allocate jobs to workers where throughput is 0
#151
deepakn94
closed
4 years ago
0
Better allocation of workers to jobs to prevent fragmentation
#150
deepakn94
closed
4 years ago
1
Gandiva policy
#149
deepakn94
closed
4 years ago
1
Distributed support for language modeling
#148
deepakn94
closed
4 years ago
0