stanford-futuredata/gavel
Code for "Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads", which appeared at OSDI 2020
MIT License
125 stars, 31 forks
Issues
#247 Bump torch from 1.4.0 to 2.2.0 in /scheduler (dependabot[bot], opened 3 months ago, 0 comments)
#246 Replaced deprecated sklearn package (tareqmahmood, opened 10 months ago, 0 comments)
#245 Question about the datasets (Abida16, opened 1 year ago, 0 comments)
#244 The operation of shutil.rmtree (JoeyYoung, closed 1 year ago, 1 comment)
#243 The availability of gavel AMI (sunshineffmx, opened 2 years ago, 2 comments)
#242 Question: How can I get the spot prices aws/azure? (xlcbingo1999, closed 1 year ago, 1 comment)
#241 Running scheduler_tests.py (nayakajay, opened 2 years ago, 3 comments)
#240 Question: Understanding structure of throughputs.json files (nayakajay, closed 2 years ago, 4 comments)
#239 code for SJT policy (mozizhao, opened 2 years ago, 1 comment)
#238 Questions about the simulation (Rivendile, closed 2 years ago, 12 comments)
#237 faster problem compilation (akshayka, closed 3 years ago, 3 comments)
#236 Notebook analyzing policy scaling with number of jobs using sub-clusters (deepakn94, closed 4 years ago, 0 comments)
#235 New notebook analyzing strategy-proof and non strategy-proof policies (deepakn94, closed 4 years ago, 0 comments)
#234 Don't partition CDFs by job size (instead insert an inset CDF to zoom into tight portion of graph) (deepakn94, closed 4 years ago, 0 comments)
#233 Fix how lease extension opportunities are measured in simulation (santhnm2, closed 4 years ago, 0 comments)
#232 Add script to generate in-progress trace from scheduler log (santhnm2, closed 4 years ago, 2 comments)
#231 Rename lease variables (santhnm2, opened 4 years ago, 0 comments)
#230 Fix lease calculations (santhnm2, closed 4 years ago, 0 comments)
#229 Prioritize locality for distributed jobs over lease extensions (santhnm2, closed 4 years ago, 0 comments)
#228 Call done_callback for failed jobs; use psutil to find PIDs (santhnm2, closed 4 years ago, 0 comments)
#227 Update lease timing to reflect current round deadlines (santhnm2, closed 4 years ago, 0 comments)
#226 Discard stale updates, add barrier on lease expiration, fix reset time events (santhnm2, closed 4 years ago, 0 comments)
#225 Updates to artifact evaluation instructions and trace (santhnm2, closed 4 years ago, 0 comments)
#224 Move re-dispatches to the start of a round, and enforce a minimum round duration (santhnm2, closed 4 years ago, 0 comments)
#223 Use monotonically increasing ports (santhnm2, closed 4 years ago, 0 comments)
#222 Kill unresponsive jobs at the end of the round (santhnm2, closed 4 years ago, 2 comments)
#221 Fix bug with computing elapsed time (santhnm2, closed 4 years ago, 0 comments)
#220 Account for reset events when computing elapsed time (santhnm2, closed 4 years ago, 0 comments)
#219 Record the frequency of lease extensions (santhnm2, closed 4 years ago, 1 comment)
#218 New strategy-proof policy, and accompanying driver (deepakn94, closed 4 years ago, 0 comments)
#217 Compute scheduler overhead for each job (santhnm2, closed 4 years ago, 0 comments)
#216 Wrap load_checkpoint and save_checkpoint with GavelIterator (santhnm2, closed 4 years ago, 0 comments)
#215 Shutdown logging stream handler on exit (santhnm2, closed 4 years ago, 0 comments)
#214 Disable MPS by default (santhnm2, closed 4 years ago, 0 comments)
#213 Bug fixes for running packed policies on the physical cluster (santhnm2, closed 4 years ago, 0 comments)
#212 Bugfixes in `_done_callback` encountered when running cluster size sweep experiments (deepakn94, closed 4 years ago, 0 comments)
#211 Update instructions for running AE trace (santhnm2, closed 4 years ago, 0 comments)
#210 Fix data directory for Imagenet (santhnm2, closed 4 years ago, 0 comments)
#209 Clean up interface for communicating with jobs through GavelIterator (santhnm2, closed 4 years ago, 0 comments)
#208 Added trace for Artifact Evaluation (santhnm2, closed 4 years ago, 0 comments)
#207 Use built-in Python logger (santhnm2, closed 4 years ago, 0 comments)
#206 Fix algorithm for assigning workers to jobs (santhnm2, closed 4 years ago, 0 comments)
#205 Miscellaneous bug fixes (santhnm2, closed 4 years ago, 0 comments)
#204 Create PHYSICAL_CLUSTER.md (santhnm2, closed 4 years ago, 2 comments)
#203 Switch to factored-out generate_job function (santhnm2, closed 4 years ago, 0 comments)
#202 Add notebook for hierarchical policy experiment (deepakn94, closed 4 years ago, 0 comments)
#201 Remove hard-coded workload directory from traces (santhnm2, closed 4 years ago, 0 comments)
#200 Separate working directory from job command (santhnm2, closed 4 years ago, 1 comment)
#199 Graph updates (deepakn94, closed 4 years ago, 0 comments)
#198 Factor out job generation code (santhnm2, closed 4 years ago, 0 comments)