microsoft / tyger

Remote signal processing.
https://microsoft.github.io/tyger/
MIT License

Support 1 million concurrent runs #112

Open johnstairs opened 4 months ago

johnstairs commented 4 months ago
### Tasks
- [ ] #29
- [ ] Allow tagging runs
- [ ] Allow listing runs by status, buffers, and tags
- [ ] #113
- [ ] Improve run sweeper scalability
- [ ] Allow buffers to be (bulk) deleted
- [ ] #114
- [ ] Display buffer status on `tyger buffer show`
- [ ] Record the time at which a run started running
- [ ] Expose endpoint to show counts of runs by status (filtered by tags)
- [ ] Handle deadlocks that can occur with dependent runs
- [ ] Use [gang scheduling](https://kubedl.io/docs/training/gangscheduling/) for distributed runs to avoid deadlocks
- [ ] Set a limit on the number of Kubernetes jobs
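
The gang-scheduling item rests on a simple argument: if pods of distributed runs are placed one at a time, several runs can each grab part of the cluster and then wait forever for their remaining replicas. The toy model below is a minimal sketch of that argument, not Tyger's or Kubernetes' actual scheduler. `Job`, `schedule_per_pod`, and `schedule_gang` are hypothetical names, and the cluster is reduced to a single count of free pod slots.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    replicas: int  # pods that must all be running before the job makes progress


def schedule_per_pod(capacity: int, jobs: list[Job]) -> list[str]:
    """Hypothetical naive scheduler: place pods one at a time, round-robin
    across jobs, until the free slots run out.

    Returns the names of jobs whose replicas were all placed (i.e. can run).
    """
    placed = {job.name: 0 for job in jobs}
    free = capacity
    progress = True
    while free > 0 and progress:
        progress = False
        for job in jobs:
            if free > 0 and placed[job.name] < job.replicas:
                placed[job.name] += 1
                free -= 1
                progress = True
    return [job.name for job in jobs if placed[job.name] == job.replicas]


def schedule_gang(capacity: int, jobs: list[Job]) -> list[str]:
    """Hypothetical gang scheduler: all-or-nothing admission, so a job is
    admitted only if every one of its replicas fits at once."""
    admitted = []
    free = capacity
    for job in jobs:
        if job.replicas <= free:
            free -= job.replicas
            admitted.append(job.name)
    return admitted


if __name__ == "__main__":
    jobs = [Job("A", 3), Job("B", 3)]
    # Per-pod: A and B each hold 2 of their 3 slots, so neither can start.
    print(schedule_per_pod(4, jobs))
    # Gang: A gets all 3 slots and runs; B waits without holding any slots.
    print(schedule_gang(4, jobs))
```

With 4 free slots and two 3-replica runs, the per-pod scheduler returns `[]` (a deadlock: every slot is held by a run that cannot start), while the all-or-nothing scheduler returns `['A']`, so one run completes and then frees its slots for the other.
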

johnstairs commented 4 months ago

@hansenms @naegelejd: Using this to track the work we need to do.