skypilot-org/skypilot
SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
https://skypilot.readthedocs.io
Apache License 2.0
6.82k stars · 513 forks
Issues (newest first)
#4245 [Jobs] A way to keep the managed job for a while after user program failure · Michaelvll · opened 2 weeks ago · 1 comment
#4244 [AWS/Azure] Avoid error out during image size check · Michaelvll · closed 2 weeks ago · 0 comments
#4243 [Jobs] Managed job controller process taking too much memory during peak time · Michaelvll · opened 2 weeks ago · 1 comment
#4242 [k8s] pod resource limit · bgyoon · closed 1 week ago · 0 comments
#4241 [UX] Support --tail parameter for sky logs · zpoint · closed 1 week ago · 3 comments
#4240 [k8s] Parallelize setup for faster multi-node provisioning · romilbhardwaj · closed 2 weeks ago · 2 comments
#4239 [Storage] Avoid opt-in regions for S3 · romilbhardwaj · closed 3 weeks ago · 0 comments
#4238 [tests] Exclude runpod from smoke tests unless specified · romilbhardwaj · closed 2 weeks ago · 0 comments
#4237 [Core] dpkg lock showing up with AWS custom ubuntu image · Michaelvll · closed 2 weeks ago · 1 comment
#4236 [Tests] `test_tpu_vm_pod` failing on master · romilbhardwaj · opened 3 weeks ago · 0 comments
#4235 cannot run `sky jobs logs -n <job_name>` on SUCCEEDED job · cg505 · opened 3 weeks ago · 0 comments
#4234 [Managed Jobs] Reduce the resource requirement for the controller process for more parallel jobs · Michaelvll · opened 3 weeks ago · 0 comments
#4233 [k8s] Prevent mounting of /dev/shm in pods · roclark · opened 3 weeks ago · 1 comment
#4232 [Core/UX] Improve the display of returncode for multi-node · Michaelvll · opened 3 weeks ago · 0 comments
#4231 [ux] add sky jobs launch --fast · cg505 · closed 3 weeks ago · 1 comment
#4230 [UX] Show 0.25 on controller queue · Michaelvll · closed 3 weeks ago · 2 comments
#4229 [k8s] Parallelize pod initialization steps · romilbhardwaj · opened 3 weeks ago · 0 comments
#4228 [Release] Release 0.7.0 · romilbhardwaj · closed 2 weeks ago · 2 comments
#4227 [Core] Make home address replacement more robust · Michaelvll · closed 3 weeks ago · 0 comments
#4225 [k8s] Skip SSH setup for faster provisioning · romilbhardwaj · opened 3 weeks ago · 0 comments
#4224 Update K8s docker image build and the source artifact registry · yika-luo · closed 2 weeks ago · 1 comment
#4223 fix docstring for write_cluster_config · cg505 · closed 2 weeks ago · 0 comments
#4222 [UX] `sky logs` should be able to tail the last lines of the logs instead of showing all logs · Michaelvll · closed 1 week ago · 1 comment
#4221 [Docs] Tpu v6 docs · Michaelvll · closed 3 weeks ago · 0 comments
#4220 [Core] Support TPU v6 · cblmemo · closed 3 weeks ago · 0 comments
#4219 Add user toolkits to all sky custom images and fix PyTorch issue on A10 · yika-luo · closed 3 weeks ago · 0 comments
#4218 [Catalog] Add TPU V6e. · cblmemo · closed 3 weeks ago · 1 comment
#4217 [test] smoke test fixes for managed jobs · cg505 · closed 3 weeks ago · 0 comments
#4216 [Tests] Fix public bucket tests · romilbhardwaj · closed 3 weeks ago · 2 comments
#4215 [TPU] TPU v6 support · Michaelvll · closed 3 weeks ago · 1 comment
#4214 [Tests] Add test for `max_restarts_on_errors` · Michaelvll · opened 3 weeks ago · 0 comments
#4213 [Jobs] Fix jobs name · Michaelvll · closed 3 weeks ago · 1 comment
#4212 Mitigating the Impact of Pylint's Inherent Limitations on Functionality of `format.sh` · root-hbx · opened 3 weeks ago · 13 comments
#4211 [Tests] Managed Jobs smoke test failed on latest master · cblmemo · closed 3 weeks ago · 0 comments
#4210 [UI] Ads on the SkyPilot documentation page · MaoZiming · opened 3 weeks ago · 1 comment
#4209 [Core] Fix issue with the wrong path of setup logs · Michaelvll · closed 3 weeks ago · 0 comments
#4208 [k8s] Fix show-gpus when limited permissions are available · romilbhardwaj · closed 1 week ago · 4 comments
#4207 [Jobs] Support syncing down logs for `sky jobs logs` · euclidgame · opened 3 weeks ago · 0 comments
#4206 [k8s] Add validation for `pod_config` · romilbhardwaj · opened 3 weeks ago · 0 comments
#4205 [Performance] Speed up Azure A10 instance creation · yika-luo · closed 3 weeks ago · 1 comment
#4204 Upgrade Azure SDK version requirement · yika-luo · closed 3 weeks ago · 0 comments
#4203 Update packer scripts · yika-luo · closed 3 weeks ago · 0 comments
#4202 [Azure] Update azure dependencies in setup.py · romilbhardwaj · closed 3 weeks ago · 1 comment
#4201 [serve] fix aws s3 sync in other regions · cg505 · closed 3 weeks ago · 1 comment
#4200 [ux] re-provision cluster if --fast but skypilot wheel is outdated · cg505 · closed 2 weeks ago · 0 comments
#4199 [UX] Better logging when user program OOM'ed · Michaelvll · closed 2 weeks ago · 1 comment
#4198 [UX] Improve Formatting of Post Job Creation Logs · andylizf · closed 2 weeks ago · 4 comments
#4197 Skypilot only wants to spawn 4 core cpu controller when sky serve up · mainey · opened 3 weeks ago · 3 comments
#4196 Remove outdated pylint disabling comments · andylizf · closed 3 weeks ago · 1 comment
#4195 [Jobs DAG] Flexible DAG Workflow Job Cancellation Policy · andylizf · opened 3 weeks ago · 3 comments