skypilot-org/skypilot
SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
https://skypilot.readthedocs.io
Apache License 2.0
6.82k stars
513 forks
issues
Doesn't use right GCP config path on Windows
#4344
alexkreidler
opened
1 week ago
0
[k8s] Leaked kubectl port-forward processes
#4343
romilbhardwaj
opened
1 week ago
0
[Docs] Add a concept page.
#4342
concretevitamin
closed
1 week ago
0
[perf] optimizations for sky jobs launch
#4341
cg505
closed
6 days ago
0
[timeline] disable trace collection if SKYPILOT_TIMELINE_FILE_PATH is not set
#4340
cg505
opened
1 week ago
0
[Docs] Use `--fast` for job submission in tutorials
#4339
Michaelvll
opened
1 week ago
2
[OCI] Enable SkyServe for OCI
#4338
HysunHe
closed
6 days ago
8
[k8s] support to use custom gpu resource name if it's not nvidia.com/gpu
#4337
nkwangleiGIT
opened
1 week ago
1
[UX] user-friendly message shown if Kubernetes is not enabled.
#4336
zpoint
closed
1 day ago
0
[smoke] if --generic-cloud is set, force enable that cloud
#4335
cg505
closed
1 week ago
0
[Core] Importing `sky` and `sky.status(refresh=True)` takes about 65MB / 200MB memory
#4334
Michaelvll
opened
1 week ago
3
[Serve] Update log pattern in `_follow_replica_logs` for new UX 3.0
#4333
andylizf
closed
1 week ago
2
[ux] cache cluster status of autostop or spot clusters for 2s
#4332
cg505
closed
6 days ago
0
improve tracing reporting and coverage
#4331
cg505
closed
1 week ago
0
fix broken links when read the docs
#4330
nkwangleiGIT
closed
1 week ago
0
[Serve] Temporary failure: infinite retry on GCP `compute.images.useReadOnly` permission error
#4329
andylizf
opened
1 week ago
0
[fast] if cluster is INIT, force refresh before deciding to provision
#4328
cg505
closed
6 days ago
0
[Bug] Smoke tests `--generic-cloud` flag is ignored when specified cloud is not in `default_clouds_to_run`
#4327
andylizf
closed
1 week ago
3
Add hourly price and instance type to env SKYPILOT_CLUSTER_INFO
#4326
tylerweitzman
opened
1 week ago
0
[Tests] Fix smoke tests for new job creation log format
#4325
andylizf
closed
1 week ago
0
[Kubernetes] Not user-friendly message shown if Kubernetes is not enabled.
#4324
HysunHe
closed
6 days ago
2
Refactor: Consolidate log streaming logic into centralized `log_utils.follow_logs()`
#4323
andylizf
closed
1 week ago
1
[Catalog] fix GCP catalog missing SKUs
#4322
cblmemo
closed
1 week ago
0
[Jobs] Fast jobs cancellation for PENDING managed jobs
#4321
Michaelvll
opened
1 week ago
0
[DAG] Integrate Data Storage Buckets for Data-Bearing Edges in Optimization
#4320
euclidgame
closed
6 days ago
4
[WIP] Advanced DAG Workflow.
#4319
cblmemo
opened
1 week ago
0
[Core] Replace ray job submit for 3x/8.5x faster job scheduling for cluster/managed jobs
#4318
Michaelvll
closed
6 days ago
10
[Storage] Call `sync_file_mounts` when either rsync or storage file_mounts are specified
#4317
romilbhardwaj
opened
1 week ago
0
[docs][azure] Update config doc for azure resource group specification
#4316
landscapepainter
closed
1 week ago
0
[Storage] set_storage_mounts not working in python API
#4315
romilbhardwaj
opened
1 week ago
0
[feature] the ability to recover skypilot data or commit to git
#4313
alita-moore
opened
1 week ago
0
[feature] better handling of failed rollouts
#4312
alita-moore
opened
1 week ago
2
[Core] Allow more PENDING jobs to be scheduled concurrently (1.4x faster)
#4311
Michaelvll
opened
1 week ago
1
[Core] Avoid job scheduling race condition
#4310
Michaelvll
closed
1 week ago
5
[DAG] Update Diamond Example For New Tentative Data API
#4309
andylizf
closed
1 week ago
1
Flaky test: `test_optimizer_dryruns.py` occasionally fails
#4308
andylizf
closed
1 week ago
1
[Core] Add `NO_UPLOAD` for `remote_identity`
#4307
romilbhardwaj
closed
5 days ago
0
Custom benchmark for inference
#4306
tylerweitzman
opened
1 week ago
1
[AWS] SSH issue when a large number of nodes are used in a cluster
#4305
Michaelvll
opened
1 week ago
0
[k8s] Remove `lsof` dependence for tailing logs
#4304
romilbhardwaj
closed
1 week ago
0
Fix AWS Route Table caching which causes invalid failures in other regions after an initial valid failure.
#4303
sfrolich
closed
1 week ago
1
[Test] Fix unittest for region infer
#4302
Michaelvll
closed
1 week ago
0
sky serve update doesn't roll out updated service unless the yaml config changes
#4301
alita-moore
opened
1 week ago
2
[UX] Unnecessary logs from ray
#4300
Michaelvll
opened
1 week ago
1
[Jobs] Jobs launch --fast does not start the dashboard
#4299
Michaelvll
opened
1 week ago
1
Replace `len()` Zero Checks with Pythonic Empty Sequence Checks
#4298
andylizf
opened
1 week ago
5
[k8s] Parallelize multi-node setup
#4297
romilbhardwaj
closed
1 week ago
0
[Jobs] Cancelling managed jobs can take a long time
#4296
Michaelvll
opened
1 week ago
0
[Core] Speed up job scheduling speed on unmanaged jobs
#4295
Michaelvll
closed
6 days ago
0
[Jobs] Speed up the time for managed jobs to be scheduled
#4294
Michaelvll
closed
6 days ago
0