skypilot-org/skypilot
SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
https://skypilot.readthedocs.io
Apache License 2.0
6.81k stars
513 forks
issues
[Core] Speed up job scheduling speed on unmanaged jobs
#4295
Michaelvll
closed
6 days ago
0
[Jobs] Speed up the time for managed jobs to be scheduled
#4294
Michaelvll
closed
6 days ago
0
[Core] Cancel 1000 jobs can take 5-10 mins
#4293
Michaelvll
closed
6 days ago
0
Refactor: Consolidate log streaming logic into centralized `log_utils.follow_logs()`
#4292
andylizf
closed
1 week ago
2
Update Lambda Cloud regions
#4291
cbrownstein-lambda
closed
2 weeks ago
0
[Core] Make ssh connection more robust with custom proxy
#4290
Michaelvll
closed
1 week ago
0
make --fast robust against credential or wheel updates
#4289
cg505
opened
2 weeks ago
0
[RunPod] Fix assertion in ports query.
#4288
cblmemo
closed
2 weeks ago
0
[Core][Docker] Support docker login on RunPod.
#4287
cblmemo
opened
2 weeks ago
1
AssertionError after manually deleting runpod instance
#4286
alita-moore
closed
2 weeks ago
7
stuck at "STARTING" when launching with a custom image on runpod
#4285
alita-moore
opened
2 weeks ago
4
Support event based smoke test instead of sleep time based to reduce flaky test and faster test
#4284
zpoint
closed
1 day ago
1
[Jobs] Managed jobs database use WAL mode
#4283
Michaelvll
closed
2 weeks ago
0
Custom image "Failed to launch the sky serve replica cluster with error: RuntimeError: Failed to SSH to 213.181.111.2 after timeout 600s, with Error: ConnectionRefusedError: [Errno 111] Connection refused)"
#4282
alita-moore
opened
2 weeks ago
4
[AWS] Explicitly check credential and refresh if needed
#4281
Michaelvll
closed
1 week ago
1
[DAG] Add Edge-Based Data Flow Support
#4280
andylizf
closed
2 weeks ago
2
[DAG] Add DAG Visualization with Jupyter Support
#4279
andylizf
closed
2 weeks ago
2
Set minimum port number a Ray worker can listen on to 11002
#4278
cbrownstein-lambda
closed
2 weeks ago
0
Add Basic Visualization Support for DAGs
#4277
andylizf
closed
1 week ago
0
[K8s] list_pod_for_all_namespaces gives ApiException: (403) if the user doesn't have necessary permissions
#4276
hemildesai
opened
2 weeks ago
3
[AWS] Credential retry for rotation is not effective
#4275
Michaelvll
opened
2 weeks ago
0
Fix `stream_logs` Duplicate Job Handling and TypeError
#4274
andylizf
closed
2 weeks ago
1
Bug: `stream_logs` Fails Due to Incorrect Job ID Handling and Duplicate Job Names in Managed Jobs
#4273
andylizf
closed
2 weeks ago
0
Update comments pointing to Lambda's docs
#4272
cbrownstein-lambda
closed
2 weeks ago
0
[Admin Policy] Apply policy in CLI
#4271
Michaelvll
closed
1 week ago
0
[k8s] Fix check pod privileges
#4270
romilbhardwaj
closed
2 weeks ago
1
runpod docker credentials not working when using image_id from private repository
#4269
alita-moore
opened
2 weeks ago
5
[k8s] Jobs controller on stale context needs better error messages
#4268
romilbhardwaj
opened
2 weeks ago
2
[jobs] autodown managed job clusters
#4267
cg505
closed
2 weeks ago
0
[k8s] Add `lsof` to k8s base image
#4266
romilbhardwaj
closed
1 week ago
1
runpod 4090 spot not available
#4265
alita-moore
opened
2 weeks ago
4
[Core] Avoid PENDING job to be set to FAILED and speed up job scheduling
#4264
Michaelvll
closed
2 weeks ago
0
[Core] Submitting 1000 jobs to a cluster
#4263
Michaelvll
closed
2 weeks ago
1
[docs]: OCI key_file path clarrification
#4262
HysunHe
closed
2 weeks ago
2
[k8s] Add flag to disable ssh setup
#4261
romilbhardwaj
closed
2 weeks ago
2
[Core] Ray job refused to submit jobs in PENDING status
#4260
Michaelvll
opened
2 weeks ago
0
Linting updates
#4259
andylizf
opened
2 weeks ago
1
Add a pre commit config to help format before pushing
#4258
zpoint
opened
2 weeks ago
5
[Jobs] Allowing to specify intermediate bucket for file upload
#4257
zpoint
opened
2 weeks ago
3
Add Envoy as an alternative Sky Serve load balancer implementation
#4256
ejj
opened
2 weeks ago
0
Implement Automatic Bucket Creation and Data Transfer in `with_data` API
#4255
andylizf
opened
2 weeks ago
1
Implement `with_data` API for Edge-Based Data Flow in Task DAGs
#4254
andylizf
closed
6 days ago
0
[Dashboard] Add a simple status filter.
#4253
concretevitamin
closed
2 weeks ago
0
[AWS] Disable additional auto update services for ubuntu image with cloud-init
#4252
Michaelvll
closed
2 weeks ago
0
SSH Agent forwarding not working for `run` section
#4251
chris-aeviator
opened
2 weeks ago
1
Bug: `stream_logs_by_id` incorrectly handles task retry logic
#4250
andylizf
opened
2 weeks ago
4
Refactor `stream_logs_by_id` to extract single task monitoring logic
#4249
andylizf
opened
2 weeks ago
1
[Jobs] Limit number of concurrent jobs & launches.
#4248
cblmemo
opened
2 weeks ago
0
do not redirect stderr to /dev/null when submitting job
#4247
cg505
closed
2 weeks ago
2
Disable more potential unattended upgrade sources for AWS
#4246
yika-luo
closed
2 weeks ago
0