skypilot-org / skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
https://skypilot.readthedocs.io
Apache License 2.0
6.68k stars, 493 forks

[k8s] Add checks for `shuf` dependency #3009

Open romilbhardwaj opened 8 months ago

romilbhardwaj commented 8 months ago

shuf is used to avoid concurrent socat connections (#2628). However, some environments may not have shuf installed:

Tailing logs of job 1 on cluster 'redacted'...
/Users/redacted/.sky/port-forward-proxy-cmd.sh: line 48: shuf: command not found
usage: sleep seconds
Warning: Permanently added '127.0.0.1' (ED25519) to the list of known hosts.
Warning: Permanently added '10.42.56.141' (ED25519) to the list of known hosts.
I 01-21 17:51:18 log_lib.py:431] Start streaming logs for job 1.
INFO: Tip: use Ctrl-C to exit log streaming (task will not be killed).

We should add dependency checks for shuf just like we have for nc and socat.
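
A minimal sketch of what such a check might look like, assuming a plain PATH lookup is enough. The helper name `find_missing_tools` and its `which` parameter are hypothetical and are not SkyPilot's existing nc/socat check; `shutil.which` is the standard-library PATH lookup.

```python
import shutil


def find_missing_tools(tools, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH.

    `which` is injectable so the lookup can be stubbed out in tests;
    by default it is shutil.which, which returns None for a missing tool.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    # Hypothetical tool list: the two tools already checked, plus shuf.
    missing = find_missing_tools(["nc", "socat", "shuf"])
    if missing:
        print("Missing required tools: " + ", ".join(missing))
    else:
        print("All required tools found.")
```

Failing early with a message like this would surface the problem before the proxy script hits `shuf: command not found` at runtime.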

romilbhardwaj commented 8 months ago

Also, shuf may be installed as gshuf in some macOS environments. Since shuf and gshuf are interchangeable, we should auto-detect and use whichever is available.
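
The auto-detection could be sketched roughly as follows. This is an illustration under assumptions, not the project's implementation: `resolve_shuf` is a hypothetical helper, and the candidate order (prefer `shuf`, fall back to `gshuf`) is a design choice made here for the example.

```python
import shutil


def resolve_shuf(candidates=("shuf", "gshuf"), which=shutil.which):
    """Return the path of the first candidate found on PATH, or None.

    Candidates are tried in order, so `shuf` wins when both are installed.
    `which` is injectable so the lookup can be stubbed out in tests.
    """
    for name in candidates:
        path = which(name)
        if path is not None:
            return path
    return None
```

The resolved path could then be substituted into the generated proxy command, and a `None` result would be reported as a missing dependency.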

github-actions[bot] commented 4 months ago

This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 10 days.

dongreenberg commented 4 months ago

Bumping. I ran into the shuf issue today (on a Mac).

romilbhardwaj commented 4 months ago

Thanks @dongreenberg - our dependency on shuf will be removed after #3657.
