skypilot-org / skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
https://skypilot.readthedocs.io
Apache License 2.0
6.81k stars 513 forks

[k8s] Remove `lsof` dependence for tailing logs #4304

Closed romilbhardwaj closed 1 week ago

romilbhardwaj commented 1 week ago

Many images (e.g., ubuntu, nemo) don't have `lsof` installed by default, which causes our tailing of logs to the k8s container's stdout to fail.

This PR removes the dependency on `lsof` by replacing it with our own minimal implementation of the one `lsof` feature we need.
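
For readers unfamiliar with how an `lsof`-style lookup can work without the binary, here is a minimal sketch. It is not the code from this PR. It assumes a Linux container where `/proc/<pid>/fd` is readable, and the function name `pids_with_file_open` and its `proc_root` parameter are hypothetical, chosen only for illustration.

```python
import os
from typing import List


def pids_with_file_open(path: str, proc_root: str = "/proc") -> List[int]:
    """Return PIDs that hold `path` open, found by scanning <proc_root>/<pid>/fd.

    Hypothetical helper: on Linux each entry in /proc/<pid>/fd is a symlink
    to the file behind that descriptor, so reading the links is enough to
    answer "which process has this file open?" without lsof.
    """
    target = os.path.realpath(path)
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to inspect it
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) == target:
                    pids.append(int(entry))
                    break  # one matching descriptor is enough for this PID
            except OSError:
                continue  # descriptor was closed between listdir and readlink
    return sorted(pids)
```

A caller could poll this until the list is empty to decide when a log file's writer has exited. The trade-off of this approach is that it only sees processes whose `fd` directory the caller is allowed to read, which is usually sufficient inside a single container.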

Tested:

In both cases, verified that logs are written to the container's stdout and can be read with `kubectl logs -f`. Also confirmed that the tail process terminates after the task completes.