skypilot-org / skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
https://skypilot.readthedocs.io
Apache License 2.0
6.82k stars 513 forks

[Jobs] Disable deduplication for logs #4388

Closed Michaelvll closed 1 day ago

Michaelvll commented 1 day ago

We previously set RAY_DEDUP_LOGS for the Ray cluster, but it stopped taking effect for jobs once we got rid of ray job submit in #4318, because the job no longer inherits the env var from the Ray cluster. We now set those env vars directly on the driver process.
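
For illustration only, here is a minimal Python sketch of the general idea of setting an env var directly on a child driver process rather than relying on inheritance from a cluster. It is not SkyPilot's actual code: the helper name run_driver_with_env and the stand-in driver command are hypothetical, and only the variable name RAY_DEDUP_LOGS comes from this PR.

```python
import os
import subprocess
import sys


def run_driver_with_env(cmd, extra_env):
    """Run cmd as a child process with extra_env layered over the current env.

    Hypothetical helper for illustration; not part of SkyPilot.
    """
    # Copy the parent environment, then override with the explicit settings,
    # so the child sees them no matter how the parent itself was launched.
    env = {**os.environ, **extra_env}
    result = subprocess.run(
        cmd, env=env, capture_output=True, text=True, check=True)
    return result.stdout


# Stand-in "driver": a tiny script that reports the variable it sees.
driver_cmd = [
    sys.executable, "-c",
    "import os; print(os.environ.get('RAY_DEDUP_LOGS'))",
]
output = run_driver_with_env(driver_cmd, {"RAY_DEDUP_LOGS": "0"})
print(output.strip())
```

Passing the variable explicitly through env means the driver does not depend on whatever environment the surrounding cluster process happened to start with.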

To reproduce:

sky launch --num-nodes 4 echo hi
...
(worker2, rank=2, pid=2043, ip=10.0.2.206) hi
(worker1, rank=1, pid=2038, ip=10.0.2.205) hi [repeated 3x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/ray-logging.html#log-deduplication for more options.)
✓ Job finished (status: SUCCEEDED).

Tested (run the relevant ones):