SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
Initially, a process pool was used in server.controller to handle replica launching and termination due to limitations with redirecting sys.stdout and sys.stderr in multi-threaded environments. The global nature of sys.stdout caused output conflicts when multiple threads attempted to redirect it simultaneously.
Proposal
With a new approach to manage thread-local output, it's now feasible to use a thread pool. Each thread can independently handle its logging using thread-local storage, avoiding the previous issues of conflicting sys.stdout redirection.
Benefits
Reduced Latency: Threads are lighter than processes, leading to faster task execution with lower overhead.
Simplified Race Condition Handling: Threads can share memory more efficiently, reducing the need for complex inter-process communication.
Considerations
Thread-Local Overhead: While thread-local storage resolves the original issue, we need to test for potential performance impacts in highly concurrent environments.
Feedback is welcome, especially on testing plans and potential edge cases.
If there are multiple threads being executed at the same time, they will manipulate the same `system.stdout`.
I found a way to redirect `stdout` for individual threads, and you're absolutely right — `stdout` is indeed global. This means that if each thread tries to modify the `stdout` reference, it won't work because they all end up modifying the same global reference.
The solution I came up with keeps the same `stdout` reference, but uses a custom object to track which thread is responsible for the output. This allows each thread to write to its own output stream without affecting others.
Background
Initially, a process pool was used in
server.controller
to handle replica launching and termination due to limitations with redirectingsys.stdout
andsys.stderr
in multi-threaded environments. The global nature ofsys.stdout
caused output conflicts when multiple threads attempted to redirect it simultaneously.Proposal
With a new approach to manage thread-local output, it's now feasible to use a thread pool. Each thread can independently handle its logging using thread-local storage, avoiding the previous issues of conflicting
sys.stdout
redirection.Benefits
Considerations
Feedback is welcome, especially on testing plans and potential edge cases.
Originally posted by @andylizf in https://github.com/skypilot-org/skypilot/issues/4128#issuecomment-2427961898