skypilot-org / skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
https://skypilot.readthedocs.io
Apache License 2.0

[Core] Unblock user program for SIGINT #4355

Open Michaelvll opened 1 week ago

Michaelvll commented 1 week ago

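We use Ray to schedule subprocesses that run on different nodes and resources. We need to check whether that can cause issues with delivering SIGINT to the user program. For reference, Ray's job supervisor explicitly unblocks SIGINT in the child process when RAY_JOB_STOP_SIGNAL is set to SIGINT. See: https://github.com/ray-project/ray/blob/510686f267be509d270dbaa9284e5b6193559a21/python/ray/dashboard/modules/job/job_supervisor.py#L181C1-L192C19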

                preexec_fn=(
                    (
                        lambda: signal.pthread_sigmask(
                            signal.SIG_UNBLOCK, {signal.SIGINT}
                        )
                    )
                    if sys.platform != "win32"
                    and os.environ.get("RAY_JOB_STOP_SIGNAL") == "SIGINT"
                    else None
                ),
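To make the concern concrete, here is a minimal, POSIX-only sketch of the mechanism in the excerpt above. It is not SkyPilot or Ray code. The helper names `unblock_sigint` and `child_blocks_sigint` are hypothetical, and the parent blocks SIGINT itself only to simulate a launcher whose signal mask is inherited by the child. It shows that a blocked SIGINT survives fork/exec into the user program unless a `preexec_fn` unblocks it.

```python
import signal
import subprocess
import sys

# One-liner run in the child: print whether SIGINT is in its blocked mask.
# pthread_sigmask(SIG_BLOCK, []) changes nothing and returns the current mask.
PROBE = (
    "import signal; "
    "print(signal.SIGINT in signal.pthread_sigmask(signal.SIG_BLOCK, []))"
)


def unblock_sigint() -> None:
    """Hypothetical helper: runs in the child after fork(), before exec()."""
    signal.pthread_sigmask(signal.SIG_UNBLOCK, {signal.SIGINT})


def child_blocks_sigint(unblock: bool) -> bool:
    """Hypothetical helper: spawn a child and report if SIGINT is blocked there."""
    # Simulate a launcher that has SIGINT blocked (an assumption for this demo).
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGINT})
    try:
        result = subprocess.run(
            [sys.executable, "-c", PROBE],
            preexec_fn=unblock_sigint if unblock else None,
            capture_output=True,
            text=True,
            check=True,
        )
    finally:
        # Restore the parent's original mask regardless of the outcome.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    print("without preexec_fn, SIGINT blocked in child:", child_blocks_sigint(False))
    print("with preexec_fn, SIGINT blocked in child:", child_blocks_sigint(True))
```

If our Ray-scheduled subprocesses inherit a blocked SIGINT in the same way, the user program would not see Ctrl-C style interrupts until something unblocks the signal. Note that the Python documentation warns that `preexec_fn` is not safe in the presence of threads, so any fix along these lines would need to account for that.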

Version & Commit info: