Open ezorita opened 1 year ago
Clearly this issue is caused by an inherited signal-blocking mask. I have reviewed the code and there are two places in which a SIGINT
blocking mask is applied to the process:
I wonder whether the signal blocking is strictly necessary, since we can't assume that all the code (user + libraries) running in the child processes will unblock the signals before forking further. That code might rely on these signals to work properly.
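To illustrate the mechanism outside Ray, here is a minimal sketch, assuming a POSIX system with a `sleep` binary on the PATH. The helper name `sigint_terminates_child` is hypothetical and the parent's blocked mask is simulated with `signal.pthread_sigmask`; this is not Ray's code. It shows that a blocked SIGINT is inherited across fork and exec, and that unblocking it in the child before exec restores normal signalling.

```python
import signal
import subprocess


def sigint_terminates_child(unblock_in_child=False, timeout=2.0):
    """Spawn `sleep 100` with SIGINT blocked in the parent, send SIGINT,
    and report whether the child exited within `timeout` seconds."""
    # Simulate a parent (e.g. a worker process) that has SIGINT blocked.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGINT})

    preexec = None
    if unblock_in_child:
        # Runs in the child between fork() and exec(), so the new program
        # starts with SIGINT unblocked. Note that preexec_fn is documented
        # as unsafe in the presence of threads.
        def preexec():
            signal.pthread_sigmask(signal.SIG_UNBLOCK, {signal.SIGINT})

    try:
        proc = subprocess.Popen(["sleep", "100"], preexec_fn=preexec)
    finally:
        # Restore the parent's original mask.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)

    try:
        proc.send_signal(signal.SIGINT)
        try:
            proc.wait(timeout=timeout)
            return True
        except subprocess.TimeoutExpired:
            # SIGINT stays pending in the child because it is blocked.
            return False
    finally:
        if proc.poll() is None:
            proc.kill()  # SIGKILL cannot be blocked
        proc.wait()


if __name__ == "__main__":
    print("inherited mask:", sigint_terminates_child())
    print("unblocked in child:", sigint_terminates_child(unblock_in_child=True))
```

With the inherited mask, the child is expected to ignore SIGINT until it is killed. With the `preexec_fn` unblock, it should exit on SIGINT, provided SIGINT is not set to be ignored in the calling process.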
What happened + What you expected to happen
It seems Ray processes are not able to signal subprocesses properly. When a task or actor creates a subprocess, it cannot communicate with it using `signal.SIGINT`. The script below reproduces the issue: it spins up a subprocess, `sleep 100`, and then signals it to finish. The subprocess should terminate with both `SIGINT` and `SIGKILL`, but under Ray tasks/actors it only responds to `SIGKILL`.

I would expect process signalling to work normally.
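One possible diagnostic, sketched under the assumption that the behaviour comes from a blocked signal mask (a blocked signal is inherited by subprocesses across fork and exec): check from inside the task or actor whether SIGINT is blocked in the calling thread. The helper name `sigint_is_blocked` is hypothetical, and `signal.pthread_sigmask` is only available on POSIX platforms.

```python
import signal


def sigint_is_blocked():
    """Return True if SIGINT is in the calling thread's blocked-signal mask."""
    # Blocking an empty set changes nothing and returns the current mask.
    current_mask = signal.pthread_sigmask(signal.SIG_BLOCK, [])
    return signal.SIGINT in current_mask


if __name__ == "__main__":
    print("SIGINT blocked:", sigint_is_blocked())
```

Calling this helper inside a task or actor, just before creating the subprocess, would show whether the subprocess is about to inherit a blocked SIGINT.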
Versions / Dependencies
ray 2.2.0, python 3.8.10
Reproduction script
Issue Severity
High: It blocks me from completing my task.