What happened + What you expected to happen

I encountered the issue described in https://github.com/ray-project/ray/issues/10392 while experimenting with Ray. That issue was closed because a reproducible example could not be provided, so I am including one below. With a head node and a worker node running on the same machine, every log line from a remote task is printed twice on the driver; I expected each line to be printed once.

Versions / Dependencies
ray[all] 2.38.0 on macOS

Reproduction script

# example.py
import ray


@ray.remote
def foo():
    print('hello')


if __name__ == '__main__':
    ray.init()
    handle = foo.remote()
    ray.get(handle)

Start a head node and a worker node on the same machine, then run the script:

RAY_ENABLE_WINDOWS_OR_OSX_CLUSTER=1 ray start --head
RAY_ENABLE_WINDOWS_OR_OSX_CLUSTER=1 ray start --address='192.168.0.196:6379'
python example.py

Output:

2024-11-08 13:54:19,817 INFO worker.py:1601 -- Connecting to existing Ray cluster at address: 192.168.0.196:6379...
2024-11-08 13:54:19,831 INFO worker.py:1777 -- Connected to Ray cluster. View the dashboard at http://127.0.0.1:8265
(foo pid=45881) hello
(foo pid=45881) hello
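
To confirm that the duplication correlates with the number of Ray nodes on the machine, you can list the running log_monitor.py processes while both nodes are up. The snippet below is a minimal sketch, assuming psutil is installed (the same dependency the workaround further down relies on); with one head node and one worker node it should print two entries:

# check_log_monitors.py -- hypothetical helper, only used to inspect the setup
import subprocess

import psutil

# Each Ray node starts its own log_monitor.py, so two entries here
# mean the same log directory is being watched twice.
for proc in psutil.process_iter(["name", "cmdline"]):
    try:
        cmdline = subprocess.list2cmdline(proc.cmdline())
    except (psutil.AccessDenied, psutil.NoSuchProcess):
        continue
    if "log_monitor.py" in cmdline:
        print(proc.pid, cmdline)
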
Issue Severity
Low: It annoys or frustrates me.

A workaround is at: https://github.com/intel-analytics/BigDL-2.x/pull/2799/files

I mitigated the issue by calling the function below after starting the worker node. Of course, it has many downsides and is not the way to go long term.

def kill_redundant_log_monitors():
    """Kill redundant log_monitor.py processes.

    If multiple Ray nodes are started on the same machine,
    there will be multiple log_monitor.py processes monitoring
    the same log dir. As a result, the logs will be replicated
    multiple times and forwarded to the driver.
    See https://github.com/ray-project/ray/issues/10392
    """
    import psutil
    import subprocess

    log_monitor_processes = []
    for proc in psutil.process_iter(["name", "cmdline"]):
        # Processes can disappear or deny access while we iterate.
        try:
            cmdline = subprocess.list2cmdline(proc.cmdline())
        except (psutil.AccessDenied, psutil.NoSuchProcess):
            continue
        if "log_monitor.py" in cmdline:
            log_monitor_processes.append(proc)

    # Keep the first log monitor and kill the redundant ones.
    if len(log_monitor_processes) > 1:
        for proc in log_monitor_processes[1:]:
            proc.kill()
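
For illustration, this is one way to wire the call into the driver script. It is only a sketch of my setup: ray_utils is a hypothetical local module holding the function above, and the call site (right after ray.init()) is just where it happened to work for me.

# driver.py -- sketch only; ray_utils is a hypothetical module
# containing kill_redundant_log_monitors() as defined above
import ray

from ray_utils import kill_redundant_log_monitors


@ray.remote
def foo():
    print('hello')


if __name__ == '__main__':
    ray.init()
    # Both nodes (and both log monitors) are already up at this
    # point, so the redundant monitor is killed before any task
    # output is forwarded to the driver.
    kill_redundant_log_monitors()
    ray.get(foo.remote())  # 'hello' should now be printed once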