Closed PostmanSpat closed 2 years ago
This seems to be a Windows-related error (rather than a Ray Tune one).
cc @wuisawesome, do you know who is currently responsible for Windows builds?
cc @fcardoso75 do you have any tips about this?
I can reproduce it:
>>> import ray
d:\anyscale\ray\python\ray\autoscaler\_private\cli_logger.py:57: FutureWarning: Not all Ray CLI dependencies were found. In Ray 1.4+, the Ray CLI, autoscaler, and dashboard will only be usable via `pip install 'ray[default]'`. Please update your install command.
warnings.warn(
>>> ray.init(address="127.0.0.1:6379")
2021-06-14 17:35:18,662 INFO worker.py:733 -- Connecting to existing Ray cluster at address: 192.168.0.197:6379
Traceback (most recent call last):
File "C:\ProgramData\Miniconda3\lib\site-packages\redis\connection.py", line 559, in connect
sock = self._connect()
File "C:\ProgramData\Miniconda3\lib\site-packages\redis\connection.py", line 615, in _connect
raise err
File "C:\ProgramData\Miniconda3\lib\site-packages\redis\connection.py", line 603, in _connect
sock.connect(socket_address)
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "d:\anyscale\ray\python\ray\_private\client_mode_hook.py", line 62, in wrapper
return func(*args, **kwargs)
File "d:\anyscale\ray\python\ray\worker.py", line 837, in init
_global_node = ray.node.Node(
File "d:\anyscale\ray\python\ray\node.py", line 163, in __init__
session_name = _get_with_retry(redis_client, "session_name")
File "d:\anyscale\ray\python\ray\node.py", line 41, in _get_with_retry
result = redis_client.get(key)
File "C:\ProgramData\Miniconda3\lib\site-packages\redis\client.py", line 1606, in get
return self.execute_command('GET', name)
File "C:\ProgramData\Miniconda3\lib\site-packages\redis\client.py", line 898, in execute_command
conn = self.connection or pool.get_connection(command_name, **options)
File "C:\ProgramData\Miniconda3\lib\site-packages\redis\connection.py", line 1192, in get_connection
connection.connect()
File "C:\ProgramData\Miniconda3\lib\site-packages\redis\connection.py", line 563, in connect
raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 10061 connecting to 192.168.0.197:6379. No connection could be made because the target machine actively refused it.
>>> ray.init(address="127.0.0.1")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "d:\anyscale\ray\python\ray\_private\client_mode_hook.py", line 62, in wrapper
return func(*args, **kwargs)
File "d:\anyscale\ray\python\ray\worker.py", line 725, in init
redis_address, _, _ = services.validate_redis_address(address)
File "d:\anyscale\ray\python\ray\_private\services.py", line 409, in validate_redis_address
raise ValueError("Malformed address. Expected '<host>:<port>'.")
ValueError: Malformed address. Expected '<host>:<port>'.
>>> ray.init()
2021-06-14 17:35:42,799 INFO services.py:1315 -- View the Ray dashboard at http://127.0.0.1:8265
{'node_ip_address': '192.168.0.197', 'raylet_ip_address': '192.168.0.197', 'redis_address': '192.168.0.197:6379', 'object_store_address': 'tcp://127.0.0.1:25331', 'raylet_socket_name': 'tcp://127.0.0.1:9691', 'webui_url': '127.0.0.1:8265', 'session_dir': 'C:\\Users\\Fabiano\\AppData\\Local\\Temp\\ray\\session_2021-06-14_17-35-37_775383_3708', 'metrics_export_port': 53209, 'node_id': '041af3962168a633269dea9b8922b0c0335adecf7821e88dda6f9e28'}
>>> (pid=None) d:\anyscale\ray\python\ray\autoscaler\_private\cli_logger.py:57: FutureWarning: Not all Ray CLI dependencies were found. In Ray 1.4+, the Ray CLI, autoscaler, and dashboard will only be usable via `pip install 'ray[default]'`. Please update your install command.
(pid=None) warnings.warn(
>>> (pid=None) d:\anyscale\ray\python\ray\autoscaler\_private\cli_logger.py:57: FutureWarning: Not all Ray CLI dependencies were found. In Ray 1.4+, the Ray CLI, autoscaler, and dashboard will only be usable via `pip install 'ray[default]'`. Please update your install command.
(pid=None) warnings.warn(
@ray.remote
... def f():
... return "Hello"
...
>>> ray.get(f.remote())
'Hello'
>>> quit()
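The second call above, with address="127.0.0.1", fails before any connection is attempted because the address has no port. Below is a minimal sketch of that kind of `<host>:<port>` check, written only from the error message. The helper name `parse_host_port` is hypothetical, and this is not Ray's `validate_redis_address`.

```python
from typing import Tuple


def parse_host_port(address: str) -> Tuple[str, int]:
    """Split '<host>:<port>' into (host, port), rejecting anything else.

    Hypothetical helper that mirrors the error message seen above; it is
    not Ray's actual validation code.
    """
    # rpartition splits on the last ':', so 'host' keeps everything before it.
    host, sep, port_str = address.rpartition(":")
    if not sep or not host or not port_str.isdigit():
        raise ValueError("Malformed address. Expected '<host>:<port>'.")
    port = int(port_str)
    if not 0 < port < 65536:
        raise ValueError(f"Port out of range: {port}")
    return host, port
```

With this sketch, `parse_host_port("127.0.0.1")` raises the same ValueError. `parse_host_port("127.0.0.1:6379")` returns `("127.0.0.1", 6379)`.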
Also:
>>> import ray
d:\anyscale\ray\python\ray\autoscaler\_private\cli_logger.py:57: FutureWarning: Not all Ray CLI dependencies were found. In Ray 1.4+, the Ray CLI, autoscaler, and dashboard will only be usable via `pip install 'ray[default]'`. Please update your install command.
warnings.warn(
>>> ray.init(address="192.168.0.197:6379")
2021-06-14 17:37:37,501 INFO worker.py:733 -- Connecting to existing Ray cluster at address: 192.168.0.197:6379
Traceback (most recent call last):
File "C:\ProgramData\Miniconda3\lib\site-packages\redis\connection.py", line 559, in connect
sock = self._connect()
File "C:\ProgramData\Miniconda3\lib\site-packages\redis\connection.py", line 615, in _connect
raise err
File "C:\ProgramData\Miniconda3\lib\site-packages\redis\connection.py", line 603, in _connect
sock.connect(socket_address)
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "d:\anyscale\ray\python\ray\_private\client_mode_hook.py", line 62, in wrapper
return func(*args, **kwargs)
File "d:\anyscale\ray\python\ray\worker.py", line 837, in init
_global_node = ray.node.Node(
File "d:\anyscale\ray\python\ray\node.py", line 163, in __init__
session_name = _get_with_retry(redis_client, "session_name")
File "d:\anyscale\ray\python\ray\node.py", line 41, in _get_with_retry
result = redis_client.get(key)
File "C:\ProgramData\Miniconda3\lib\site-packages\redis\client.py", line 1606, in get
return self.execute_command('GET', name)
File "C:\ProgramData\Miniconda3\lib\site-packages\redis\client.py", line 898, in execute_command
conn = self.connection or pool.get_connection(command_name, **options)
File "C:\ProgramData\Miniconda3\lib\site-packages\redis\connection.py", line 1192, in get_connection
connection.connect()
File "C:\ProgramData\Miniconda3\lib\site-packages\redis\connection.py", line 563, in connect
raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 10061 connecting to 192.168.0.197:6379. No connection could be made because the target machine actively refused it.
>>> ray.init()
2021-06-14 17:46:40,334 INFO services.py:1315 -- View the Ray dashboard at http://127.0.0.1:8265
{'node_ip_address': '192.168.0.197', 'raylet_ip_address': '192.168.0.197', 'redis_address': '192.168.0.197:6379', 'object_store_address': 'tcp://127.0.0.1:25569', 'raylet_socket_name': 'tcp://127.0.0.1:22893', 'webui_url': '127.0.0.1:8265', 'session_dir': 'C:\\Users\\Fabiano\\AppData\\Local\\Temp\\ray\\session_2021-06-14_17-46-35_279617_15616', 'metrics_export_port': 32616, 'node_id': '690eede43a036c64480cf6be71e9f68d734055b20735c9b0397444cf'}
>>>
>>>
>>> @ray.remote
... def f():
... return "Hello"
...
>>> ray.get(f.remote())
'Hello'
>>> quit()
Hmm, OK. @PostmanSpat, are you using an external Redis?
Ray should be handling its own Redis server.
@richardliaw No, I am running a Redis server on my local Windows system. I tried configuring Ray to use localhost and 127..; the connection test works, but then Ray reverts the IP to 192.. and fails.
I'm having the same issue. I don't know if it's the same root cause, but it seems odd that the IP address configuration isn't respected.
I ran into the same problem. I had no standalone Redis installed, so Ray was managing its own Redis server.
@pcmoritz should we consider bumping the priority of this?
I'm assigning this to you @mwtian since you are touching this codepath as part of the GCS work. Let us know if you need help working on this, or if it turns out to be Windows-specific :)
FWIW, I can reproduce this on Linux with the latest HEAD, so I think the Windows label can be removed. I get empty error messages every ~20 seconds and then, when I hit ^C, I get a traceback.
>>> ray.init(address="127.0.0.1:6379")
2021-12-29 17:41:27,482 INFO worker.py:852 -- Connecting to existing Ray cluster at address: 10.0.0.19:6379
2021-12-29 17:41:47,509 ERROR node.py:1342 -- ERROR as
2021-12-29 17:42:09,537 ERROR node.py:1342 -- ERROR as
^CTraceback (most recent call last):
File "/home/matti/miniconda3/envs/ray_dev/lib/python3.9/site-packages/redis/connection.py", line 559, in connect
sock = self._connect()
File "/home/matti/miniconda3/envs/ray_dev/lib/python3.9/site-packages/redis/connection.py", line 615, in _connect
raise err
File "/home/matti/miniconda3/envs/ray_dev/lib/python3.9/site-packages/redis/connection.py", line 603, in _connect
sock.connect(socket_address)
ConnectionRefusedError: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/matti/ray_dev/python/ray/node.py", line 502, in get_gcs_client
self._gcs_client = GcsClient(address=self.gcs_address)
File "/home/matti/ray_dev/python/ray/node.py", line 409, in gcs_address
return get_gcs_address_from_redis(redis)
File "/home/matti/ray_dev/python/ray/_private/gcs_utils.py", line 110, in get_gcs_address_from_redis
gcs_address = redis.get("GcsServerAddress")
File "/home/matti/miniconda3/envs/ray_dev/lib/python3.9/site-packages/redis/client.py", line 1606, in get
return self.execute_command('GET', name)
File "/home/matti/miniconda3/envs/ray_dev/lib/python3.9/site-packages/redis/client.py", line 898, in execute_command
conn = self.connection or pool.get_connection(command_name, **options)
File "/home/matti/miniconda3/envs/ray_dev/lib/python3.9/site-packages/redis/connection.py", line 1192, in get_connection
connection.connect()
File "/home/matti/miniconda3/envs/ray_dev/lib/python3.9/site-packages/redis/connection.py", line 563, in connect
raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 111 connecting to 10.0.0.19:6379. Connection refused.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/matti/ray_dev/python/ray/_private/client_mode_hook.py", line 105, in wrapper
return func(*args, **kwargs)
File "/home/matti/ray_dev/python/ray/worker.py", line 954, in init
_global_node = ray.node.Node(
File "/home/matti/ray_dev/python/ray/node.py", line 165, in __init__
session_name = self._internal_kv_get_with_retry(
File "/home/matti/ray_dev/python/ray/node.py", line 1340, in _internal_kv_get_with_retry
result = self.get_gcs_client().internal_kv_get(key, namespace)
File "/home/matti/ray_dev/python/ray/node.py", line 505, in get_gcs_client
time.sleep(1)
KeyboardInterrupt
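The `ERROR as` lines above carry no detail. The traceback shows `_internal_kv_get_with_retry` looping and sleeping until interrupted. As a rough sketch only, here is a bounded retry that logs `repr(exc)`, which names the exception type even when its message is empty, and chains the last error onto the final failure. The name `get_with_retry` and its parameters are hypothetical, not Ray's code.

```python
import logging
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def get_with_retry(fetch: Callable[[], Optional[T]],
                   attempts: int = 5,
                   delay_s: float = 1.0) -> T:
    """Call fetch() until it returns a non-None value or attempts run out."""
    last_exc: Optional[BaseException] = None
    for attempt in range(1, attempts + 1):
        try:
            result = fetch()
            if result is not None:
                return result
            logger.warning("Attempt %d/%d: value not found yet", attempt, attempts)
        except Exception as exc:  # e.g. ConnectionRefusedError from the client
            last_exc = exc
            # %r shows e.g. "ConnectionRefusedError(111, 'Connection refused')",
            # so the log line is never blank.
            logger.error("Attempt %d/%d failed: %r", attempt, attempts, exc)
        if attempt < attempts:
            time.sleep(delay_s)
    # Chain the last underlying error so the final traceback shows the real cause.
    raise RuntimeError(f"Gave up after {attempts} attempts") from last_exc
```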
@mattip Just to confirm this is the same issue: would a connection to 10.0.0.19:6379, e.g. redis-cli -h 10.0.0.19 -p 6379, also be rejected?
It seems the change to disallow 127.0.0.1 as a valid address came from PR #1556, with the comment:
The main issue was that localhost was getting resolved to the loopback ip, which wasn't very helpful since services are registered with their node ip. This fixes the address getter function to never return the loopback ip.
Perhaps Ray could distinguish between situations where a user prefers that all services run on 127.0.0.1, prefers localhost, or has no preference.
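One way to read that suggestion is as an explicit policy for loopback addresses. This is a hypothetical sketch: `HostPolicy` and `resolve_host` are invented names, not a Ray API. The node IP is passed in rather than detected so the example stays deterministic. The default reproduces the behavior described in the quoted PR comment.

```python
from enum import Enum

# Host names treated as loopback in this sketch (an assumption; real code
# would likely also need to handle other 127.x.x.x addresses and ::1).
LOOPBACK_HOSTS = {"127.0.0.1", "localhost"}


class HostPolicy(Enum):
    """Hypothetical user preference for loopback addresses."""
    NO_PREFERENCE = "no_preference"  # rewrite loopback to the node IP
    LOOPBACK = "loopback"            # always use 127.0.0.1
    LOCALHOST = "localhost"          # always use the name "localhost"


def resolve_host(host: str, node_ip: str,
                 policy: HostPolicy = HostPolicy.NO_PREFERENCE) -> str:
    """Return the host to register, honoring the given policy."""
    if host not in LOOPBACK_HOSTS:
        # Non-loopback addresses are passed through unchanged.
        return host
    if policy is HostPolicy.LOOPBACK:
        return "127.0.0.1"
    if policy is HostPolicy.LOCALHOST:
        return "localhost"
    # No preference: replace the loopback address with the node IP.
    return node_ip
```

Under this sketch, `resolve_host("127.0.0.1", "192.168.20.13")` returns `"192.168.20.13"`, which matches what the original poster observed. Passing `HostPolicy.LOOPBACK` keeps `"127.0.0.1"`.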
Since Redis is no longer the default message broker, can we close this?
@mattip can you see if the problem is still reproducible on Windows? Since ray.init() may still connect to the local global control store process, it is possible the problem still exists.
I can help answer this. Ray and Ray-based Modin work well now. Here is the environment info: ray_success.yaml.txt
Good to see! Thanks for confirming @YuanfengZhang. Closing the issue.
What is the problem?
I'm trying to set up a basic environment to use TensorTrade (TensorFlow) and ray[tune], but I get the following error when calling ray.init to connect to Redis:
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it
Redis is set up, I've configured the password, and I can connect OK using redis-cli.
I've put a monitor on the Redis server, and I can see that Ray connects initially, but then it stops:
I traced through the code and found this in services.py:
It seems that even though I pass in the IP 127.0.0.1, this code converts it back to 192.168.20.13. It seems to connect fine on the 127.. address, but not on the 192.. address. Unfortunately, the system I am running is controlled by a group policy, and I cannot turn off the Windows firewall completely. I can telnet to Redis on 127.., but not on 192.. When I installed Redis it added firewall rules, but I think the group policy might still prevent it from accepting connections on 192..
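The telnet comparison above can be repeated from Python with only the standard library. This sketch checks only whether anything accepts TCP connections on the port. It does not speak the Redis protocol or authenticate, and `can_connect` is a hypothetical helper name.

```python
import socket


def can_connect(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds.

    Like the telnet test, this only checks that something accepts TCP
    connections on the port; it sends no Redis commands.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # ConnectionRefusedError (WinError 10061 / Errno 111), timeouts and
        # unreachable-host errors are all subclasses of OSError.
        return False
```

For example, if the firewall is what blocks the second address, `can_connect("127.0.0.1", 6379)` versus `can_connect("192.168.20.13", 6379)` should mirror the telnet results.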
So I commented out these two lines of code from address_to_ip:
Then when I run, I get this error:
I'm assuming it has something to do with the group policies on my system preventing me from enabling access on 192.., so I'm happy to do my testing with those two lines commented out to force the connection to use the 127.. address. But it would be nice if I could do that through configuration.
However, I'm now stuck on the "Could not read 'session_name'" error, and I don't know whether it is related to the 127.. change or to something else.
I also tried taking out the 127.. address from ray.init(), but then I got this error:
Where did port 17091 come from?
I've been discussing this on the Redis Discord channel, and they've helped me get this far in the investigation, but they suggested I open an issue here.
Ray version and other system information (Python version, TensorFlow version, OS): Python 3.8, Windows 10 x64. Everything else was freshly installed with pip this week.
Reproduction (REQUIRED)
Please provide a short code snippet (less than 50 lines if possible) that can be copy-pasted to reproduce the issue. The snippet should have no external library dependencies (i.e., use fake or mock data / environments):
Full stack trace: