Closed mehrdadn closed 3 years ago
FileNotFoundError: [Errno 2] Dashboard build directory not found. If installing from source, please follow the additional steps required to build the dashboard(cd python/ray/dashboard/client && npm ci && npm run build): 'C:\Users\crist\anaconda3\lib\site-packages\ray\dashboard\client/build'
Help!
@cristiangofiar thanks for opening this issue! That should be non-fatal; we should reduce the severity of that error.
cc @mfitton maybe let's just log an "info" message rather than Error or Warning.
@richardliaw But can this failure affect the execution of the program? I need to use Ray for an integrative assignment in a college course! Also, the bug is annoying! Can you help?
Running an experiment with HyperOptSearch and LightGBM, I receive this error message.
raise TuneError("Trials did not complete", incomplete_trials)
OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: 'C:\Users\User\ray_results\train_flat_price\train_flat_price_1_bagging_fraction=tune.sample_from(<function uniform.
@cristiangofiar As a short-term fix you could disable the dashboard by passing --include-webui=False at the command line or include_webui=False in the call to ray.init() in your Python code, depending on how you start it up. (Note this argument is being changed to --include-dashboard and include_dashboard respectively, but I don't know what version you're using.)
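Since the keyword is being renamed, one way to stay version-agnostic is to check which name the installed ray.init() accepts before calling it. This is a minimal sketch, assuming ray.init() exposes the option as a named parameter; the helper dashboard_off_kwargs is hypothetical and not part of Ray:

```python
import inspect


def dashboard_off_kwargs(init_fn):
    """Return the keyword argument that disables the dashboard for init_fn.

    Hypothetical helper (not part of Ray): it inspects the signature of the
    function it is given and picks whichever of the two names mentioned in
    the comment above is present.
    """
    params = inspect.signature(init_fn).parameters
    for name in ("include_dashboard", "include_webui"):
        if name in params:
            return {name: False}
    # Neither name is a parameter of init_fn: pass nothing extra.
    return {}


# Intended use (requires Ray to be installed):
#   import ray
#   ray.init(**dashboard_off_kwargs(ray.init))
```

The helper only reads the function signature, so it can be tried against stand-in functions without starting Ray.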
There are other issues with the dashboard on Windows that are still being fixed. Currently, even if you get the dashboard to start, it won't render anything. That said, this will not affect the running of your script.
@mfitton Thank you very much! I have fixed it thanks to you! :D
OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: 'C:\Users\User\ray_results\train_flat_price\train_flat_price_1_bagging_fraction=tune.sample_from(<function uniform.. at 0x000001C05ECACE58>),feature_fraction=_2020-08-20_18-00-270k00linp'
@valentasgruzauskas can you post a longer stacktrace?
The fatal error was a problem on my side: I used tune to generate the input data while also using a search algorithm. Now I define the search space with hyperopt randint, uniform, etc., and it works (at least no fatal errors). However, I keep receiving an error.
2020-08-22 13:47:05,381 WARNING util.py:137 -- The experiment_checkpoint operation took 10.414000749588013 seconds to complete, which may be a performance bottleneck.
2020-08-22 13:47:05,382 ERROR trial_runner.py:375 -- Trial Runner checkpointing failed.
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\envs\ntanalysis\lib\site-packages\ray\tune\trial_runner.py", line 373, in step
self.checkpoint()
File "C:\ProgramData\Anaconda3\envs\ntanalysis\lib\site-packages\ray\tune\trial_runner.py", line 302, in checkpoint
self._local_checkpoint_dir, session_str=self._session_str)
File "C:\ProgramData\Anaconda3\envs\ntanalysis\lib\site-packages\ray\tune\suggest\search_generator.py", line 192, in save_to_dir
base_searcher.save_to_dir(dirpath, session_str)
File "C:\ProgramData\Anaconda3\envs\ntanalysis\lib\site-packages\ray\tune\suggest\suggestion.py", line 210, in save_to_dir
self.CKPT_FILE_TMPL.format(session_str)))
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'C:\Users\User\ray_results\train_flat_price\.tmp_searcher_ckpt' -> 'C:\Users\User\ray_results\train_flat_price\searcher-state-2020-08-22_12-24-21.pkl'
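The FileExistsError above is what os.rename() raises on Windows when the destination already exists; on POSIX systems the same call silently overwrites the destination. Below is a minimal sketch of the write-then-swap pattern using os.replace(), which overwrites on both platforms. The function name save_checkpoint is hypothetical, the file names only mirror the ones in the traceback, and this is not presented as the actual Ray fix:

```python
import os
import pickle
import tempfile


def save_checkpoint(state, dirpath, filename):
    """Write state to a temp file, then swap it into place.

    os.replace() overwrites an existing destination on Windows and POSIX,
    whereas os.rename() raises FileExistsError on Windows in that case.
    """
    tmp_path = os.path.join(dirpath, ".tmp_searcher_ckpt")
    final_path = os.path.join(dirpath, filename)
    with open(tmp_path, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, final_path)
    return final_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        save_checkpoint({"step": 1}, d, "searcher-state.pkl")
        # Saving again under the same name must not fail.
        path = save_checkpoint({"step": 2}, d, "searcher-state.pkl")
        with open(path, "rb") as f:
            print(pickle.load(f))
```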
Help to connect 2 PCs pls!
(base) C:\Users\Gofiar>ray start --address='address' --redis-password='pass'
Traceback (most recent call last):
File "c:\users\gofiar\anaconda3\lib\runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "c:\users\gofiar\anaconda3\lib\runpy.py", line 85, in _run_code
exec(code, runglobals)
File "C:\Users\Gofiar\anaconda3\Scripts\ray.exe_main.py", line 7, in
I had a similar issue: after ray start --head the head node prints:
--------------------
Ray runtime started.
--------------------
Next steps
To connect to this Ray runtime from another node, run
ray start --address='192.168.143.221:6379' --redis-password='password here'
but address_to_ip(address) in services.py does not trim the quotes from the IP address, so socket.gethostbyname(address_parts[0]) throws an error. The message from ray start is misleading. Try without the quotes around the IP address.
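The Windows command prompt (cmd.exe) does not treat single quotes as quoting characters, so they reach the program as part of the argument. Here is a minimal sketch of trimming them before the host is resolved; strip_quotes and split_address are hypothetical helpers, not the actual code in services.py:

```python
def strip_quotes(value):
    """Remove one pair of matching surrounding quotes, if present."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        return value[1:-1]
    return value


def split_address(address):
    """Split 'host:port' into (host, port) after trimming stray quotes."""
    host, port = strip_quotes(address.strip()).rsplit(":", 1)
    return host, int(port)


if __name__ == "__main__":
    # What cmd.exe passes along when the suggested command is pasted as-is.
    print(split_address("'192.168.143.221:6379'"))
```

With the quotes removed, the host part is a plain IP address that a resolver such as socket.gethostbyname() can accept.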
FileNotFoundError: [Errno 2] Dashboard build directory not found. If installing from source, please follow the additional steps required to build the dashboard(cd python/ray/dashboard/client && npm ci && npm run build): 'C:\Users\crist\anaconda3\lib\site-packages\ray\dashboard\client/build'
Help!
any solution for this
@talhaanwarch the dashboard currently does not work on Windows. I recommend passing include_dashboard=False when calling ray.init()
OK, so rendering via the Ray dashboard doesn't work on Windows 10. Is there any other way to see the processes working?
Hello, I ran ray.init() and then got an error. Could you please tell me how to solve this problem?
Here is the error.
Traceback (most recent call last):
File "
@kuangsangudu I think you need to put C:\Windows\System32\WBEM in your PATH.
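To check whether that directory is already on PATH, the entries can be compared case-insensitively and without trailing separators, which is how Windows treats paths. A minimal sketch; windows_path_contains is a hypothetical helper, and it uses ntpath so that the comparison follows Windows rules regardless of where it runs:

```python
import ntpath
import os


def windows_path_contains(directory, path_value, sep=";"):
    """Return True if directory is one of the entries in a Windows PATH string."""

    def norm(p):
        # Drop surrounding whitespace/quotes, collapse trailing separators,
        # and fold case the way Windows path comparison does.
        return ntpath.normcase(ntpath.normpath(p.strip().strip('"')))

    target = norm(directory)
    entries = [e for e in path_value.split(sep) if e.strip()]
    return any(norm(entry) == target for entry in entries)


if __name__ == "__main__":
    wbem = r"C:\Windows\System32\WBEM"
    print(windows_path_contains(wbem, os.environ.get("PATH", "")))
```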
Hello, I also encountered this problem. I want to run Ray on two Windows systems. When I run ray start --head on one computer, the following prompt appears:
Local node IP: 192.168.195.134
2021-02-05 13:17:36,016 INFO services.py:1171 -- View the Ray dashboard at http://localhost:8265
--------------------
Ray runtime started.
--------------------
Next steps
To connect to this Ray runtime from another node, run
ray start --address='192.168.195.134:6379' --redis-password='5241590000000000'
Alternatively, use the following Python code:
import ray
ray.init(address='auto', _redis_password='5241590000000000')
If connection fails, check your firewall settings and network configuration.
To terminate the Ray runtime, run
ray stop
Then when I ran the ray start --address=192.168.195.134:6379 --redis-password='5241590000000000' command on another computer, the following message appeared:
Traceback (most recent call last):
File "c:\programdata\anaconda3\lib\site-packages\ray\_private\services.py", line 640, in wait_for_redis_to_start
redis_client.client_list()
File "c:\programdata\anaconda3\lib\site-packages\redis\client.py", line 1194, in client_list
return self.execute_command('CLIENT LIST')
File "c:\programdata\anaconda3\lib\site-packages\redis\client.py", line 898, in execute_command
conn = self.connection or pool.get_connection(command_name, **options)
File "c:\programdata\anaconda3\lib\site-packages\redis\connection.py", line 1192, in get_connection
connection.connect()
File "c:\programdata\anaconda3\lib\site-packages\redis\connection.py", line 567, in connect
self.on_connect()
File "c:\programdata\anaconda3\lib\site-packages\redis\connection.py", line 643, in on_connect
auth_response = self.read_response()
File "c:\programdata\anaconda3\lib\site-packages\redis\connection.py", line 739, in read_response
response = self._parser.read_response()
File "c:\programdata\anaconda3\lib\site-packages\redis\connection.py", line 484, in read_response
raise response
redis.exceptions.AuthenticationError: invalid password
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "c:\programdata\anaconda3\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "c:\programdata\anaconda3\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\ProgramData\Anaconda3\Scripts\ray.exe\__main__.py", line 7, in <module>
File "c:\programdata\anaconda3\lib\site-packages\ray\scripts\scripts.py", line 1504, in main
return cli()
File "c:\programdata\anaconda3\lib\site-packages\click\core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "c:\programdata\anaconda3\lib\site-packages\click\core.py", line 782, in main
rv = self.invoke(ctx)
File "c:\programdata\anaconda3\lib\site-packages\click\core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "c:\programdata\anaconda3\lib\site-packages\click\core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "c:\programdata\anaconda3\lib\site-packages\click\core.py", line 610, in invoke
return callback(*args, **kwargs)
File "c:\programdata\anaconda3\lib\site-packages\ray\scripts\scripts.py", line 627, in start
services.wait_for_redis_to_start(
File "c:\programdata\anaconda3\lib\site-packages\ray\_private\services.py", line 650, in wait_for_redis_to_start
raise RuntimeError("Unable to connect to Redis at {}:{}.".format(
RuntimeError: Unable to connect to Redis at 192.168.195.134:6379.
Have you encountered this situation? Can you help me see how to solve it?
@iuming: I'm not sure, but it doesn't sound like invalid password is Windows-specific. If you've double-checked the password is correct, try posting another Issue?
@mehrdadn I copied the prompt command to make sure the password is correct. But this error message still appears!
@iuming: This issue is only for Windows issues, but I don't see anything indicating yours is Windows-specific. Try opening a new Issue?
@mehrdadn Okay, I will open a new issue.
I'm also having issues with the Ray dashboard both on Windows 10 and WSL. When starting, Ray prints View the Ray dashboard at http://localhost:8265 but when visiting localhost:8265 the page load fails.
Is this the current expected behavior for Windows? And also for WSL?
Also, if I run ray status --address <head-node-ip>:6379, I get the following error:
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0210 18:34:48.054060 213 213 service_based_gcs_client.cc:207] Couldn't reconnect to GCS server. The last attempted GCS server address was 131.234.28.107:35699
*** Check failure stack trace: ***
Aborted (core dumped)
Ah, I just found that running ray dashboard cluster.yaml solves my dashboard problem! I can now access the dashboard locally at http://localhost:8265/ on my Windows laptop. The cluster itself (head and worker nodes) runs on Linux machines though.
I'm running ray up and ray dashboard on WSL on my laptop.
@stefanbschneider If you are on a Windows system and you run ray start --head first and then ray dashboard cluster.yaml, can the dashboard be displayed at http://localhost:8265/? Why does the following error message appear after I run ray dashboard cluster.yaml:
Attempting to establish dashboard locally at localhost:8265 connected to remote port 8265
Error: Failed to forward dashboard from remote port 8265 to local port 8265. There are a couple possibilities:
1. The remote port is incorrectly specified
2. The local port 8265 is already in use.
The exception is: [Errno 2] No such file or directory: 'cluster.yaml'
cluster.yaml is just what I called my cluster configuration file, which is based on the example here. You'll have to adjust this to the name/path of your config. Apparently, it's not cluster.yaml.
@stefanbschneider It turned out to be so, thank you!
@talhaanwarch the dashboard currently does not work on Windows. I recommend passing include_dashboard=False when calling ray.init()
This doesn't work for a mini-cluster; it still tries to load the dashboard (I'm on Ray version 1.1.0, Windows 10):
My code:
def test_run_e2e_hyperparam_search_mini_cluster_ray_distributed(self):
    import ray
    from ray.cluster_utils import Cluster

    # Start a head node for the mini-cluster.
    cluster = Cluster(
        initialize_head=True,
        head_node_args={
            "num_cpus": 1,
        })
    # Connect to the mini-cluster, asking for the dashboard to be off.
    ray.init(address=cluster.address, include_dashboard=False)
And this is the error:
2021-02-11 14:07:29,597 INFO View the Ray dashboard at http://127.0.0.1:8265
2021-02-11 14:07:30,732 INFO worker.py:656 -- Connecting to existing Ray cluster at address: 10.240.194.92:6379
2021-02-11 14:07:31,152 WARNING worker.py:1034 -- The actor or task with ID df5a1a828c9685d3ffffffff01000000 cannot be scheduled right now. It requires {CPU: 1.000000}, {GPU: 1.000000} for placement, however the cluster currently cannot provide the requested resources. The required resources may be added as autoscaling takes place or placement groups are scheduled. Otherwise, consider reducing the resource requirements of the task.
2021-02-11 14:07:40,106 WARNING worker.py:1034 -- The dashboard on node TLVCMEW001410 failed with the following error:
Traceback (most recent call last):
File "C:\Users\dm57337\.conda\envs\py38tf\lib\site-packages\ray\new_dashboard\dashboard.py", line 187, in <module>
dashboard = Dashboard(
File "C:\Users\dm57337\.conda\envs\py38tf\lib\site-packages\ray\new_dashboard\dashboard.py", line 81, in __init__
build_dir = setup_static_dir()
File "C:\Users\dm57337\.conda\envs\py38tf\lib\site-packages\ray\new_dashboard\dashboard.py", line 38, in setup_static_dir
raise OSError(
FileNotFoundError: [Errno 2] Dashboard build directory not found. If installing from source, please follow the additional steps required to build the dashboard(cd python/ray/new_dashboard/client && npm install && npm ci && npm run build): 'C:\\Users\\dm57337\\.conda\\envs\\py38tf\\lib\\site-packages\\ray\\new_dashboard\\client\\build'
Does the dashboard still not work for windows users? Can't connect to the dash on windows 10. Have tried disabling firewall etc...
@iuming I had the same problem, and what @Dris101 said is correct, so try without quotes for both IP address and password. Then it works for me.
@weigao-123 Thanks for your suggestions, I will try it.
Problem:
Running the code example below, the process gets stuck in ray.init() and nothing else happens (no error or warning messages).
What could be the problem?
Under my WSL (Ubuntu 20.04) all works fine, but performance slows down and thus I prefer to run ray/RLlib under Windows.
Information:
OS: Microsoft Windows 10 Pro, version 10.0.19042 Build 19042
Python: 3.8.5 64-bit
Ray: 1.2.0
Reproduction script:
import ray
print("start")
ray.init(include_dashboard=False)
print("end")
@weigao-123 Sorry, after I removed the quotation marks, the following error occurred when I entered ray start --head on one computer and ray start --address=192.168.1.121:6379 --redis-password=5241590000000000 on another computer:
Local node IP: 192.168.1.116
Traceback (most recent call last):
File "e:\conda\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "e:\conda\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "E:\conda\Scripts\ray.exe\__main__.py", line 7, in <module>
File "e:\conda\lib\site-packages\ray\scripts\scripts.py", line 1519, in main
return cli()
File "e:\conda\lib\site-packages\click\core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "e:\conda\lib\site-packages\click\core.py", line 782, in main
rv = self.invoke(ctx)
File "e:\conda\lib\site-packages\click\core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "e:\conda\lib\site-packages\click\core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "e:\conda\lib\site-packages\click\core.py", line 610, in invoke
return callback(*args, **kwargs)
File "e:\conda\lib\site-packages\ray\scripts\scripts.py", line 651, in start
node = ray.node.Node(
File "e:\conda\lib\site-packages\ray\node.py", line 156, in __init__
self._init_temp(redis_client)
File "e:\conda\lib\site-packages\ray\node.py", line 254, in _init_temp
self._temp_dir = ray.utils.decode(temp_dir)
File "e:\conda\lib\site-packages\ray\utils.py", line 176, in decode
return byte_str.decode("ascii")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 9: ordinal not in range(128)
@iuming I think this is probably because of your own environment, e.g. the language you use is non-English or something, and you can easily find more information and solutions online. A quick explanation: https://github.com/odoo/odoo/issues/773
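The traceback shows a byte string being decoded as ASCII. Note that "C:\Users\" is nine characters long, so a non-ASCII byte at position 9 would be consistent with a non-ASCII Windows user name in the temp directory path. That is only a guess, and the path below is made up to reproduce the same error:

```python
# Hypothetical temp directory with a non-ASCII user name (an assumption;
# the real path is not shown in the traceback).
temp_dir = "C:\\Users\\张三\\AppData\\Local\\Temp".encode("utf-8")

try:
    temp_dir.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    # ASCII only covers bytes 0-127, so the first byte of the user name fails.
    error = exc

# Decoding with the encoding the bytes were written in succeeds.
decoded = temp_dir.decode("utf-8")

if __name__ == "__main__":
    print(error)
    print(ascii(decoded))
```

If this is indeed the cause, pointing Ray at a temp directory whose path is pure ASCII may sidestep the problem.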
Any ideas or related issues? TIA!
@kk-55 I'm not sure. I'd recommend finding the latest versions of Ray & Python that work, and posting them here to help the team look into it.
@mehrdadn Do you mean combining https://docs.ray.io/en/master/installation.html#daily-releases-nightlies and Windows Python 3.8.5 64-bit? And thereafter hoping for further help from the team?
@kk-55 Yup.
Problem description
Running the repro script below, Ray can't be initialized. No error/warning occurs and it just won't terminate (it always gets stuck at the line ret = ray.init()).
Console/debugger outputs look like this:
What can I do or what could be the problem?
System information
OS: Windows 10 Pro, version 10.0.19042 Build 19042
Ray: latest nightly wheel for Windows Python 3.8 https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-2.0.0.dev0-cp38-cp38-win_amd64.whl
Python: 3.8.5 64-bit
Repro script
import ray
print("start")
print(ray.is_initialized())
ret = ray.init()
print(ret)
print(ray.is_initialized())
ray.shutdown()
print("end")
@kk-55: Sorry, what I was saying was, try to find the latest nightly wheel that does work correctly. Not one that's broken. That way the team can look at what changes might have occurred in the subsequent commit.
@mehrdadn Sorry, but is there a wheel that had previously worked? Or do you know which one to try first?
@kk-55: I don't know about Python 3.8.5 in particular, but of course the Windows wheels have been working for some time. This user got a wheel working on Windows, for example, but I don't know what commit it was. Have you ever managed to run it successfully on any version of Python in the past? Or is this the first time you're trying to run Ray on Windows?
@mehrdadn Not yet, it's the first time I try to run Ray on Windows.
I think the user who got a wheel working on Windows simply took the wheel for the latest release. I also changed to Python 3.8.7 and tried to reproduce, but ray.init() still gets stuck without any prompt.
@kk-55: I see. I just pip installed the latest version of Ray on Python 3.8.7 and verified that it runs correctly, so something seems to be wrong on your machine. I would say try older commits until you find one that works. (Binary search might be helpful here.) Then post whatever you find as a new issue (not here).
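The binary search mentioned above can be sketched as follows. It assumes the builds are ordered oldest to newest, that the oldest one works and the newest one is broken, and that there is a single point where things broke. The works callback is hypothetical; in practice it would mean installing that wheel and checking whether ray.init() returns:

```python
def first_broken(builds, works):
    """Return the index of the first build for which works(build) is False.

    Assumes builds[0] works, builds[-1] is broken, and every build after
    the first broken one is broken too.
    """
    lo, hi = 0, len(builds) - 1  # invariant: builds[lo] works, builds[hi] is broken
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if works(builds[mid]):
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    # Stand-in data: build numbers 0..9, where everything from 6 on is broken.
    builds = list(range(10))
    print(first_broken(builds, lambda b: b < 6))
```

Each step halves the range, so even a few hundred nightly builds need fewer than ten manual installs.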
Ray can't be initialized while connected to Redis
System information
OS: Windows 10
Ray: 1.2.0
Python: 3.7 64-bit
Problem description:
I am running a script from a GitHub project about DML, and I built a simple Redis cluster with 3 nodes and 3 slaves on my local laptop, but Ray can't be initialized when it tries to connect to the Redis cluster. I am pretty sure that I enabled Redis cluster mode.
Console/debugger outputs look like this:
Please post a new issue if you encounter any problems on Windows. I think I'm going to close this one. This issue was mainly intended as a reference table for existing Windows issues and to discuss what should be on the table, not as a separate place to post Windows-specific issues. Thanks everyone!
Had the same issue on my machine (error message translated from Italian): OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: "C:\Users\***\ray_results\hello\VanillaGan_47297_00000_0_batch_size=32,d_optimizer=<class 'tensorflow.python.keras.optimizer_v2.adam.Adam'>,d_rate=0.00040768,gene_2021-04-21_15-34-21"
In my case I believe it was due to restrictions on folder names on Windows. To solve it I just had to add a regex filter in the create_logdir method in ray/tune/trial (line 137) to remove restricted characters. Everything seems to work fine afterwards.
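A regex filter of the kind described could look like the sketch below. The function name sanitize_dirname is hypothetical and this is not the actual change made to create_logdir; the character set is the one Windows rejects in file and directory names. It is meant for a single path component, since it also replaces backslashes and colons:

```python
import re

# Characters that Windows does not allow in file or directory names.
_WINDOWS_RESERVED = re.compile(r'[<>:"/\\|?*]')


def sanitize_dirname(name, replacement="_"):
    """Replace characters that are invalid in a Windows path component."""
    return _WINDOWS_RESERVED.sub(replacement, name)


if __name__ == "__main__":
    trial_name = (
        "VanillaGan_0_batch_size=32,"
        "d_optimizer=<class 'tensorflow.python.keras.optimizer_v2.adam.Adam'>"
    )
    print(sanitize_dirname(trial_name))
```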
Hi,
I have been trying to get ray working on Windows for a few days now, but I keep running into the same problem. Ray keeps hanging on init. The following error message is logged in the worker log:
[2021-05-27 23:27:35,441 E 25468 3004] core_worker.cc:390: Failed to register worker 11baac3114b0e5ec6797733be05ecfeeb3cca79520cff01f14712d28 to Raylet. Invalid: Invalid: Unknown worker
The following is logged in raylet.out:
[2021-05-27 23:34:14,449 I 22100 25172] io_service_pool.cc:35: IOServicePool is running with 1 io_service.
[2021-05-27 23:34:14,661 I 22100 25172] store_runner.cc:29: Allowing the Plasma store to use up to 1.85846GB of memory.
[2021-05-27 23:34:14,662 I 22100 25172] store_runner.cc:42: Starting object store with directory C:\Users\Bram\AppData\Local\Temp and huge page support disabled
[2021-05-27 23:34:14,664 I 22100 25172] grpc_server.cc:71: ObjectManager server started, listening on port 51896.
[2021-05-27 23:34:14,666 I 22100 25172] node_manager.cc:230: Initializing NodeManager with ID 5f4f53b3891a61c81991e86f1fc3dda550d6bd43ffe8a4ef054b49c5
[2021-05-27 23:34:14,666 I 22100 25172] grpc_server.cc:71: NodeManager server started, listening on port 51898.
[2021-05-27 23:34:14,786 I 22100 25172] raylet.cc:146: Raylet of id, 5f4f53b3891a61c81991e86f1fc3dda550d6bd43ffe8a4ef054b49c5 started. Raylet consists of node_manager and object_manager. node_manager address: 192.168.0.25:51898 object_manager address: 192.168.0.25:51896 hostname: 192.168.0.25
[2021-05-27 23:34:14,787 I 22100 15128] agent_manager.cc:76: Monitor agent process with pid 24972, register timeout 30000ms.
[2021-05-27 23:34:14,792 I 22100 25172] service_based_accessor.cc:579: Received notification for node id = 5f4f53b3891a61c81991e86f1fc3dda550d6bd43ffe8a4ef054b49c5, IsAlive = 1
[2021-05-27 23:34:15,544 I 22100 25172] worker_pool.cc:289: Started worker process of 1 worker(s) with pid 18128
[2021-05-27 23:34:16,228 W 22100 25172] worker_pool.cc:418: Received a register request from an unknown worker 22252
[2021-05-27 23:34:16,230 I 22100 25172] node_manager.cc:1132: NodeManager::DisconnectClient, disconnect_type=0, has creation task exception = 0
[2021-05-27 23:34:16,230 I 22100 25172] node_manager.cc:1146: Ignoring client disconnect because the client has already been disconnected.
[2021-05-27 23:34:26,551 W 22100 9452] metric_exporter.cc:206: Export metrics to agent failed: IOError: 14: failed to connect to all addresses. This won't affect Ray, but you can lose metrics from the cluster.
[2021-05-27 23:34:34,612 W 22100 9452] metric_exporter.cc:206: Export metrics to agent failed: IOError: 14: failed to connect to all addresses. This won't affect Ray, but you can lose metrics from the cluster.
[2021-05-27 23:34:44,700 W 22100 9452] metric_exporter.cc:206: Export metrics to agent failed: IOError: 14: failed to connect to all addresses. This won't affect Ray, but you can lose metrics from the cluster.
[2021-05-27 23:34:44,788 W 22100 25172] agent_manager.cc:82: Agent process with pid 24972 has not registered, restart it.
[2021-05-27 23:34:44,789 W 22100 15128] agent_manager.cc:92: Agent process with pid 24972 exit, return value 1067
[2021-05-27 23:34:45,545 I 22100 25172] worker_pool.cc:315: Some workers of the worker process(18128) have not registered to raylet within timeout.
[2021-05-27 23:34:45,793 I 22100 18252] agent_manager.cc:76: Monitor agent process with pid 25032, register timeout 30000ms.
And this is logged in gcs_server.out:
[2021-05-27 23:34:14,163 I 7164 3876] io_service_pool.cc:35: IOServicePool is running with 1 io_service.
[2021-05-27 23:34:14,165 I 7164 3876] gcs_redis_failure_detector.cc:30: Starting redis failure detector.
[2021-05-27 23:34:14,167 I 7164 3876] gcs_init_data.cc:44: Loading job table data.
[2021-05-27 23:34:14,167 I 7164 3876] gcs_init_data.cc:56: Loading node table data.
[2021-05-27 23:34:14,167 I 7164 3876] gcs_init_data.cc:68: Loading object table data.
[2021-05-27 23:34:14,167 I 7164 3876] gcs_init_data.cc:81: Loading cluster resources table data.
[2021-05-27 23:34:14,167 I 7164 3876] gcs_init_data.cc:108: Loading actor table data.
[2021-05-27 23:34:14,167 I 7164 3876] gcs_init_data.cc:94: Loading placement group table data.
[2021-05-27 23:34:14,171 I 7164 3876] gcs_init_data.cc:48: Finished loading job table data, size = 0
[2021-05-27 23:34:14,171 I 7164 3876] gcs_init_data.cc:60: Finished loading node table data, size = 0
[2021-05-27 23:34:14,171 I 7164 3876] gcs_init_data.cc:73: Finished loading object table data, size = 0
[2021-05-27 23:34:14,171 I 7164 3876] gcs_init_data.cc:85: Finished loading cluster resources table data, size = 0
[2021-05-27 23:34:14,171 I 7164 3876] gcs_init_data.cc:112: Finished loading actor table data, size = 0
[2021-05-27 23:34:14,171 I 7164 3876] gcs_init_data.cc:99: Finished loading placement group table data, size = 0
[2021-05-27 23:34:14,171 I 7164 3876] gcs_heartbeat_manager.cc:30: GcsHeartbeatManager start, num_heartbeats_timeout=300
[2021-05-27 23:34:14,385 I 7164 3876] grpc_server.cc:71: GcsServer server started, listening on port 51888.
[2021-05-27 23:34:14,391 I 7164 3876] gcs_server.cc:276: Gcs server address = 192.168.0.25:51888
[2021-05-27 23:34:14,392 I 7164 3876] gcs_server.cc:280: Finished setting gcs server address: 192.168.0.25:51888
[2021-05-27 23:34:14,392 I 7164 3876] gcs_server.cc:379: GcsNodeManager: {RegisterNode request count: 0, UnregisterNode request count: 0, GetAllNodeInfo request count: 0, GetInternalConfig request count: 0}
GcsActorManager: {RegisterActor request count: 0, CreateActor request count: 0, GetActorInfo request count: 0, GetNamedActorInfo request count: 0, KillActor request count: 0, Registered actors count: 0, Destroyed actors count: 0, Named actors count: 0, Unresolved actors count: 0, Pending actors count: 0, Created actors count: 0}
GcsObjectManager: {GetObjectLocations request count: 0, GetAllObjectLocations request count: 0, AddObjectLocation request count: 0, RemoveObjectLocation request count: 0, Object count: 0}
GcsPlacementGroupManager: {CreatePlacementGroup request count: 0, RemovePlacementGroup request count: 0, GetPlacementGroup request count: 0, GetAllPlacementGroup request count: 0, WaitPlacementGroupUntilReady request count: 0, Registered placement groups count: 0, Named placement group count: 0, Pending placement groups count: 0}
GcsPubSub:
- num channels subscribed to: 0
- total commands queued: 0
DefaultTaskInfoHandler: {AddTask request count: 0, GetTask request count: 0, AddTaskLease request count: 0, GetTaskLease request count: 0, AttemptTaskReconstruction request count: 0}
[2021-05-27 23:34:14,786 I 7164 3876] gcs_node_manager.cc:34: Registering node info, node id = 5f4f53b3891a61c81991e86f1fc3dda550d6bd43ffe8a4ef054b49c5, address = 192.168.0.25
[2021-05-27 23:34:14,786 I 7164 3876] gcs_node_manager.cc:39: Finished registering node info, node id = 5f4f53b3891a61c81991e86f1fc3dda550d6bd43ffe8a4ef054b49c5, address = 192.168.0.25
[2021-05-27 23:34:14,792 I 7164 3876] gcs_job_manager.cc:93: Getting all job info.
[2021-05-27 23:34:14,792 I 7164 3876] gcs_job_manager.cc:99: Finished getting all job info.
[2021-05-27 23:34:15,544 I 7164 3876] gcs_job_manager.cc:26: Adding job, job id = 01000000, driver pid = 21792
[2021-05-27 23:34:15,544 I 7164 3876] gcs_job_manager.cc:36: Finished adding job, job id = 01000000, driver pid = 21792
[2021-05-27 23:34:26,246 W 7164 12960] metric_exporter.cc:206: Export metrics to agent failed: IOError: 14: failed to connect to all addresses. This won't affect Ray, but you can lose metrics from the cluster.
[2021-05-27 23:34:34,310 W 7164 12960] metric_exporter.cc:206: Export metrics to agent failed: IOError: 14: failed to connect to all addresses. This won't affect Ray, but you can lose metrics from the cluster.
[2021-05-27 23:34:44,398 W 7164 12960] metric_exporter.cc:206: Export metrics to agent failed: IOError: 14: failed to connect to all addresses. This won't affect Ray, but you can lose metrics from the cluster.
[2021-05-27 23:34:54,467 W 7164 12960] metric_exporter.cc:206: Export metrics to agent failed: IOError: 14: failed to connect to all addresses. This won't affect Ray, but you can lose metrics from the cluster.
[2021-05-27 23:35:04,538 W 7164 12960] metric_exporter.cc:206: Export metrics to agent failed: IOError: 14: failed to connect to all addresses. This won't affect Ray, but you can lose metrics from the cluster.
[2021-05-27 23:35:14,392 I 7164 3876] gcs_server.cc:379: GcsNodeManager: {RegisterNode request count: 1, UnregisterNode request count: 0, GetAllNodeInfo request count: 3, GetInternalConfig request count: 1}
GcsActorManager: {RegisterActor request count: 0, CreateActor request count: 0, GetActorInfo request count: 0, GetNamedActorInfo request count: 0, KillActor request count: 0, Registered actors count: 0, Destroyed actors count: 0, Named actors count: 0, Unresolved actors count: 0, Pending actors count: 0, Created actors count: 0}
GcsObjectManager: {GetObjectLocations request count: 0, GetAllObjectLocations request count: 0, AddObjectLocation request count: 0, RemoveObjectLocation request count: 0, Object count: 0}
GcsPlacementGroupManager: {CreatePlacementGroup request count: 0, RemovePlacementGroup request count: 0, GetPlacementGroup request count: 0, GetAllPlacementGroup request count: 0, WaitPlacementGroupUntilReady request count: 0, Registered placement groups count: 0, Named placement group count: 0, Pending placement groups count: 0}
I am trying to start Ray from Python with this call:
ray.init(local_mode=True, include_dashboard=False, num_gpus=1, num_cpus=1, logging_level=logging.DEBUG)
Python version: 3.7.8. Ray version: latest Windows release.
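In case it helps anyone triaging similar logs, here is a minimal sketch for pulling the warning lines out of raylet.out or gcs_server.out. It assumes every entry follows the layout visible above, `[timestamp level pid tid] file:line: message`, and that `W` marks a warning. The names `LOG_LINE`, `parse_log_line`, and `warnings_by_source` are made up for this example and are not part of Ray.

```python
import re

# Assumed layout, taken from the log excerpts in this thread:
# [2021-05-27 23:34:44,788 W 22100 25172] agent_manager.cc:82: Agent process ...
LOG_LINE = re.compile(
    r"^\[(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]) (?P<pid>\d+) (?P<tid>\d+)\] "
    r"(?P<source>[\w.]+):(?P<line>\d+): (?P<message>.*)$"
)


def parse_log_line(line):
    """Return the fields of one log entry as a dict, or None if the line
    does not match the assumed layout (e.g. multi-line status dumps)."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def warnings_by_source(lines):
    """Count entries with level 'W', grouped by the source file that logged them."""
    counts = {}
    for line in lines:
        entry = parse_log_line(line)
        if entry is not None and entry["level"] == "W":
            counts[entry["source"]] = counts.get(entry["source"], 0) + 1
    return counts
```

To use it, pass an open log file (or any iterable of lines) to `warnings_by_source`. Continuation lines such as the GcsPubSub status dump do not match the pattern and are skipped.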
Please post a new issue if you encounter any problems on Windows! Thank you!
This page is intended to list known Ray issues on Windows in one central location.
The latest nightly wheels may have already addressed some issues since the latest official release.
Check back here for updates as issues are addressed.
You can vote by reacting 👍 on each issue that is impacting you to help us prioritize issues. 🙂
(Maintainers: Please reference this issue in other posts. That will allow their statuses to show up here.)
9265: Unable to run multiple instances of Ray at once
9259: Multi-node connection failure to Redis
9239: AttributeError: 'AsyncStream' object has no attribute 'fileno'
9196: assert num_imports >= num_imported with PyTorch and Ray Tune
FileExistsError with Ray Tune
~9117: Missing Unicode support in C++ core
9116: Check failed: assigned_port != -1 on virtual Python 3.7 or 3.8 environments
9083: ImportThread: max number of clients reached and Check failed: _s.ok() Bad status: RedisError
9074: Access violation in msvcp140.dll!mtx_do_lock, called from RedisAsyncContext::RedisAsyncHandleRead
8787: assertion failed: grpc_server_request_registered_call(...) == GRPC_CALL_OK
gpustat currently requires pip install git+https://github.com/wookayin/gpustat
18944: Documents the possible causes of AccessViolationException on Windows while using Ray, and links to a branch containing a potential fix.