I tried to run this branch on a Ray cluster, but got the error messages below:
ray.exceptions.RayTaskError(_InactiveRpcError): ray::RolloutWorker.get_status()
File "python/ray/_raylet.pyx", line 422, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 422, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 456, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 459, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 463, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 415, in ray._raylet.execute_task.function_executor
File "/home/malib_cls_1206/malib/rollout/rollout_worker.py", line 44, in __init__
self, worker_index, env_desc, metric_type, remote, save, **kwargs
File "/home/malib_cls_1206/malib/rollout/base_worker.py", line 102, in __init__
**kwargs["exp_cfg"],
File "/home/malib_cls_1206/malib/utils/logger/__init__.py", line 249, in get_logger
primary=expr_group, secondary=expr_name
File "/home/malib_cls_1206/malib/rpc/ExperimentManager/ExperimentClient.py", line 73, in create_table
self._create_table_callback(future.result()[0])
File "/home/anaconda3/lib/python3.6/concurrent/futures/_base.py", line 425, in result
return self.__get_result()
File "/home/anaconda3/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/home/anaconda3/lib/python3.6/concurrent/futures/thread.py", line 56, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/malib_cls_1206/malib/rpc/ExperimentManager/ExperimentClient.py", line 47, in _create_table
table_key = stub.CreateTable(table_name, **kwargs)
File "/home/anaconda3/lib/python3.6/site-packages/grpc/_channel.py", line 826, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/home/anaconda3/lib/python3.6/site-packages/grpc/_channel.py", line 729, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses"
debug_error_string = "{"created":"@1638866326.987521646","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":4133,"referenced_errors":[{"created":"@1638866326.987518864","description":"failed to connect to all addresses","file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":397,"grpc_status":14}]}"
>
The only code change I made was at runner.py:61, so that the Ray runtime attaches to a Ray cluster that was started beforehand. I also set num_episodes and the other resource-related parameters to small values.
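Since the final error is gRPC's StatusCode.UNAVAILABLE ("failed to connect to all addresses"), one thing that could be checked from a worker node is whether the experiment server's port is reachable at all. Below is a minimal sketch using only the standard library. The helper name can_connect and the host/port values are placeholders of mine, not anything taken from malib; the real address is whatever ExperimentClient is configured to use.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, DNS failure, and timeout.
        return False


if __name__ == "__main__":
    # Placeholder address: replace with the host/port the experiment server listens on.
    print(can_connect("127.0.0.1", 12333))
```

If this returns False on a worker node but True on the head node, the server address is probably only valid locally (for example a localhost binding), which would be consistent with the error appearing only after attaching to a multi-node cluster.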