Open allendred opened 2 years ago
From the stack trace, it seems some large object is passed when the remote worker is created, causing the gRPC RESOURCE_EXHAUSTED error. This is consistent with your observation that this only happens when config["num_workers"] is not 0.
Do you have an env_creator that I can just plug in and run? The current one complains about ray_env.
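One way to check this diagnosis is to measure how large an object becomes once it is serialized. The following is only a minimal sketch: it uses the standard-library `pickle` as a stand-in for Ray's own serializer, and the names `serialized_size`, `fits_in_one_message`, `big_data` and `data_path` are hypothetical. The limit value is the one reported in the error message of this issue; the demo uses a scaled-down limit so that it stays light on memory.

```python
import pickle

# Limit taken from the error message in this issue ("... vs. 104857600").
GRPC_MAX_BYTES = 104857600


def serialized_size(obj):
    """Return the number of bytes obj occupies when pickled."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def fits_in_one_message(obj, limit=GRPC_MAX_BYTES):
    """Return True if the pickled object is smaller than the given limit."""
    return serialized_size(obj) < limit


# Hypothetical stand-ins: a large in-memory dataset vs. a path to the same data.
big_data = bytes(2000000)        # 2 MB of zeros held in memory
data_path = "/path/to/data.npy"  # hypothetical path; the data would be loaded later

small_limit = 1000000  # scaled-down limit, for demonstration only
print(fits_in_one_message(big_data, limit=small_limit))   # large payload
print(fits_in_one_message(data_path, limit=small_limit))  # short string only
```

If the environment or its creator function holds something like `big_data` directly, passing a path or a small config value and loading the data inside the environment may keep the serialized payload small. Whether that applies here depends on the confidential `ray_env`, which is not shown.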
Sorry about that, the custom ray_env is confidential. I tried the example and it works. On another server, with another version (ray==1.6.0), I have no issues, but on this server I got the following error:
File "/home/gnn/conda/envs/gnn/lib/python3.6/site-packages/ray/rllib/models/preprocessors.py", line 187, in transform
self.check_shape(observation)
File "/home/gnn/conda/envs/gnn/lib/python3.6/site-packages/ray/rllib/models/preprocessors.py", line 68, in check_shape
observation, self._obs_space)
ValueError: ('Observation ({}) outside given space ({})!', array([0.]), Box([0.], [20.], (1,), float32))
What happened + What you expected to happen
In the reproduction script, if config["num_workers"] is not 0, the following error appears:
Traceback (most recent call last):
File "ray_test.py", line 135, in <module>
trainer = sac.SACTrainer(config=config, env="my_env-v0")
File "/home/gnn/conda/envs/gnn/lib/python3.6/site-packages/ray/rllib/agents/sac/sac.py", line 192, in __init__
super().__init__(*args, **kwargs)
File "/home/gnn/conda/envs/gnn/lib/python3.6/site-packages/ray/rllib/agents/trainer.py", line 831, in __init__
config, logger_creator, remote_checkpoint_dir, sync_function_tpl
File "/home/gnn/conda/envs/gnn/lib/python3.6/site-packages/ray/tune/trainable.py", line 149, in __init__
self.setup(copy.deepcopy(self.config))
File "/home/gnn/conda/envs/gnn/lib/python3.6/site-packages/ray/rllib/agents/trainer.py", line 918, in setup
logdir=self.logdir,
File "/home/gnn/conda/envs/gnn/lib/python3.6/site-packages/ray/rllib/evaluation/worker_set.py", line 119, in __init__
self.add_workers(num_workers)
File "/home/gnn/conda/envs/gnn/lib/python3.6/site-packages/ray/rllib/evaluation/worker_set.py", line 242, in add_workers
for i in range(num_workers)
File "/home/gnn/conda/envs/gnn/lib/python3.6/site-packages/ray/rllib/evaluation/worker_set.py", line 242, in <listcomp>
for i in range(num_workers)
File "/home/gnn/conda/envs/gnn/lib/python3.6/site-packages/ray/rllib/evaluation/worker_set.py", line 608, in _make_worker
disable_env_checking=config["disable_env_checking"],
File "/home/gnn/conda/envs/gnn/lib/python3.6/site-packages/ray/actor.py", line 540, in remote
return self._remote(args=args, kwargs=kwargs)
File "/home/gnn/conda/envs/gnn/lib/python3.6/site-packages/ray/util/tracing/tracing_helper.py", line 383, in _invocation_actor_class_remote_span
return method(self, args, kwargs, *_args, **_kwargs)
File "/home/gnn/conda/envs/gnn/lib/python3.6/site-packages/ray/actor.py", line 743, in _remote
if client_mode_should_convert(auto_init=True):
File "/home/gnn/conda/envs/gnn/lib/python3.6/site-packages/ray/_private/client_mode_hook.py", line 124, in client_mode_should_convert
ray.init()
File "/home/gnn/conda/envs/gnn/lib/python3.6/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
return func(*args, **kwargs)
File "/home/gnn/conda/envs/gnn/lib/python3.6/site-packages/ray/worker.py", line 1100, in init
hook()
File "/home/gnn/conda/envs/gnn/lib/python3.6/site-packages/ray/tune/registry.py", line 191, in flush_values
_make_key(self._prefix, category, key), value, overwrite=True
File "/home/gnn/conda/envs/gnn/lib/python3.6/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
return func(*args, **kwargs)
File "/home/gnn/conda/envs/gnn/lib/python3.6/site-packages/ray/experimental/internal_kv.py", line 88, in _internal_kv_put
return global_gcs_client.internal_kv_put(key, value, overwrite, namespace) == 0
File "/home/gnn/conda/envs/gnn/lib/python3.6/site-packages/ray/_private/gcs_utils.py", line 104, in wrapper
return f(self, *args, **kwargs)
File "/home/gnn/conda/envs/gnn/lib/python3.6/site-packages/ray/_private/gcs_utils.py", line 195, in internal_kv_put
reply = self._kv_stub.InternalKVPut(req)
File "/home/gnn/conda/envs/gnn/lib/python3.6/site-packages/grpc/_channel.py", line 946, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/home/gnn/conda/envs/gnn/lib/python3.6/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.RESOURCE_EXHAUSTED
details = "Received message larger than max (105683136 vs. 104857600)"
debug_error_string = "{"created":"@1651138858.283821451","description":"Error received from peer ipv4:192.168.83.225:49167","file":"src/core/lib/surface/call.cc","file_line":1074,"grpc_message":"Received message larger than max (105683136 vs. 104857600)","grpc_status":8}"
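The two numbers in the error message can be put into more readable units. The short calculation below only restates the figures from the message above; the variable names are illustrative.

```python
received = 105683136  # size of the rejected message in bytes, from the error above
limit = 104857600     # maximum allowed message size in bytes, from the error above

MIB = 1024 * 1024

limit_mib = limit / MIB       # the limit expressed in MiB
overshoot = received - limit  # how many bytes over the limit the message is

print(f"limit: {limit_mib} MiB")
print(f"over the limit by {overshoot} bytes ({overshoot / MIB:.2f} MiB)")
```

So the limit is exactly 100 MiB, and the message that failed is less than 1 MiB above it.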
Versions / Dependencies
ray==1.12.0 ray[rllib]
Reproduction script
Issue Severity
High: It blocks me from completing my task.