chwflhs opened this issue 2 years ago
Hey @chwflhs, thanks for raising this issue. I don't think this is a bug, although we should improve our error message.
I'm actually getting the same error with the 1D obs space (shape=(42,)).
The reason you are seeing these space mismatches is that your environment still has obs-space=Box(shape=(4,)), inherited from CartPole's original observation space.
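As a quick sanity check, you can print the env's actual spaces and compare them with what you pass under "policies". A minimal sketch, assuming a plain gym CartPole (a multi-agent wrapper around it typically reports the same per-agent spaces):

import gym

env = gym.make("CartPole-v0")
# CartPole observations have shape (4,) and include negative values
# (hence low=-1 below), so a policy declared with Box(shape=(42, 42, 3))
# can never match them.
print(env.observation_space)  # Box with shape (4,)
print(env.action_space)       # Discrete(2)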
Here is what works for me:
trainer = A2CTrainer(env="env", config={
"multiagent": {
"policies": {
# low=-1 is important (CartPole produces negative values as well)
# shape=(4,) is important as CartPole's observation space is of that shape.
'0': (A3CTFPolicy, spaces.Box(low=-1., high=1., shape=(4, )), spaces.Discrete(2), {}), #change shape to (42, ) avoids the bug, (however, may produce a new bug since the observation space is not compatible with the environment, but is not related to this case)
},
"policy_mapping_fn": lambda id, **kwargs: '0' #unique policy
},
})
You can also just do:
from ray.rllib.policy.policy import PolicySpec

trainer = A2CTrainer(env="env", config={
    "multiagent": {
        "policies": {
            '0': PolicySpec(policy_class=A3CTFPolicy),
        },
        ...
    },
})
This way, RLlib will automatically infer the observation and action spaces from the given env.
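For completeness, a fuller sketch of that config (assuming the same registered env name "env" and a single shared policy '0' as above; Ray 1.x import paths):

from ray.rllib.agents.a3c import A2CTrainer
from ray.rllib.policy.policy import PolicySpec

trainer = A2CTrainer(env="env", config={
    "multiagent": {
        "policies": {
            # Leaving policy_class/observation_space/action_space unset lets
            # RLlib fill them in from the trainer's default policy and the
            # env's own spaces.
            '0': PolicySpec(),
        },
        "policy_mapping_fn": lambda agent_id, **kwargs: '0',  # map every agent to '0'
    },
})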
Thanks a lot for your reply^^ I'll follow your suggestion and see whether it solves the problem in my program. Since I am still not familiar with RLlib, I'll keep investigating how it works with a customized environment.
Search before asking
Ray Component
RLlib
What happened + What you expected to happen
Note:
(1) If you change shape=(42, 42, 3) to the 1D shape=(42,), the bug can be avoided. Also, the problem is not limited to A3CTFPolicy; other policies, including DQN and PG, show the same behavior.
(2) The bug is independent of the particular 3D shape; e.g., shape=(84, 84, 3) or any other custom 3D shape triggers it as well.
(3) Older versions of Ray may not have this bug. My version is ray 1.9.2.
Expected behavior: the workers are initialized and training begins.
Logs in the console:

runfile('E:/Program/python/projects/RL_signal/A3C - fortest.py', wdir='E:/Program/python/projects/RL_signal')
(pid=3540) 2022-02-01 22:04:11,775 WARNING deprecation.py:45 -- DeprecationWarning: SampleBatch['is_training'] has been deprecated. Use SampleBatch.is_training instead. This will raise an error in the future!
Traceback (most recent call last):
  File "E:\Program\python\projects\RL_signal\A3C - fortest.py", line 12, in <module>
    trainer = A3CTrainer(env="env", config={
  File "C:\Users\y\anaconda3\envs\tensorflow\lib\site-packages\ray\rllib\agents\trainer_template.py", line 102, in __init__
    Trainer.__init__(self, config, env, logger_creator,
  File "C:\Users\y\anaconda3\envs\tensorflow\lib\site-packages\ray\rllib\agents\trainer.py", line 661, in __init__
    super().__init__(config, logger_creator, remote_checkpoint_dir,
  File "C:\Users\y\anaconda3\envs\tensorflow\lib\site-packages\ray\tune\trainable.py", line 121, in __init__
    self.setup(copy.deepcopy(self.config))
  File "C:\Users\y\anaconda3\envs\tensorflow\lib\site-packages\ray\rllib\agents\trainer_template.py", line 113, in setup
    super().setup(config)
  File "C:\Users\y\anaconda3\envs\tensorflow\lib\site-packages\ray\rllib\agents\trainer.py", line 764, in setup
    self._init(self.config, self.env_creator)
  File "C:\Users\y\anaconda3\envs\tensorflow\lib\site-packages\ray\rllib\agents\trainer_template.py", line 136, in _init
    self.workers = self._make_workers(
  File "C:\Users\y\anaconda3\envs\tensorflow\lib\site-packages\ray\rllib\agents\trainer.py", line 1727, in _make_workers
    return WorkerSet(
  File "C:\Users\y\anaconda3\envs\tensorflow\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 87, in __init__
    remote_spaces = ray.get(self.remote_workers(
  File "C:\Users\y\anaconda3\envs\tensorflow\lib\site-packages\ray\_private\client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\y\anaconda3\envs\tensorflow\lib\site-packages\ray\worker.py", line 1715, in get
    raise value
RayActorError: The actor died unexpectedly before finishing this task.

2022-02-01 22:04:12,730 WARNING worker.py:1245 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffffb54fdcdb24655823de9fa76101000000 Worker ID: 59ceffa4c59db8073b5d068f10f50522923010a6c49660286d36dcc8 Node ID: 8d1e2459a5c6d1e8afe93e09d3c379e92ba31c91afd92dbf14979b73 Worker IP address: 127.0.0.1 Worker port: 36119 Worker PID: 3540
2022-02-01 22:04:12,764 WARNING worker.py:1245 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffff6feb129264a65a2642feb52701000000 Worker ID: 5d9bf6c176f9eb76c37441daae9417bb49bb1ecb88e5793f10de96c1 Node ID: 8d1e2459a5c6d1e8afe93e09d3c379e92ba31c91afd92dbf14979b73 Worker IP address: 127.0.0.1 Worker port: 36097 Worker PID: 14792
Versions / Dependencies
ray 1.9.2, tensorflow 2.x, gym 0.21.0
spyder 5.05, python 3.8, OS: Windows 10
Reproduction script
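The script itself is not reproduced in this copy of the issue. Purely as an illustration of the setup described above (a multi-agent wrapper around CartPole registered as "env", trained with A3CTrainer using a policy spec that declares a 3D Box(shape=(42, 42, 3)) observation space), a minimal sketch might look like the following; the wrapper class, agent id, and Ray 1.x import paths are assumptions, not the reporter's actual code:

import gym
from gym import spaces
from ray.tune.registry import register_env
from ray.rllib.env.multi_agent_env import MultiAgentEnv
from ray.rllib.agents.a3c import A3CTrainer
from ray.rllib.agents.a3c.a3c_tf_policy import A3CTFPolicy


class CartPoleMultiAgent(MultiAgentEnv):
    """Hypothetical single-agent CartPole wrapped in the multi-agent API."""

    def __init__(self, env_config=None):
        self.env = gym.make("CartPole-v0")
        # The wrapped env keeps CartPole's Box(shape=(4,)) observation space.
        self.observation_space = self.env.observation_space
        self.action_space = self.env.action_space

    def reset(self):
        return {"agent_0": self.env.reset()}

    def step(self, action_dict):
        obs, rew, done, info = self.env.step(action_dict["agent_0"])
        return ({"agent_0": obs}, {"agent_0": rew},
                {"agent_0": done, "__all__": done}, {})


register_env("env", lambda cfg: CartPoleMultiAgent(cfg))

trainer = A3CTrainer(env="env", config={
    "multiagent": {
        "policies": {
            # Declaring a 3D observation space here (which the env cannot
            # provide) makes the remote workers die during WorkerSet
            # construction instead of raising a clear error.
            '0': (A3CTFPolicy, spaces.Box(low=0., high=1., shape=(42, 42, 3)),
                  spaces.Discrete(2), {}),
        },
        "policy_mapping_fn": lambda agent_id, **kwargs: '0',
    },
})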
Anything else
Consistently reproduced with the above script
Are you willing to submit a PR?