ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[RLlib] Workers died at the initialization stage when the observation space is a 3D shape #22033

Open chwflhs opened 2 years ago

chwflhs commented 2 years ago

Ray Component

RLlib

What happened + What you expected to happen

  1. Bug description: This bug is independent of the environment; it can be reproduced with an arbitrary environment. When a 3D Box observation space is passed to the agent, the program crashes with the messages "A worker died or was killed while executing a task by an unexpected system error" and "RayActorError: The actor died unexpectedly before finishing this task." However, if the observation space is 1D, the bug does not occur. The core code is only 8 lines, given below:
   trainer = A3CTrainer(env="env", config={
           "multiagent": {
                 "policies": {
                 '0': (A3CTFPolicy, spaces.Box(low=0., high=1., shape=(42, 42, 3)), spaces.Discrete(2), {}),
                  },
                 "policy_mapping_fn": lambda id: '0'  #unique policy
             },
         })

Note:

(1) If you change shape=(42, 42, 3) to the 1D shape=(42,), the bug does not occur. Moreover, this is not specific to A3CTFPolicy; other policies, including DQN and PG, show the same problem.
(2) The bug is also independent of the particular 3D shape: e.g., shape=(84, 84, 3) or any custom 3D shape triggers it.
(3) Older versions of Ray may not have this bug; my version is ray 1.9.2.

  2. Expected behavior: the workers are initialized and training begins.

  3. Logs in the console:

    runfile('E:/Program/python/projects/RL_signal/A3C - fortest.py', wdir='E:/Program/python/projects/RL_signal')
    (pid=3540) 2022-02-01 22:04:11,775 WARNING deprecation.py:45 -- DeprecationWarning: SampleBatch['is_training'] has been deprecated. Use SampleBatch.is_training instead. This will raise an error in the future!
    Traceback (most recent call last):
      File "E:\Program\python\projects\RL_signal\A3C - fortest.py", line 12, in <module>
        trainer = A3CTrainer(env="env", config={
      File "C:\Users\y\anaconda3\envs\tensorflow\lib\site-packages\ray\rllib\agents\trainer_template.py", line 102, in __init__
        Trainer.__init__(self, config, env, logger_creator,
      File "C:\Users\y\anaconda3\envs\tensorflow\lib\site-packages\ray\rllib\agents\trainer.py", line 661, in __init__
        super().__init__(config, logger_creator, remote_checkpoint_dir,
      File "C:\Users\y\anaconda3\envs\tensorflow\lib\site-packages\ray\tune\trainable.py", line 121, in __init__
        self.setup(copy.deepcopy(self.config))
      File "C:\Users\y\anaconda3\envs\tensorflow\lib\site-packages\ray\rllib\agents\trainer_template.py", line 113, in setup
        super().setup(config)
      File "C:\Users\y\anaconda3\envs\tensorflow\lib\site-packages\ray\rllib\agents\trainer.py", line 764, in setup
        self._init(self.config, self.env_creator)
      File "C:\Users\y\anaconda3\envs\tensorflow\lib\site-packages\ray\rllib\agents\trainer_template.py", line 136, in _init
        self.workers = self._make_workers(
      File "C:\Users\y\anaconda3\envs\tensorflow\lib\site-packages\ray\rllib\agents\trainer.py", line 1727, in _make_workers
        return WorkerSet(
      File "C:\Users\y\anaconda3\envs\tensorflow\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 87, in __init__
        remote_spaces = ray.get(self.remote_workers(
      File "C:\Users\y\anaconda3\envs\tensorflow\lib\site-packages\ray\_private\client_mode_hook.py", line 105, in wrapper
        return func(*args, **kwargs)
      File "C:\Users\y\anaconda3\envs\tensorflow\lib\site-packages\ray\worker.py", line 1715, in get
        raise value
    RayActorError: The actor died unexpectedly before finishing this task.

    2022-02-01 22:04:12,730 WARNING worker.py:1245 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffffb54fdcdb24655823de9fa76101000000 Worker ID: 59ceffa4c59db8073b5d068f10f50522923010a6c49660286d36dcc8 Node ID: 8d1e2459a5c6d1e8afe93e09d3c379e92ba31c91afd92dbf14979b73 Worker IP address: 127.0.0.1 Worker port: 36119 Worker PID: 3540
    2022-02-01 22:04:12,764 WARNING worker.py:1245 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffff6feb129264a65a2642feb52701000000 Worker ID: 5d9bf6c176f9eb76c37441daae9417bb49bb1ecb88e5793f10de96c1 Node ID: 8d1e2459a5c6d1e8afe93e09d3c379e92ba31c91afd92dbf14979b73 Worker IP address: 127.0.0.1 Worker port: 36097 Worker PID: 14792
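To follow the hint in those warnings, the dead workers' log files can be inspected under Ray's session directory. A minimal sketch for listing them, assuming the default temp-dir layout of a local Ray install (the paths here are my assumption, not taken from the original report):

    import glob, os, tempfile

    # Ray's default temp root is /tmp/ray on Linux and %TEMP%\ray on Windows;
    # each run creates a session_* directory with per-worker logs in logs/.
    ray_root = os.path.join(tempfile.gettempdir(), "ray")
    for path in glob.glob(os.path.join(ray_root, "session_*", "logs", "worker-*.err")):
        print(path)  # file names contain the Worker ID printed in the warning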

Versions / Dependencies

ray 1.9.2
tensorflow 2.x
gym 0.21.0
spyder 5.05
python 3.8
OS: Windows 10

Reproduction script

import ray
from ray.rllib.agents.a3c.a3c import A3CTrainer, A3CTFPolicy
from ray.tune.registry import register_env
from gym import spaces
from ray.rllib.examples.env.multi_agent import MultiAgentCartPole

if __name__ == "__main__":
    # Register the environment (the choice of env does not affect reproduction)
    register_env("env", lambda _: MultiAgentCartPole({"num_agents": 4}))
    ray.init()

    trainer = A3CTrainer(env="env", config={
        "multiagent": {
            "policies": {
                '0': (A3CTFPolicy, spaces.Box(low=0., high=1., shape=(42, 42, 3)), spaces.Discrete(2), {}),  # shape=(42, 42, 3) causes the bug
                #'0': (A3CTFPolicy, spaces.Box(low=0., high=1., shape=(42, )), spaces.Discrete(2), {}),  # changing shape to (42,) avoids this bug (though the space then no longer matches the env, which may raise a different, unrelated error)
            },
            "policy_mapping_fn": lambda id: '0'  #unique policy
        },
    })

    for i in range(5000):
        trainer.train()

Anything else

Consistently reproduced with the above script.

Are you willing to submit a PR?

sven1977 commented 2 years ago

Hey @chwflhs, thanks for raising this issue. I don't think this is a bug, although we should improve our error message. I'm actually getting the same error with the 1D obs space (shape=(42,)).

The reason you are seeing these space mismatches is that your environment's observation space is still Box(shape=(4,)), CartPole's original obs space.
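One quick way to confirm what the env actually produces (a sketch using plain gym, not from the original comment; MultiAgentCartPole wraps ordinary CartPole envs, so each agent's spaces are CartPole's own):

    import gym

    env = gym.make("CartPole-v0")
    # CartPole observations are a Box with shape (4,) and can be negative,
    # so the per-policy spec must use that shape (and a matching value range).
    print(env.observation_space)  # Box with shape (4,)
    print(env.action_space)       # Discrete(2)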

Here is what works for me:

    trainer = A2CTrainer(env="env", config={
        "multiagent": {
            "policies": {
                # low=-1 is important (CartPole produces negative values as well)
                # shape=(4,) is important as CartPole's observation space is of that shape.
                '0': (A3CTFPolicy, spaces.Box(low=-1., high=1., shape=(4, )), spaces.Discrete(2), {}),
            },
            "policy_mapping_fn": lambda id, **kwargs: '0'  #unique policy
        },
    })

You can also just do:

    from ray.rllib.policy.policy import PolicySpec
    trainer = A2CTrainer(env="env", config={
        "multiagent": {
            "policies": {
                '0': PolicySpec(policy_class=A3CTFPolicy),
            },
            ...
        },
    })

This way, RLlib will automatically infer the obs- and action-spaces from the given env.
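For completeness, a fully spelled-out version of that config might look as follows (a sketch: the imports, env name, and policy_mapping_fn are filled in by analogy with the reproduction script above, not copied from the original comment):

    from ray.rllib.agents.a3c.a3c import A3CTrainer, A3CTFPolicy
    from ray.rllib.policy.policy import PolicySpec

    trainer = A3CTrainer(env="env", config={
        "multiagent": {
            "policies": {
                # No spaces passed to PolicySpec: RLlib infers the obs- and
                # action-space of policy '0' from the registered env.
                '0': PolicySpec(policy_class=A3CTFPolicy),
            },
            "policy_mapping_fn": lambda agent_id, **kwargs: '0',
        },
    })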

chwflhs commented 2 years ago

Thanks a lot for your reply^^. I'll follow your suggestion and see whether it solves the problem in my program. Since I am still not familiar with the RLlib system, I'll investigate further how it works with a customized environment.