ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[RLlib] Error executing StatelessCartPole environment with AlphaZero #39862

Open AvisP opened 9 months ago

AvisP commented 9 months ago

What happened + What you expected to happen

I tried to train a model with the AlphaZero algorithm on the StatelessCartPole environment, but I am getting an error.

The error message states:

2023-09-26 14:02:50,248 ERROR actor_manager.py:500 -- Ray error, taking actor 1 out of service. The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=53437, ip=127.0.0.1, actor_id=038688043032d4f461cfd39b01000000, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x1071c5e90>)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/....../lib/python3.11/site-packages/ray/rllib/evaluation/rollout_worker.py", line 525, in __init__
    self._update_policy_map(policy_dict=self.policy_dict)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/....../lib/python3.11/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1727, in _update_policy_map
    self._build_policy_map(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/....../lib/python3.11/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1838, in _build_policy_map
    new_policy = create_policy_for_framework(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/....../lib/python3.11/site-packages/ray/rllib/utils/policy.py", line 142, in create_policy_for_framework
    return policy_class(observation_space, action_space, merged_config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/....../lib/python3.11/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero.py", line 327, in __init__
    super().__init__(
  File "/....../lib/python3.11/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero_policy.py", line 38, in __init__
    self.env = self.env_creator()
               ^^^^^^^^^^^^^^^^^^
  File "/....../lib/python3.11/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero.py", line 317, in _env_creator
    return env_cls(config["env_config"])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/....../lib/python3.11/site-packages/ray/rllib/algorithms/alpha_zero/ranked_rewards.py", line 43, in __init__
    self._initialize_buffer(r2_config["num_init_rewards"])
  File "/....../lib/python3.11/site-packages/ray/rllib/algorithms/alpha_zero/ranked_rewards.py", line 51, in _initialize_buffer
    mask = obs["action_mask"]
           ~~~^^^^^^^^^^^^^^^
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
2023-09-26 14:02:50,250 ERROR actor_manager.py:500 -- Ray error, taking actor 2 out of service. The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=53438, ip=127.0.0.1, actor_id=2cd51951020ca32eccefe81d01000000, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x111775bd0>)
  [same traceback as for actor 1 above, ending in the same IndexError]
Traceback (most recent call last):
  File "/....../lib/python3.11/site-packages/ray/rllib/evaluation/worker_set.py", line 157, in __init__
    self._setup(
  File "/....../lib/python3.11/site-packages/ray/rllib/evaluation/worker_set.py", line 227, in _setup
    self.add_workers(
  File "/....../lib/python3.11/site-packages/ray/rllib/evaluation/worker_set.py", line 593, in add_workers
    raise result.get()
  File "/....../lib/python3.11/site-packages/ray/rllib/utils/actor_manager.py", line 481, in __fetch_result
    result = ray.get(r)
             ^^^^^^^^^^
  File "/....../lib/python3.11/site-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/....../lib/python3.11/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/....../lib/python3.11/site-packages/ray/_private/worker.py", line 2526, in get
    raise value
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=53437, ip=127.0.0.1, actor_id=038688043032d4f461cfd39b01000000, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x1071c5e90>)
  [remote traceback identical to the one above, ending in the same IndexError]
 
During handling of the above exception, another exception occurred:
 
Traceback (most recent call last):
  File "/Users/paula/Desktop/Projects/RL Practice/RLLIB_Practice4/stateless_cartpole_alphazero.py", line 21, in <module>
    a = AlphaZero(**nn_kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^
  File "/....../lib/python3.11/site-packages/ray/rllib/utils/deprecation.py", line 106, in patched_init
    return obj_init(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/....../lib/python3.11/site-packages/ray/rllib/algorithms/algorithm.py", line 517, in __init__
    super().__init__(
  File "/....../lib/python3.11/site-packages/ray/tune/trainable/trainable.py", line 169, in __init__
    self.setup(copy.deepcopy(self.config))
  File "/....../lib/python3.11/site-packages/ray/rllib/algorithms/algorithm.py", line 639, in setup
    self.workers = WorkerSet(
                   ^^^^^^^^^^
  File "/....../lib/python3.11/site-packages/ray/rllib/evaluation/worker_set.py", line 179, in __init__
    raise e.args[0].args[2]
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
(RolloutWorker pid=53438) Exception raised in creation task: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=53438, ip=127.0.0.1, actor_id=2cd51951020ca32eccefe81d01000000, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x111775bd0>)
(RolloutWorker pid=53438)   [same traceback as above, ending in the same IndexError]
(RolloutWorker pid=53437) 2023-09-26 14:02:50,210       WARNING deprecation.py:50 -- DeprecationWarning: `TorchPolicy` has been deprecated. This will raise an error in the future!
(RolloutWorker pid=53437) 2023-09-26 14:02:50,211       WARNING deprecation.py:50 -- DeprecationWarning: `StochasticSampling` has been deprecated. This will raise an error in the future!
(RolloutWorker pid=53437) 2023-09-26 14:02:50,211       WARNING deprecation.py:50 -- DeprecationWarning: `Exploration` has been deprecated. This will raise an error in the future!
(RolloutWorker pid=53437) 2023-09-26 14:02:50,211       WARNING deprecation.py:50 -- DeprecationWarning: `Random` has been deprecated. This will raise an error in the future!
(pid=53437) DeprecationWarning: `DirectStepOptimizer` has been deprecated. This will raise an error in the future!
(RolloutWorker pid=53437) Exception raised in creation task: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=53437, ip=127.0.0.1, actor_id=038688043032d4f461cfd39b01000000, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x1071c5e90>)
(RolloutWorker pid=53437)   [same traceback as above, partially deduplicated by Ray; set RAY_DEDUP_LOGS=0 to disable log deduplication, see https://docs.ray.io/en/master/ray-observability/ray-logging.html#log-deduplication]

Versions / Dependencies

ray[rllib]: 2.6.3
OS: macOS

Reproduction script

from ray.rllib.examples.env.stateless_cartpole import StatelessCartPole
from ray.rllib.algorithms.alpha_zero import AlphaZero

nn_config = {
    # Config to pass to the env class.
    # "env_config": env_config,
    # Neural network config.
    "lr": 0.003,
    # "model": model_dict,
    "gamma": 0.95,
    "train_batch_size": 20_000,
    "num_rollout_workers": 1,
    "training": {"_enable_learner_api": False},
    "rl_module": {"_enable_rl_module_api": False},
}

nn_kwargs = {
    "env": StatelessCartPole,
    "config": nn_config,
}

a = AlphaZero(**nn_kwargs)
print(a)
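
For context on where this fails: the traceback bottoms out in ranked_rewards.py at mask = obs["action_mask"], i.e. AlphaZero's ranked-rewards buffer assumes the environment returns a dict observation containing an action mask, while StatelessCartPole returns a flat Box array, hence the IndexError. Below is a minimal, untested sketch of a wrapper that supplies the expected dict observation. The class name is invented here, the all-ones mask relies on both CartPole actions always being legal, and the get_state/set_state pair (which AlphaZero's MCTS uses to snapshot and restore the env between simulated moves) is patterned after RLlib's own AlphaZero CartPole example env rather than verified against this script:

import copy

import gymnasium as gym
import numpy as np
from gymnasium import spaces

from ray.rllib.examples.env.stateless_cartpole import StatelessCartPole


class MaskedStatelessCartPole(gym.Env):
    """Hypothetical wrapper: exposes the {"obs": ..., "action_mask": ...}
    dict observation that AlphaZero's ranked-rewards buffer indexes into."""

    def __init__(self, config=None):
        self.env = StatelessCartPole(config or {})
        self.action_space = self.env.action_space
        self.observation_space = spaces.Dict({
            "obs": self.env.observation_space,
            "action_mask": spaces.Box(0.0, 1.0, (self.action_space.n,), np.float32),
        })

    def _with_mask(self, obs):
        # Both actions (push left / push right) are always valid.
        return {
            "obs": obs,
            "action_mask": np.ones(self.action_space.n, dtype=np.float32),
        }

    def reset(self, *, seed=None, options=None):
        obs, info = self.env.reset(seed=seed, options=options)
        return self._with_mask(obs), info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        return self._with_mask(obs), reward, terminated, truncated, info

    # AlphaZero's MCTS snapshots and restores the env between simulated moves.
    def get_state(self):
        return copy.deepcopy(self.env)

    def set_state(self, state):
        self.env = copy.deepcopy(state)
        # Assumption: StatelessCartPole observes only (x, theta) out of the
        # full (x, x_dot, theta, theta_dot) internal CartPole state.
        s = self.env.unwrapped.state
        return self._with_mask(np.array([s[0], s[2]], dtype=np.float32))

Pointing "env" in nn_kwargs at this wrapper instead of StatelessCartPole should at least get past the buffer initialization; whether AlphaZero, which plans over a known state, is a sensible fit for a deliberately partially observable environment is a separate question.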

Issue Severity

High: It blocks me from completing my task.

sven1977 commented 8 months ago

Hey @AvisP, thanks for raising this issue. We are very sorry, but AlphaZero will be moved into the rllib_contrib repo (outside of RLlib) for Ray 2.8 and will no longer receive support from the team. See here for more information on our contrib efforts: https://github.com/ray-project/ray/tree/master/rllib_contrib
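
Until that migration, releases before 2.8 still ship the algorithm in-tree, so pinning the dependency, e.g. pip install "ray[rllib]<2.8" (or staying on the 2.6.3 used above), should keep the import in the reproduction script working; from Ray 2.8 onward, AlphaZero would have to come from rllib_contrib instead.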