hubbs5 / or-gym

Environments for OR and RL Research
MIT License

[ray] Knapsack environment throws _validate_env error in Rllib 1.9.1 #17

Closed PhilippWillms closed 2 years ago

PhilippWillms commented 2 years ago

Following the tutorial Action Masking with RLlib, instantiating an RLlib trainer under Ray RLlib 1.9.1 throws an EnvError (cf. the full error trace below).
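
For context, the trainer instantiation follows the tutorial's pattern. The sketch below is illustrative only; the register_env wiring and the or_gym.make call are assumptions about that setup, not the tutorial's exact code.

```python
import numpy as np
import or_gym
import ray
from ray.rllib.agents.ppo import PPOTrainer
from ray.tune.registry import register_env

env_config = {'N': 5, 'max_weight': 15,
              'item_weights': np.array([1, 12, 2, 1, 4]),
              'item_values': np.array([2, 4, 2, 1, 10]),
              'mask': True}

# Expose the or-gym env to RLlib under a tune-registered name.
register_env('Knapsack-v0',
             lambda cfg: or_gym.make('Knapsack-v0', env_config=cfg))

ray.init(ignore_reinit_error=True)

# In Ray 1.9.x, RolloutWorker.__init__ runs _validate_env(), which checks
# observation_space.contains() on the observation returned by reset();
# with the int16 spaces this raises the EnvError traced below.
trainer = PPOTrainer(env='Knapsack-v0', config={'env_config': env_config})
```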

Finding: Ray RLlib performs internal validation that the environment functions correctly, and the error trace below says that the initial state set by the reset() method does not fit the definition of the observation space. The issue only occurred when action masking was used.

Solution: Debugging the _validate_env method together with the superclass KnapsackEnv (and hereby 'KnapsackEnv-v0') and the env config below reveals that the dtype int16 in the state definition causes the issue: reset() builds its observation with NumPy's platform default integer type, which cannot be safely cast to int16, so the observation fails the space's containment check. Widening the dtype to int32 lets the environment config pass the validation checks.

```python
env_config = {'N': 5, 'max_weight': 15,
              'item_weights': np.array([1, 12, 2, 1, 4]),
              'item_values': np.array([2, 4, 2, 1, 10]),
              'mask': True}
```

```
RayActorError: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=1464, ip=127.0.0.1)
  File "python\ray\_raylet.pyx", line 625, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 629, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 578, in ray._raylet.execute_task.function_executor
  File "C:\Users\phili\.conda\envs\pip-rayrllib\lib\site-packages\ray\_private\function_manager.py", line 609, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
  File "C:\Users\phili\.conda\envs\pip-rayrllib\lib\site-packages\ray\util\tracing\tracing_helper.py", line 451, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\phili\.conda\envs\pip-rayrllib\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 463, in __init__
    _validate_env(self.env, env_context=self.env_context)
  File "C:\Users\phili\.conda\envs\pip-rayrllib\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 1700, in _validate_env
    raise EnvError(
ray.rllib.utils.error.EnvError: Env's observation_space Dict(action_mask:Box([0 0 0 0 0], [1 1 1 1 1], (5,), int32), avail_actions:Box([0 0 0 0 0], [1 1 1 1 1], (5,), int16), state:Box([0 0 0 0 0 0 0 0 0 0 0], [15 15 15 15 15 15 15 15 15 15 15], (11,), int16)) does not contain returned observation after a reset ({'action_mask': array([1, 1, 1, 1, 1]), 'avail_actions': array([1, 1, 1, 1, 1], dtype=int16), 'state': array([ 1, 12,  2,  1,  4,  2,  4,  2,  1, 10,  0])})!
```
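
The failing check can also be reproduced outside RLlib. A minimal sketch, assuming the gym release pinned by Ray 1.9.x (0.21), whose Box.contains includes an np.can_cast dtype check; it reuses the state array from the trace above:

```python
import numpy as np
from gym import spaces

# The declared state space from KnapsackEnv, with the original int16 dtype.
state_space = spaces.Box(0, 15, shape=(11,), dtype=np.int16)

# reset() builds its observation with plain np.array(...), which uses
# NumPy's platform default integer: int32 on Windows, int64 on Linux.
obs = np.array([1, 12, 2, 1, 4, 2, 4, 2, 1, 10, 0])

# np.can_cast(obs.dtype, np.int16) is False, so contains() rejects the
# observation even though every value lies inside the [0, 15] bounds.
print(obs.dtype)                  # int32 (Windows) or int64 (Linux)
print(state_space.contains(obs))  # False -> RLlib raises EnvError
```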

mave5 commented 2 years ago

@PhilippWillms I get the same error! Can you elaborate on your solution?

PhilippWillms commented 2 years ago

It is rather simple; see below an example from knapsack.py. At the point of initialization, use int32 instead of int16 for the observation space and the action mask. I am happy to contribute via pull request ...

```python
# Use int32 for the state space and the action mask so the observations
# returned by reset()/step() pass gym's dtype check during validation.
obs_space = spaces.Box(
    0, self.max_weight, shape=(2 * self.N + 1,), dtype=np.int32)
self.action_space = spaces.Discrete(self.N)
if self.mask:
    self.observation_space = spaces.Dict({
        "action_mask": spaces.Box(0, 1, shape=(self.N,), dtype=np.int32),
        "avail_actions": spaces.Box(0, 1, shape=(self.N,), dtype=np.int16),
        "state": obs_space
    })
```
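
A quick sanity check (not part of the repo; it replays the observation from the error trace against the patched spaces):

```python
import numpy as np
from gym import spaces

N, max_weight = 5, 15

# The patched Dict space, mirroring the snippet above.
observation_space = spaces.Dict({
    "action_mask": spaces.Box(0, 1, shape=(N,), dtype=np.int32),
    "avail_actions": spaces.Box(0, 1, shape=(N,), dtype=np.int16),
    "state": spaces.Box(0, max_weight, shape=(2 * N + 1,), dtype=np.int32),
})

# The observation RLlib reported after reset() in the error trace.
obs = {
    "action_mask": np.array([1, 1, 1, 1, 1]),
    "avail_actions": np.array([1, 1, 1, 1, 1], dtype=np.int16),
    "state": np.array([1, 12, 2, 1, 4, 2, 4, 2, 1, 10, 0]),
}

# True where the platform default int is int32 (e.g. Windows, as in this
# report); on 64-bit Linux the int64 arrays would still need casting.
print(observation_space.contains(obs))
```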
osarwar commented 2 years ago

Thanks a bunch for your PR! It's been merged so I'll close this issue.