ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

relic #45448

Open jjgriffin2 opened 2 months ago

jjgriffin2 commented 2 months ago

What happened + What you expected to happen

I am working on an RLlib project that uses a custom environment with action masking.

Following training, a checkpoint was created with: algo.save(checkpoint_dir=pickle_dir).

Subsequent attempt to restore using:

    cwd = os.getcwd()
    pickle_dir = cwd + f'/ActionMaskingCheckpoint'
    algo = Algorithm.from_checkpoint(pickle_dir)

failed with the ultimate error:

    File ~/anaconda3/lib/python3.11/site-packages/ray/rllib/examples/rl_modules/classes/action_masking_rlm.py:17, in ActionMaskRLMBase.__init__(self, config)
         15 def __init__(self, config: RLModuleConfig):
         16     if not isinstance(config.observation_space, gym.spaces.Dict):
    ---> 17         raise ValueError(
         18             "This model requires the environment to provide a "
         19             "gym.spaces.Dict observation space."
         20         )
         21     # We need to adjust the observation space for this RL Module so that, when
         22     # building the default models, the RLModule does not "see" the action mask but
         23     # only the original observation space without the action mask. This tricks it
         24     # into building models that are compatible with the original observation space.
         25     config.observation_space = config.observation_space["observations"]

ValueError: This model requires the environment to provide a gym.spaces.Dict observation space.

Additionally, earlier in the error stack there are indications that Algorithm is interpreting the restored checkpoint as a multi-agent case, which is not true.

This occurs with either a user-created masked environment or with the Ray-provided example, most likely because the observation returned by a masked environment is a dictionary structured as:

    obs = {"action_mask": action_mask, "observations": original_observation}
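For illustration, here is a minimal sketch of such an environment (hypothetical; the class name, spaces, and shapes are placeholders, not the Ray-provided ActionMaskEnv), showing the Dict observation structure involved:

    import gymnasium as gym
    import numpy as np

    # Hypothetical minimal action-masking environment: the observation space is a
    # Dict holding the action mask alongside the original observations.
    class MaskedEnvSketch(gym.Env):
        def __init__(self, config=None):
            self.action_space = gym.spaces.Discrete(4)
            self.observation_space = gym.spaces.Dict({
                "action_mask": gym.spaces.Box(0.0, 1.0, shape=(4,), dtype=np.float32),
                "observations": gym.spaces.Box(-1.0, 1.0, shape=(8,), dtype=np.float32),
            })

        def _obs(self):
            return {
                "action_mask": np.ones(4, dtype=np.float32),
                "observations": np.zeros(8, dtype=np.float32),
            }

        def reset(self, *, seed=None, options=None):
            super().reset(seed=seed)
            return self._obs(), {}

        def step(self, action):
            # A real environment would apply the action and update the mask here.
            return self._obs(), 0.0, True, False, {}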

Please either fix Algorithm.from_checkpoint() to recognize and work with action-masked environments, or provide some guidance on how one can manually build a method to do so.
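In the meantime, a possible manual workaround (a sketch only, not verified against this bug; the environment class, registered name, and config details are placeholders for whatever the original training run used) would be to rebuild the algorithm from the original config and restore the checkpoint state into it, instead of calling Algorithm.from_checkpoint():

    import os

    from ray.tune.registry import register_env
    from ray.rllib.algorithms.ppo import PPOConfig

    # Placeholder: the same masked environment class used when the checkpoint was saved.
    from my_project.envs import MyMaskedEnv  # hypothetical import

    register_env("my_masked_env", lambda env_config: MyMaskedEnv(env_config))

    # Rebuild the exact config used for training (algorithm, env, RLModule spec, ...).
    config = PPOConfig().environment("my_masked_env")

    # Build a fresh algorithm instance, then load the saved state into it.
    algo = config.build()
    algo.restore(os.path.join(os.getcwd(), "ActionMaskingCheckpoint"))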

Versions / Dependencies

Ray 3.0.0.dev0
Python 3.11
PyTorch 2.2.1
macOS 14.4.1 on MacBook Pro with M3 Max

Reproduction script

  1. Copy ray.rllib.examples.action_masking.py to a local directory.
  2. Add the following before ray.shutdown():

    cwd = os.getcwd()
    pickle_dir = cwd + f'/ActionMaskingCheckPoint'
    if not os.path.exists(pickle_dir):
        os.makedirs(pickle_dir)
        print(f'Created {pickle_dir} ...')
    algo.save(checkpoint_dir=pickle_dir)
  3. Save to a local file.
  4. Run from the command line:

    python

  5. Attempt the following script:

    from ray.rllib.examples.envs.classes.action_mask_env import ActionMaskEnv
    from ray.rllib.algorithms.algorithm import Algorithm
    import os

    cwd = os.getcwd()
    pickle_dir = cwd + f'/ActionMaskingCheckpoint'
    algo = Algorithm.from_checkpoint(pickle_dir)

  6. Observe the errors.

Issue Severity

High: It blocks me from completing my task.

simonsays1980 commented 1 month ago

@jjgriffin2 Thanks for filing this issue, and apologies for the trouble. We are currently moving from the old/hybrid stack (which already uses the RLModule and Learner APIs) to the new stack and are also rewriting the examples. The action-masking example is not yet implemented there, but it is on the list to be implemented ASAP.

jjgriffin2 commented 1 month ago

The issue isn't with the example per se but with the underlying code. I cited the example because it readily demonstrates the error. The error itself shows up whenever you try to recreate an algorithm using:

    algo = Algorithm.from_checkpoint(pickle_dir)

if the checkpoint was created from an algorithm that was (successfully) using masking. This means that until the underlying Algorithm.from_checkpoint() is fixed, such an algorithm can't be restored and must be retrained every single time it is used, which is extraordinarily time-consuming.