ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[Ray component: RLlib] Function Parameters in Customizing Models in Ray RLLib #40955

Open adonis-dym opened 1 year ago

adonis-dym commented 1 year ago

Description

I started using Ray RLlib (version 2.7.1) to solve an RL problem with customized environments and agents. While the envs are set up properly, I became quite confused when I started to customize models and cannot proceed.

1. What do the parameters in each of the methods mean? As shown in the official docs, one may customize a PyTorch model like this:

import torch.nn as nn

from ray.rllib.models.torch.torch_modelv2 import TorchModelV2


class MyModel(TorchModelV2, nn.Module):
    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        TorchModelV2.__init__(self, obs_space, action_space,
                              num_outputs, model_config, name)
        nn.Module.__init__(self)
        # __init__ function logic

    def forward(self, input_dict, state, seq_lens):
        # forward function logic
        ...

    def value_function(self):
        # value function logic
        ...

But the official docs do not explain what each of these parameters means. Where can I find detailed explanations?
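For reference, here is my current understanding of the five constructor parameters, written as a plain-Python stub (no RLlib imports; the comments are my guesses, please correct me if any of them is wrong):

```python
# My reading of what RLlib passes to the custom model's constructor;
# the class and attribute names here are hypothetical, only the five
# parameter names come from the signature in the docs.
class ParamSketch:
    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        self.obs_space = obs_space        # observation space as the model sees it
                                          # (possibly preprocessed, see question 2)
        self.action_space = action_space  # the env's action space
        self.num_outputs = num_outputs    # width of the model's output layer
                                          # (e.g. number of action logits)
        self.model_config = model_config  # the "model" dict from the config
        self.name = name                  # scope/name string for this model


sketch = ParamSketch(None, None, 256, {"custom_model": "my_model"}, "default_model")
print(sketch.num_outputs, sketch.name)
```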

2. How does _disable_preprocessor_api act? What confuses me most is that the obs_space parameter passed to __init__ differs from what I defined. Also, input_dict['obs'] in the forward method has a data structure that is unclear to me. Moreover, there seems to be a discrepancy between the default behaviour described in the official docs and the behaviour I observe. To be specific, my observation space is:

        # (Dict, Tuple, Discrete, Box come from gymnasium.spaces; np is numpy.)
        operation_space = Dict({
            'a': Discrete(13),
            'b': Tuple([
                Dict({
                    'b1': Discrete(3),
                    'b2': Box(low=-10, high=10, shape=(1,), dtype=np.float64),
                    'b3': Discrete(7)
                }) for _ in range(2)
            ]),
            'c': Discrete(7)
        })
        self.observation_space = Tuple([operation_space]*10)
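As a sanity check on the flattened size, the one-hot width of this space can be worked out by hand (plain arithmetic, no gym imports; Discrete(n) one-hot encodes to n entries, the Box(shape=(1,)) contributes 1):

```python
# Flattened (one-hot) size of the observation space defined above.
a = 13                  # 'a': Discrete(13)
inner = 3 + 1 + 7       # one inner Dict: b1=Discrete(3), b2=Box((1,)), b3=Discrete(7)
b = 2 * inner           # 'b': Tuple of two such Dicts
c = 7                   # 'c': Discrete(7)
operation = a + b + c   # 42 per operation_space
total = 10 * operation  # Tuple([operation_space] * 10)
print(total)            # 420
```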

I am writing my config as follows:

config = (
    get_trainable_cls('PPO')
    .get_default_config()
    .rl_module(_enable_rl_module_api=False)
    .training(
        model={
            "_disable_preprocessor_api": True,
            "custom_model": "my_model",
            # "custom_model_config": {
            #     "input_files": args.input_files,
            # },
        },
        _enable_learner_api=False
    )
    .environment(RLSearchEnv, env_config=RLSearchEnv_config)
    .framework("torch")
    .rollouts(num_rollout_workers=1)
    .resources(num_gpus=2)
    .experimental(_disable_preprocessor_api=True)
)

There are two _disable_preprocessor_api settings in the config. Although the official docs explain them the same way, they show different behaviours when I set them to different values.

Case 1. model={"_disable_preprocessor_api": False, ...} with .experimental(_disable_preprocessor_api=False). This is the default behaviour of RLlib, and the default preprocessors are applied. The obs_space is Box(-1.0, 1.0, (420,), float32), which is the one-hot encoded and flattened version of my original definition (I have checked that the sizes match). The input_dict['obs'] preserves the original nested structure of my observation space (i.e. it is a 10-length list), but each Discrete subspace is now a one-hot encoded torch.tensor with an additional batch dimension:

>>> input_dict['obs'][0]['a'].shape 
torch.Size([32, 13])

Case 2. model={"_disable_preprocessor_api": True, ...} with .experimental(_disable_preprocessor_api=False). The obs_space is Box(-1.0, 1.0, (420,), float32), same as in case 1. The input_dict['obs'] is now a torch.tensor with shape [32, 420]. I guess it flattens the observation and prepends a batch dimension.

Case 3. model={"_disable_preprocessor_api": False, ...} with .experimental(_disable_preprocessor_api=True). The obs_space now preserves the original nested structure; it is the same as self.observation_space. The input_dict['obs'] also preserves the original nested structure of my observation space, but unlike case 1, the Discrete subspaces are not one-hot encoded; a batch dimension is still added:

>>> input_dict['obs'][0]['a'].shape 
torch.Size([32])

Case 4. model={"_disable_preprocessor_api": True, ...} with .experimental(_disable_preprocessor_api=True). The obs_space is the same as self.observation_space. The input_dict['obs'] is the same as in case 3.
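In cases 3 and 4 my current plan (my own workaround, not something from the docs) is to one-hot encode the raw Discrete batches myself inside forward. A torch-free illustration of that step:

```python
# Pure-Python illustration of the one-hot step the default preprocessor
# performs in cases 1/2; in the real model this would presumably be done
# with a torch one-hot op on e.g. input_dict['obs'][0]['a'].
def one_hot_batch(batch, n):
    """Map a batch of integer category ids to one-hot rows of length n."""
    return [[1.0 if i == x else 0.0 for i in range(n)] for x in batch]


# e.g. three sampled integer values of the 'a' key (Discrete(13)):
rows = one_hot_batch([2, 0, 12], 13)
print(len(rows), len(rows[0]))  # 3 13 -- like going from shape [32] to [32, 13]
```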

These behaviours are too complicated for me to understand, and I cannot continue programming until I figure out what these settings actually mean. It would be greatly appreciated if you could explain how to use these parameters. Thanks a lot in advance.

Links

Intro to customizing models: https://docs.ray.io/en/latest/rllib/rllib-models.html#custom-models-implementing-your-own-forward-logic
"_disable_preprocessor_api" key in the model config settings: https://docs.ray.io/en/latest/rllib/rllib-models.html#default-model-config-settings
_disable_preprocessor_api parameter in the AlgorithmConfig settings: https://docs.ray.io/en/latest/rllib/rllib-training.html#specifying-experimental-features

angelinalg commented 1 year ago

Deferring to eng to determine final priority. It seems like a P1 to me.