I started using Ray RLlib (version 2.7.1) to solve an RL problem with customized environments and agents. While the environments are set up properly, I became quite confused when I started customizing models and cannot proceed.
1. What do the parameters in each of the methods mean?
As shown in the official docs, one may customize a PyTorch model like this:
```python
import torch.nn as nn

from ray.rllib.models.torch.torch_modelv2 import TorchModelV2


class MyModel(TorchModelV2, nn.Module):
    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        TorchModelV2.__init__(self, obs_space, action_space,
                              num_outputs, model_config, name)
        nn.Module.__init__(self)
        # __init__ function logic

    def forward(self, input_dict, state, seq_lens):
        # forward function logic
        ...

    def value_function(self):
        # value function logic
        ...
```
But the official docs do not explain what each of these parameters means. Where can I find detailed explanations?
2. How does the _disable_preprocessor_api act?
What confuses me most is that the obs_space parameter in the __init__ method differs from what I have defined. Also, the input_dict parameter in the forward method contains the observation input_dict['obs'], whose data structure is unclear to me. What's more, there seems to be a discrepancy between the default behaviours described in the official docs and the behaviours I observe in my case.
Let's be specific. My observation space is:
There are two _disable_preprocessor_api settings in the config. Although the official docs explain them identically, they exhibit different behaviours when I set them to different values.
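For concreteness, the two places where the flag can be set look like this in a config builder (a minimal sketch; the PPO algorithm and all other settings here are placeholders, not my actual config):

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    # Flag 1: inside the model config dict.
    .training(model={"_disable_preprocessor_api": False})
    # Flag 2: on the AlgorithmConfig's experimental settings.
    .experimental(_disable_preprocessor_api=False)
)
```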
Case 1. model={"_disable_preprocessor_api": False, ...} with .experimental(_disable_preprocessor_api=False), which is the default behaviour of RLlib: the default preprocessors are applied.
The obs_space is Box(-1.0, 1.0, (420,), float32), which is the one-hot encoded and flattened version of my original definition. I've checked that the sizes match.
The input_dict['obs'] preserves the original nested structure of my observation space (i.e. it is a 10-length list), but each Discrete subspace is now a one-hot encoded torch.Tensor with an additional batch dimension:
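For illustration, the one-hot flattening that RLlib's default preprocessor applies to Discrete subspaces can be sketched in plain NumPy. The subspace sizes here are hypothetical (ten Discrete(42) subspaces, chosen only because they happen to sum to 420 flat features); they are not my actual space definition:

```python
import numpy as np

def one_hot(value: int, n: int) -> np.ndarray:
    """One-hot encode a sample from a Discrete(n) space."""
    vec = np.zeros(n, dtype=np.float32)
    vec[value] = 1.0
    return vec

# Hypothetical: ten Discrete(42) samples -> 10 * 42 = 420 flat features.
samples = [3] * 10
flat = np.concatenate([one_hot(s, 42) for s in samples])
print(flat.shape)  # (420,)
```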
Case 2. model={"_disable_preprocessor_api": True, ...} with .experimental(_disable_preprocessor_api=False)
The obs_space is Box(-1.0, 1.0, (420,), float32), the same as in case 1.
The input_dict['obs'] is now a torch.Tensor with shape [32, 420]. I guess it flattens the observation and prepends a batch dimension.
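A flat batch like this can be split back into per-subspace chunks by hand; the sketch below uses NumPy and the same hypothetical layout as before (ten one-hot encoded Discrete(42) subspaces, which is an assumption, not my real space):

```python
import numpy as np

# Hypothetical flat batch as seen in case 2: batch size 32, 420 flat features.
batch = np.zeros((32, 420), dtype=np.float32)

# Recover the nested structure by splitting the last axis into ten equal chunks.
chunks = np.split(batch, 10, axis=-1)
print(len(chunks), chunks[0].shape)  # 10 (32, 42)
```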
Case 3. model={"_disable_preprocessor_api": False, ...} with .experimental(_disable_preprocessor_api=True)
The obs_space now preserves the original nested structure; it is the same as self.observation_space.
The input_dict['obs'] preserves the original nested structure of my observation space, but unlike case 1, the Discrete subspaces are not one-hot encoded; a batch dimension is still added:
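So in this case a batched observation would look roughly like the following sketch (again with hypothetical sizes: ten Discrete subspaces, batch size 32, integer indices instead of one-hot vectors):

```python
import numpy as np

# Hypothetical case-3/4 observation: a 10-length list of raw Discrete samples,
# batched (batch size 32) but NOT one-hot encoded.
obs = [np.zeros((32,), dtype=np.int64) for _ in range(10)]
print(len(obs), obs[0].shape)  # 10 (32,)
```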
Case 4. model={"_disable_preprocessor_api": True, ...} with .experimental(_disable_preprocessor_api=True)
The obs_space is the same as self.observation_space.
The input_dict['obs'] is the same as in case 3.
These behaviours are too complicated for me to understand, and I cannot continue programming without figuring out what exactly they mean. It would be greatly appreciated if you could explain how to use these parameters. Thanks a lot in advance.
Link
Intro to customizing models: https://docs.ray.io/en/latest/rllib/rllib-models.html#custom-models-implementing-your-own-forward-logic
"_disable_preprocessor_api" key in the model config settings: https://docs.ray.io/en/latest/rllib/rllib-models.html#default-model-config-settings
_disable_preprocessor_api parameter in the AlgorithmConfig settings: https://docs.ray.io/en/latest/rllib/rllib-training.html#specifying-experimental-features