Question: Constrain Action Space

Hi,

Is it possible to constrain the action space of the agent? E.g. I want my robot to take actions between [-pi/4, pi/4], but the only way I have found to achieve this is by clamping the action values. I was wondering if there is a more convenient way of doing this that also makes it so that the agent doesn't have to explore the action space beyond those values.
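(For context, by clamping I mean something along these lines inside the environment, before the actions are applied; the names and values are just placeholders.)

```python
import math
import torch

raw_actions = torch.tensor([-2.0, 0.3, 1.5])  # example policy output
clipped_actions = torch.clamp(raw_actions, -math.pi / 4, math.pi / 4)
print(clipped_actions)  # tensor([-0.7854, 0.3000, 0.7854])
```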
What do you mean by a more convenient way? Do you mean something on the environment side that lets you set the clipping range, or more from the learning agent's perspective?
If it is the latter, you could try adding a sigmoid/tanh to your NN output and then linearly scaling from [-1, 1] to your desired range. Several manipulation works seem to do this. However, based on experience, we have found that whether it helps depends on the robot and task.
In general, clipping inside the environment should be avoided because it hides the clipping from the learning agent. For instance, policy outputs of -2 and -100 both get clipped to -1 inside the environment, and the policy never sees the difference.
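As a minimal sketch of that rescaling (assuming the [-pi/4, pi/4] range from the question; names and values are illustrative, not from any particular codebase):

```python
import math
import torch

action_low, action_high = -math.pi / 4, math.pi / 4  # assumed target range

def scale_action(tanh_output: torch.Tensor) -> torch.Tensor:
    # Linearly map the network's tanh output from [-1, 1] to [action_low, action_high].
    return action_low + (tanh_output + 1.0) * 0.5 * (action_high - action_low)

print(scale_action(torch.tensor([-1.0, 0.0, 1.0])))  # -> [-pi/4, 0.0, pi/4]
```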
Thanks, that is a good suggestion; I hadn't thought of that!
I tried adding Tanh to actor_critic.py in rsl_rl:
```python
# Policy
actor_layers = []
actor_layers.append(nn.Linear(mlp_input_dim_a, actor_hidden_dims[0]))
actor_layers.append(activation)
for layer_index in range(len(actor_hidden_dims)):
    if layer_index == len(actor_hidden_dims) - 1:
        actor_layers.append(nn.Linear(actor_hidden_dims[layer_index], num_actions))
        actor_layers.append(nn.Tanh())  # added: squashes the policy output to [-1, 1]
    else:
        actor_layers.append(nn.Linear(actor_hidden_dims[layer_index], actor_hidden_dims[layer_index + 1]))
        actor_layers.append(activation)
self.actor = nn.Sequential(*actor_layers)

# Value function
critic_layers = []
critic_layers.append(nn.Linear(mlp_input_dim_c, critic_hidden_dims[0]))
critic_layers.append(activation)
for layer_index in range(len(critic_hidden_dims)):
    if layer_index == len(critic_hidden_dims) - 1:
        critic_layers.append(nn.Linear(critic_hidden_dims[layer_index], 1))
        critic_layers.append(nn.Tanh())  # added: this also bounds the value estimate to [-1, 1]
    else:
        critic_layers.append(nn.Linear(critic_hidden_dims[layer_index], critic_hidden_dims[layer_index + 1]))
        critic_layers.append(activation)
self.critic = nn.Sequential(*critic_layers)
```
I didn't add any scaling yet. However, the addition seems to make the training unstable (it wasn't unstable before). Is there something I can do in addition, or is this an indication that this approach doesn't work for my robot/task?
This is hard to say. You probably need to tune your PPO hyperparameters as well :/
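For example, these are the kinds of knobs I mean (values purely illustrative; check the exact parameter names against your rsl_rl version):

```python
# Illustrative starting points only, not a recommended configuration.
ppo_tweaks = dict(
    learning_rate=3.0e-4,   # often lowered when training turns unstable
    clip_param=0.2,
    entropy_coef=0.005,     # a bit more exploration can help with a squashed output
    num_learning_epochs=5,
    num_mini_batches=4,
    desired_kl=0.01,        # target KL used by rsl_rl's adaptive learning-rate schedule
    max_grad_norm=1.0,
)
```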
Another suggestion would be to look into constrained RL. There have been some works in this direction recently:
No problem. Thank you for your help - I will look into the resources you've provided.