Since most gym environments return numpy arrays, torch.Tensor(obs) should work most of the time.
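A minimal sketch of that conversion (the reset return format changed in gym 0.26, so the unpacking below is defensive):

```python
import gym
import torch

env = gym.make("CartPole-v1")
out = env.reset()
# Older gym versions return just the observation; gym>=0.26 returns
# (obs, info), so unpack defensively.
obs = out[0] if isinstance(out, tuple) else out

# torch.as_tensor avoids an extra copy when the dtype already matches;
# torch.Tensor(obs) also works but always copies and forces float32.
obs_t = torch.as_tensor(obs, dtype=torch.float32)
print(obs_t.shape)  # torch.Size([4]) for CartPole
```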
While this tutorial is not for MuJoCo, it should provide most of the details you are interested in: https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html
@pseudo-rnd-thoughts Thank you for your prompt reply and selfless help ^_^ Most gym environments are written purely in Python, so converting model parameters into PyTorch tensors and computing the gradient of the reward w.r.t. the model parameters is straightforward. In contrast, it is a different story for MuJoCo, because MuJoCo is driven by a simulator written in C++. There, computing the gradient of the reward w.r.t. the model parameters is a difficult issue. Do you have any advice?
I'm not sure that I understand what you are saying. For deep reinforcement learning algorithms, we compute the gradient of the training loss w.r.t. the network parameters. We do not take gradients with respect to the environment implementation, so it doesn't matter whether the environment is written in Python or C++. The MuJoCo environments return a numpy array, just like CartPole for example, so you should be able to use the observation in the same way.
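To illustrate the point, here is a minimal sketch (assuming HalfCheetah-v4 is available; any env returning numpy observations works the same way). The backward pass touches only the network, never the simulator:

```python
import gym
import torch
import torch.nn as nn

env = gym.make("HalfCheetah-v4")  # any env returning numpy observations
out = env.reset()
obs = out[0] if isinstance(out, tuple) else out  # handle old/new reset API

policy = nn.Sequential(
    nn.Linear(env.observation_space.shape[0], 64),
    nn.Tanh(),
    nn.Linear(64, env.action_space.shape[0]),
)

obs_t = torch.as_tensor(obs, dtype=torch.float32)
action = policy(obs_t)

# Placeholder loss purely for illustration; a real algorithm would use a
# policy-gradient or TD loss here. Either way, backward() differentiates
# through the network only -- the C++ simulator is outside the graph.
loss = action.pow(2).sum()
loss.backward()
print(policy[0].weight.grad.shape)
```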
@pseudo-rnd-thoughts Thank you for your reply. I understand what you mean. The reason I want to compute the gradient with respect to the environment parameters is this:
1. My research interest is the robustness of RL algorithms to environment parameters. I want to modify current RL algorithms so that they achieve good performance when tested in environments with unfamiliar parameters. (For example, an agent is trained in a CartPole environment with a 1m pole; I want it to achieve good performance in a CartPole environment with a 3m pole, as in the sketch below.)
2. To achieve this, I want to characterize the relationship between the environment parameter values and the RL algorithm's performance (reward). As a result, I want the gradient of the reward with respect to the environment parameters.
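A hedged sketch of that setup, assuming gym's current CartPoleEnv internals: it stores length (which is actually half the pole length) and a derived polemass_length, so both must be updated, and this may break across versions:

```python
import gym

def make_cartpole(pole_half_length):
    # CartPoleEnv caches masspole * length, so update both fields.
    env = gym.make("CartPole-v1")
    inner = env.unwrapped
    inner.length = pole_half_length
    inner.polemass_length = inner.masspole * inner.length
    return env

train_env = make_cartpole(0.5)  # the default pole
test_env = make_cartpole(1.5)   # 3x longer pole, unseen during training
```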
This sounds like an interesting idea; however, I'm not sure how we can help you solve it. You would need a differentiable function from the parameters to the rewards, and I have no idea how you could build one.
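One workaround (my suggestion, not something gym provides) is to give up on analytic gradients and estimate d(return)/d(parameter) by finite differences, for example on the pole length from the sketch above. The estimate is noisy because returns are stochastic, so it averages over episodes; a trained policy would normally replace the random actions used here for self-containedness:

```python
import gym

def average_return(pole_half_length, episodes=20):
    # Reuses the make_cartpole idea above; assumes the pre-0.26 step API
    # returning (obs, reward, done, info) -- adapt for newer gym.
    env = gym.make("CartPole-v1")
    env.unwrapped.length = pole_half_length
    env.unwrapped.polemass_length = (
        env.unwrapped.masspole * env.unwrapped.length
    )
    total = 0.0
    for _ in range(episodes):
        env.reset()
        done = False
        while not done:
            # A trained policy would replace action_space.sample() here.
            _, reward, done, _ = env.step(env.action_space.sample())
            total += reward
    env.close()
    return total / episodes

def finite_diff_gradient(pole_half_length, eps=0.05):
    # Central difference: d(mean return)/d(pole length), noisy but usable.
    return (
        average_return(pole_half_length + eps)
        - average_return(pole_half_length - eps)
    ) / (2 * eps)

print(finite_diff_gradient(0.5))
```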
@pseudo-rnd-thoughts Thank you for joining the discussion. ^_^
I want to construct a neural network in PyTorch that outputs appropriate values for the MuJoCo model parameters. But this requires that the MuJoCo model parameters be converted into PyTorch tensors, and that the MuJoCo environments run correctly with the converted parameters written back. I don't know how to achieve this. Does anyone know how?
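As a starting point, here is a hedged sketch of the round trip, assuming gym's MuJoCo envs expose the underlying model via env.unwrapped.model (true for mujoco-py and the newer mujoco bindings, but worth checking for your versions). The key point is that the parameters are plain numpy arrays, so they convert to tensors directly, but no gradient can flow back through the simulator:

```python
import gym
import torch

env = gym.make("HalfCheetah-v4")
model = env.unwrapped.model

# MuJoCo model parameters are numpy arrays, e.g. one mass per body.
masses = torch.as_tensor(model.body_mass, dtype=torch.float64)

# Suppose a network proposed new masses; here they are just scaled by 10%.
new_masses = masses * 1.1

# Write them back as numpy. detach() is required: the simulator is not
# part of the autograd graph, so gradients stop at this boundary.
model.body_mass[:] = new_masses.detach().numpy()

env.reset()  # subsequent rollouts use the modified parameters
```

Because of that boundary, any optimization over the parameters has to treat the simulator as a black box (finite differences, as sketched earlier, or similar gradient-free methods).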