openai / gym

A toolkit for developing and comparing reinforcement learning algorithms.
https://www.gymlibrary.dev

How to convert MuJoCo model parameters to PyTorch tensors? #2810

Closed c4cld closed 2 years ago

c4cld commented 2 years ago

I want to construct a neural network in PyTorch that outputs appropriate values for the MuJoCo model parameters. This requires converting the MuJoCo model parameters into PyTorch tensors and having the MuJoCo environments run correctly with those converted parameters. I don't know how to achieve this. Does anyone know how to do it?

pseudo-rnd-thoughts commented 2 years ago

Since most Gym environments return NumPy arrays, `torch.Tensor(obs)` should work most of the time.

While this tutorial is not for MuJoCo, it should cover most of the details you are interested in: https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html
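For example, a minimal sketch of that conversion (the environment id is just a placeholder, and this assumes the older `reset()` API that returns only the observation):

```python
import gym
import torch

# Hypothetical example: any MuJoCo env works the same way here.
env = gym.make("HalfCheetah-v3")
obs = env.reset()  # NumPy array of joint positions/velocities
obs_tensor = torch.as_tensor(obs, dtype=torch.float32)  # ready to feed into a policy network
```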

c4cld commented 2 years ago

@pseudo-rnd-thoughts Thank you for your prompt reply and selfless help ^_^ Most Gym environments are written purely in Python, so converting model parameters into PyTorch tensors and computing the gradient of the reward w.r.t. the model parameters is straightforward. It is a different story for MuJoCo, because MuJoCo is driven by a simulator written in C++, so computing the gradient of the reward w.r.t. the model parameters is a difficult issue. Do you have any advice?

pseudo-rnd-thoughts commented 2 years ago

I'm not sure that I understand what you are saying. For deep reinforcement learning algorithms, we compute the gradient of the training loss w.r.t. the network parameters, so we never differentiate through the environment implementation, and it doesn't matter whether the environment is written in Python or C++. The MuJoCo environments return a NumPy array just like CartPole does, for example, so you should be able to use the observation in the same way.
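As a rough sketch of what that means in practice (the network sizes and loss here are arbitrary placeholders, not from any particular algorithm): the gradient flows into the network's weights, and the environment only supplies the data.

```python
import torch
import torch.nn as nn

# Hypothetical network; the environment is only a data source, never differentiated.
net = nn.Sequential(nn.Linear(17, 64), nn.Tanh(), nn.Linear(64, 6))

obs_tensor = torch.randn(1, 17)   # stands in for a converted observation
target = torch.zeros(1, 6)        # stands in for a training target
loss = nn.functional.mse_loss(net(obs_tensor), target)
loss.backward()                   # gradients w.r.t. the network parameters only
```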

c4cld commented 2 years ago

@pseudo-rnd-thoughts Thank you for your reply. I understand what you mean. The reason I want to compute the gradient with respect to the environment parameters is:

1. My research interest is the robustness of RL algorithms to environment parameters. I want to modify current RL algorithms so that they perform well when tested in environments with unfamiliar parameters. (For example, an agent is trained in the CartPole environment with a 1 m pole; I want it to also achieve good performance in the CartPole environment with a 3 m pole. See the sketch below.)
2. To achieve this, I want to understand the relationship between the environment parameter values and the RL algorithm's performance (reward). As a result, I want the gradient of the reward with respect to the environment parameters.
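Concretely, the kind of perturbation I mean looks like this (a sketch against the classic-control CartPole implementation, where `length` is the pole's half-length; the MuJoCo envs expose analogous fields such as `body_mass` on `env.unwrapped.model`):

```python
import gym

env = gym.make("CartPole-v1")
env.reset()

# Classic-control CartPole stores the pole's half-length as `length` (default 0.5).
env.unwrapped.length = 1.5  # evaluate a trained agent with a 3x longer pole
# Keep the derived quantity consistent with the new length.
env.unwrapped.polemass_length = env.unwrapped.masspole * env.unwrapped.length
```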

pseudo-rnd-thoughts commented 2 years ago

This sounds like an interesting idea; however, I'm not sure how we can help you solve it. You would need a function from the environment parameters to the rewards that is differentiable, but I have no idea how you could construct one.
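The closest thing I can think of is a numerical approximation with finite differences over rollouts (a sketch; `evaluate_return(param_value)` is a hypothetical helper that would set the environment parameter, run a few evaluation episodes, and return the mean reward):

```python
def finite_difference_gradient(evaluate_return, param_value, eps=1e-2):
    """Central-difference estimate of d(mean reward) / d(environment parameter)."""
    return (evaluate_return(param_value + eps) - evaluate_return(param_value - eps)) / (2 * eps)
```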

c4cld commented 2 years ago

@pseudo-rnd-thoughts Thank you for joining the discussion. ^_^