jannerm / mbpo

Code for the paper "When to Trust Your Model: Model-Based Policy Optimization"
https://jannerm.github.io/mbpo-www/
MIT License

Observation of the MuJoCo env #12

Closed: Shunichi09 closed this issue 4 years ago

Shunichi09 commented 4 years ago

Hi! Thank you for sharing your great research and code.

I have a question about the MuJoCo environments used in your experiments. In the paper, your method is compared with PETS, which needs access to certain state variables in order to compute the reward offline, whereas your method does not. I checked your code and environments, and they do not expose the state variables needed to compute the reward offline. How did you run PETS, which assumes those variables are observable? (For example, the Ant environment uses get_body_com("torso") to compute the reward.)

Thank you for your help.

https://github.com/JannerM/mbpo/blob/39365f230292a452f0d77d1ad4f2a1795d311052/mbpo/env/ant.py#L31-L36
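For context, a minimal sketch of the point being raised: in the standard Gym Ant environment, the forward reward is computed from the torso position read directly from the simulator, which is not part of the default observation vector. This sketch mirrors Gym's AntEnv.step; it is illustrative and not code from the MBPO repo:

```python
from gym.envs.mujoco import AntEnv

class AntForwardRewardSketch(AntEnv):
    """Sketch of Gym's Ant forward reward: it reads the torso position from
    the simulator state, which is absent from the default observation, so a
    method that only sees observations cannot recompute this reward offline."""

    def step(self, action):
        xposbefore = self.get_body_com("torso")[0]
        self.do_simulation(action, self.frame_skip)
        xposafter = self.get_body_com("torso")[0]
        forward_reward = (xposafter - xposbefore) / self.dt
        # remaining reward terms and termination logic omitted for brevity
        return self._get_obs(), forward_reward, False, {}
```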

jannerm commented 4 years ago

Yeah, some of the baselines we compared to require full observability or other additions to the observation. In these cases, we just ran the baseline with the appropriate modifications to the environment to give it a fair shot.
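For concreteness, a minimal sketch of the kind of modification described here, assuming a Gym-style Ant environment; the class name and exact fields are illustrative, not the actual code handed to the baselines:

```python
import numpy as np
from gym.envs.mujoco import AntEnv

class FullyObservableAntSketch(AntEnv):
    """Hypothetical sketch: append the torso position to the observation so
    the reward becomes a function of (obs, act, next_obs) alone."""

    def _get_obs(self):
        return np.concatenate([
            super()._get_obs(),               # standard Ant observation
            self.get_body_com("torso").flat,  # torso (x, y, z) for reward computation
        ])
```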

In the particular case of ant, the PETS authors kindly gave us a version of the environment compatible with their code. Here are two gists with the environment and the corresponding config file. (These should be placed here and here in the PETS repo, respectively.)
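As a rough illustration of why the config file matters: in the PETS (handful-of-trials) codebase, each environment's config module expresses the reward as cost functions over observations and actions, so everything the reward needs must appear in the observation vector. The body below is a hypothetical sketch, not the contents of the gist (the observation index used is an assumption):

```python
import tensorflow as tf

class AntConfigModuleSketch:
    """Hypothetical PETS-style config: rewards are costs over observations,
    here assuming the torso x-velocity sits at index 0 of the observation."""

    @staticmethod
    def obs_cost_fn(obs):
        # negative forward velocity as cost (reward = forward progress)
        return -obs[:, 0]

    @staticmethod
    def ac_cost_fn(acs):
        # quadratic control penalty
        return 0.1 * tf.reduce_sum(tf.square(acs), axis=1)
```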

Shunichi09 commented 4 years ago

I really appreciate your quick response and the code. I now understand your experimental conditions. Thank you for your help!