Hi Yesiam,
Thanks! I will look into it. Two questions: is that return the last one or the maximum over the runs? Following Chua et al., I plot the accumulated maximum in the paper. Also, could you check MPC too? MPC usually performs best, and there could be some bugs right now in the termination of the simulation.
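To make that metric concrete, here is a minimal sketch (with made-up return values) of the "accumulated maximum" the paper plots, i.e. the best episode return seen so far rather than the latest one:

```python
import numpy as np

episode_returns = np.array([120.0, 340.0, 290.0, 610.0, 580.0])  # made-up example returns
best_so_far = np.maximum.accumulate(episode_returns)              # accumulated maximum, as plotted in the paper
print(best_so_far)  # [120. 340. 340. 610. 610.]
```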
Thanks, Seb.
Hi Seb, thanks for your reply! The algorithm code seems to be fine; do you mean bugs in the MuJoCo simulation rendering code? Also, I can only run MPC in the MBHopper environment. MPC agents get stuck at the end of the 5th epoch in all other environments (and terminate only after a long time). Have you encountered this? The machine I am using has 32 GB of memory.
Hi Yesiam,
No, I never encountered such problems, but my implementation of MPC is definitely very slow; that is why I suggested using the other algorithms (although I should've checked for the Pusher). Maybe you can try a smaller num_iter or num_particles in the MPC solver.
I meant bugs in the dimensions of the tensors. Sometimes PyTorch broadcasts tensors instead of raising an error, which yields bad results. As I'm working on other projects, I sometimes modify rllib, and H-UCRL depends on it.
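As a concrete illustration (a minimal sketch, not code from the repo), this is the kind of silent broadcasting that can hide a shape bug:

```python
import torch

pred = torch.randn(4, 1)   # e.g. a model output with a stray trailing singleton dimension
target = torch.randn(4)    # e.g. ground-truth values of shape (4,)

error = pred - target      # silently broadcasts to shape (4, 4) instead of raising an error
print(error.shape)         # torch.Size([4, 4]), not the intended torch.Size([4])
```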
Let me know if you have any other questions.
Thanks Seb! Sorry to bother you again, but I still have some questions. In the MuJoCo experiments, is MPC the agent type that you report in the paper, e.g., in Figs. 3 and 4? Could you suggest the settings to reproduce the MuJoCo results?
Hi Yesiam,
Thanks for your clarification! MPC crashes on my PC, and I will try your code on GPU as well as the PETS code. I'll close this issue since it has been clarified. Thanks Seb!
Hi Seb, when I run in the MBReacher3d_v0 environment, an error is raised:
```
File "hucrl/rllib/policy/nn_policy.py", line 148, in forward
    state = self._preprocess_state(state)
File "hucrl/rllib/policy/nn_policy.py", line 142, in _preprocess_state
    state = torch.cat((state, goal), dim=-1)
TypeError: expected Tensor as element 1 in argument 0, but got numpy.ndarray
```
When I replace the goal with `goal = torch.tensor(self.goal, dtype=torch.float)`, another error is raised:
```
File "hucrl/rllib/util/neural_networks/neural_networks.py", line 153, in forward
    x = self.hidden_layers(x)
File "anaconda3/envs/hucrl/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
File "anaconda3/envs/hucrl/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
File "anaconda3/envs/hucrl/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
File "anaconda3/envs/hucrl/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 93, in forward
    return F.linear(input, self.weight, self.bias)
File "anaconda3/envs/hucrl/lib/python3.8/site-packages/torch/nn/functional.py", line 1692, in linear
    output = input.matmul(weight.t())
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x20 and 17x200)
```
When I add `print(self.goal)` at https://github.com/sebascuri/rllib/blob/master/rllib/policy/nn_policy.py#L126, it outputs None.
Hi yesiam,
Thanks for catching it! I see what the problem is: I changed the goal from a parameter to extra state dimensions. I have already fixed this bug. As a side effect, the model learning is not working as expected, but I'm working on fixing this. I will revert to the goal as a parameter asap.
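For reference, a minimal sketch (not the actual rllib code; `preprocess_state` is a hypothetical stand-alone helper) of goal handling that avoids the TypeError above when the goal arrives as a numpy array:

```python
import numpy as np
import torch

def preprocess_state(state: torch.Tensor, goal) -> torch.Tensor:
    """Append the goal to the state; hypothetical helper for illustration only."""
    if goal is None:                      # goal-free environments: nothing to append
        return state
    if isinstance(goal, np.ndarray):      # torch.cat needs tensors, not numpy arrays
        goal = torch.tensor(goal, dtype=state.dtype)
    goal = goal.expand(*state.shape[:-1], goal.shape[-1])  # match the state's batch shape
    return torch.cat((state, goal), dim=-1)
```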
Thanks, Seb.
Thanks Seb! I find that the training return of H-UCRL agents in the inverted pendulum task is pretty good! But after several days of experiments, I still can't reproduce your results for all MuJoCo tasks with the older version of rllib (the one that raises the error in the Reacher task). For example, with optimistic exploration, BPTT performs as shown in the attached plot. Is that result normal? Also, are you running experiments on CPU only? I didn't find a CUDA option in your rllib code.
Hi Yesiam,
Could you try now? Yes, I run on CPU only.
Thanks for your reply Seb! I still can't get the expected results in the HalfCheetah task (training return < 1000 for the BPTT, DataAugmentation, and MVE agents in 300 episodes). The two repositories (hucrl and your rllib) I use are up to date. I'd really appreciate it if you could check!
Do you still get the same train returns as before? Are you using the default agents?
Yes, the training returns are the same as in the picture above. The command I use, e.g. for the DataAugmentation agent, is:

```
python exps/mujoco/run.py --agent DataAugmentation --env-config-file exps/mujoco/config/envs/half-cheetah.yaml --agent-config-file exps/mujoco/config/agents/data_augmentation.yaml --train-episodes 400
```

Below is what I get:
Hi Yesiam,
This is way better than before! Also note that, following Chua et al. 2018, we plot the maximum cumulative return in the paper, not the current train return. I think 2500 is already OK performance for Half-Cheetah. However, if you want even higher performance, I could only achieve it with MPC.
Thanks for your reply Seb! Without H-UCRL, the DataAugmentation agent can achieve a return of 5000. Would you suggest some possible modifications for the hucrl_DataAugmentation agent to make the two comparable?
For the MPC agent, does num_samples=500 need to be that large? Currently, I can't get the default MPC settings to run on my PC. After changing it from 500 to 50, the training return is 1300 (and it's still much slower than DataAugmentation or BPTT, even with num_samples=50).
Hi Yesiam,
That is with the default action cost, right? Essentially, when β = 0, the H-UCRL agent should behave like the expected agent, so you can tune β if you want to.
I think MPC performs better because H-UCRL requires solving an optimization problem: MPC does so approximately, whereas DataAugmentation and BPTT only do so partially (they don't fully optimize the policy, they just take some gradient steps). It is also possible that expanding the action space from n_action to n_action + n_states means the policy needs more iterations to converge, which could explain this issue. MPC does not have this problem.
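To illustrate what the expanded action space does, here is a minimal sketch (the `hallucinated_step` helper and `model` interface are hypothetical, not the actual hucrl code): the extra n_states action dimensions act as hallucinated controls η ∈ [-1, 1] that pick an optimistic next state inside the model's β-scaled confidence interval:

```python
import torch

def hallucinated_step(model, state, joint_action, beta=1.0):
    """One optimistic transition with hallucinated controls (illustrative only).

    joint_action has n_action + n_states entries: the real action followed by
    eta in [-1, 1], which selects a next state inside the model's confidence set.
    """
    n_states = state.shape[-1]
    action, eta = joint_action[..., :-n_states], joint_action[..., -n_states:]
    mean, std = model(state, action)                 # predicted mean and epistemic std
    return mean + beta * std * eta.clamp(-1.0, 1.0)  # beta = 0 recovers the expected model
```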
For the num_samples issue: more samples means better optimization, since it is a shooting method. You can try 400/500 samples but with a shorter horizon to see if that works better for you.
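As a rough illustration of why num_samples and the horizon trade off against runtime in a shooting method, here is a minimal random-shooting planner sketch (the `model` and `reward_fn` interfaces are assumptions; the actual MPC solver in the repo is more sophisticated):

```python
import torch

def random_shooting_plan(model, reward_fn, state, n_action, num_samples=50, horizon=10):
    """Return the first action of the best sampled action sequence (illustrative only).

    Runtime grows linearly in both num_samples and horizon, which is why
    reducing either makes MPC noticeably faster.
    """
    actions = torch.rand(num_samples, horizon, n_action) * 2.0 - 1.0  # candidates in [-1, 1]
    states = state.expand(num_samples, -1)
    returns = torch.zeros(num_samples)
    for t in range(horizon):
        returns = returns + reward_fn(states, actions[:, t])
        states = model(states, actions[:, t])
    best = returns.argmax()
    return actions[best, 0]  # receding horizon: execute only the first action
```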
Thanks for your explanation! I'll try your suggestions!
Hi @yesiam-png, I realized that the default model is sometimes too big and, on some computers, it massively slows down the execution of the hidden layers. If you reduce the hidden layers from (200, 200, 200, 200, 200) to (200, 200, 200), performance is not hurt by much and MPC runs faster.
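For intuition, a minimal sketch (a hypothetical `make_mlp` helper, not the hucrl model class) of what the hidden-layer tuple controls; fewer layers mean cheaper forward passes, which matters because MPC queries the model many times per planning step:

```python
import torch.nn as nn

def make_mlp(in_dim, out_dim, layers=(200, 200, 200)):
    """Build a fully connected network from a tuple of hidden widths (illustrative only)."""
    modules, prev = [], in_dim
    for width in layers:
        modules += [nn.Linear(prev, width), nn.ReLU()]
        prev = width
    modules.append(nn.Linear(prev, out_dim))
    return nn.Sequential(*modules)

# Example with made-up state/action dimensions: three hidden layers instead of five.
model = make_mlp(in_dim=17 + 6, out_dim=17, layers=(200, 200, 200))
```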
Hi Sebastian, first, thanks for your excellent code and paper! However, the BPTT and Data_Augmentation agents fail to accomplish the Pusher task in the simulation and output a very low return, e.g., -416.11. I have only tried these two agents in the Pusher environment, so I am not sure whether I am running them correctly. For the BPTT agent, for example, I run:

```
python exps/mujoco/run.py --environment MBPusher-v0 --agent BPTT --config-file exps/mujoco/config/bptt.yaml
```