liuzuxin / cvpo-safe-rl

Code for "Constrained Variational Policy Optimization for Safe Reinforcement Learning" (ICML 2022)
GNU General Public License v3.0

Which environment/task was used for the ICML 2022 paper? #1

Closed: greg3566 closed 2 years ago

greg3566 commented 2 years ago

Which environment/task was used for the ICML 2022 paper "Constrained Variational Policy Optimization for Safe Reinforcement Learning" (Liu et al., 2022)? I am confused because ENV_LIST in script/goal.py contains a button environment, not a goal environment.

liuzuxin commented 2 years ago

You are right, the environment called the goal task in the paper is actually named button2 in the code. Sec. 4.1 of the paper describes this task. The only difference between the goal and the button task is whether there are gremlins (dynamic obstacles) in the environment. There were several reasons for this choice when I ran the experiments; the main one, reducing training time, is quoted in the reply below.

Please let me know if you are still confused! Thanks.
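As a concrete illustration of the goal/button distinction described above (a hypothetical sketch only: the key names are assumed from safety_gym's Engine config dict, and the obstacle counts are illustrative, not the paper's actual settings):

```python
# Hypothetical sketch of the two task variants. Key names assumed from
# safety_gym's Engine config; the counts are illustrative only.
base_config = {
    "task": "button",
    "buttons_num": 4,
    "hazards_num": 4,
}

# "button" task: dynamic obstacles (gremlins) are present
button_config = dict(base_config, gremlins_num=4)

# "goal" task in the paper (button2 in the code): same layout, no gremlins
goal_config = dict(base_config, gremlins_num=0)
```

Under this reading, the two tasks share everything except the gremlins, so removing them isolates the effect of dynamic obstacles.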

yardenas commented 2 years ago

Hi @liuzuxin,

First of all impressive work on the implementation and paper! Congrats :)

Second, regarding this part of your answer:

> Reduce the training time. Because of the randomness of the goal locations in the original goal task, it also takes longer to train the agent. Since CVPO has an inner convex optimization (currently implemented via SciPy; I will change it to PyTorch later), training is actually very slow. So to save training time, it is better to fix the randomness of the environment so that it converges faster. The comparison is still valid as long as all the agents face the same set of tasks.

If you feel comfortable with it, a JAX implementation could be really easy, since JAX has many NumPy/SciPy functions built in with XLA compilation support. If you want, I can help you with that :)
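For instance (a minimal sketch, not from this repo: the objective below is a toy stand-in for CVPO's actual inner dual problem), `jax.scipy.optimize.minimize` can be jit-compiled end to end:

```python
# Minimal sketch: an inner solve done entirely in JAX so the whole step
# can be XLA-compiled. The objective is a TOY convex stand-in, not
# CVPO's actual dual problem.
import jax
import jax.numpy as jnp
from jax.scipy.optimize import minimize


@jax.jit
def solve_inner(lam0):
    def dual(lam):
        # smooth convex toy objective: quadratic plus softplus terms
        return jnp.sum((lam - 2.0) ** 2 + jnp.logaddexp(0.0, lam))

    # jax.scipy.optimize.minimize currently supports method="BFGS"
    return minimize(dual, lam0, method="BFGS")


result = solve_inner(jnp.zeros(3))
```

Because the solver is pure JAX, it can sit inside a larger jit-ed training step, which is what makes this attractive compared to calling out to SciPy from a PyTorch training loop.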

liuzuxin commented 2 years ago

Hi @yardenas,

Thank you so much! Could you point me to some example links of using JAX to accelerate SciPy functions so that I can take a look? That would be very helpful :)

greg3566 commented 2 years ago

Thank you for your detailed response.

yardenas commented 2 years ago


@liuzuxin, I have a CPO implementation in JAX here. It's still not comprehensively tested, though.