Closed: greg3566 closed this issue 2 years ago.
You are right, the goal environment is actually called button2 in the code. There is a description of this task in Sec. 4.1 of the paper. The difference between the goal and the button task is whether there are gremlins (dynamic obstacles) in the environment. There were several reasons for doing this when I ran the experiments:

1. The goal task in SafetyGym is essentially doing the same thing as the button task, except that 1) the obstacle types and layout are different, and 2) the goal locations are either fixed or randomly sampled. The original goal task randomly samples goal locations, and it is very hard to control this randomness in the code, which means I would need to evaluate each agent for many episodes to characterize the randomness for a fair comparison. So instead, I just modified the button environment, where the goal locations and the obstacle layouts can easily be fixed (see the small sketch at the end of this comment).

2. Reduce the training time. Also, because of the randomness of the goal locations in the original goal task, it takes longer to train the agent. Since CVPO has an inner convex optimization (currently implemented via SciPy; will change it to PyTorch later), the training speed is actually very slow. So in order to save training time, it is better to fix the randomness of the environment to make the agents converge faster. The comparison is still valid as long as all the agents are evaluated on the same set of tasks.

Please let me know if you are still confused! Thanks.
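As a small sketch of what I mean by fixing the layout randomness: re-seeding the environment with the same value before every reset keeps the goal and obstacle layout identical across episodes (the env id and seed below are placeholders, not the exact ones used in this repo):

```python
import gym
import safety_gym  # noqa: F401  -- registers the Safexp-* environments

ENV_ID = "Safexp-PointButton2-v0"  # placeholder id, not necessarily the one in ENV_LIST
FIXED_SEED = 0                     # placeholder seed value

env = gym.make(ENV_ID)
for episode in range(3):
    # Re-seeding with the same value before every reset makes the layout
    # (goal and obstacle positions) identical across episodes in the old gym API.
    env.seed(FIXED_SEED)
    obs = env.reset()
    done = False
    while not done:
        obs, reward, done, info = env.step(env.action_space.sample())
```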
Hi @liuzuxin,
First of all, impressive work on the implementation and paper! Congrats :)
Second, regarding this part:

> Reduce the training time. Also, because of the randomness of the goal locations in the original goal task, it takes longer to train the agent. Since CVPO has an inner convex optimization (currently implemented via SciPy; will change it to PyTorch later), the training speed is actually very slow. So in order to save training time, it is better to fix the randomness of the environment to make the agents converge faster. The comparison is still valid as long as all the agents are evaluated on the same set of tasks.
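If I understand correctly, the inner solve is a small convex dual problem, something like this toy MPO-style temperature dual (made-up shapes and values, and ignoring the cost constraints, so this is not the exact CVPO objective):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

rng = np.random.default_rng(0)
q_values = rng.normal(size=(64, 16))  # made-up [num_states, num_sampled_actions] Q estimates
epsilon = 0.1                          # hypothetical KL trust-region size

def dual(x):
    # g(eta) = eta * eps + eta * mean_s log mean_a exp(Q(s, a) / eta), convex in eta > 0
    eta = x[0]
    log_mean_exp = logsumexp(q_values / eta, axis=1) - np.log(q_values.shape[1])
    return eta * epsilon + eta * np.mean(log_mean_exp)

res = minimize(dual, x0=np.array([1.0]), method="SLSQP", bounds=[(1e-6, None)])
print("optimal temperature eta:", res.x[0])
```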
If you feel comfortable with it, a JAX implementation could be really easy, as it has many NumPy/SciPy functions built in with XLA compilation support. If you want, I can help you with that :)
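For instance, the same toy dual from above could be written with jax.numpy and jit-compiled end to end; jax.scipy.optimize.minimize (which I believe currently only supports BFGS) can stand in for the SciPy call on small smooth problems, and you get gradients for free. Again, just a sketch with made-up data:

```python
import jax
import jax.numpy as jnp
from jax.scipy.special import logsumexp
from jax.scipy.optimize import minimize

q_values = jax.random.normal(jax.random.PRNGKey(0), (64, 16))  # same made-up shapes as above
epsilon = 0.1

def dual(log_eta, q):
    # Optimize over log(eta) so eta stays positive without needing bound constraints.
    eta = jnp.exp(log_eta[0])
    log_mean_exp = logsumexp(q / eta, axis=1) - jnp.log(q.shape[1])
    return eta * epsilon + eta * jnp.mean(log_mean_exp)

@jax.jit  # XLA-compiles the entire solve, including the BFGS iterations
def solve_dual(q):
    res = minimize(dual, x0=jnp.zeros(1), args=(q,), method="BFGS")
    return jnp.exp(res.x[0])

print("optimal temperature eta:", solve_dual(q_values))
```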
Hi @yardenas,
Thank you so much! Could you point me to some example links of using JAX to accelerate SciPy functions so that I can take a look? It would be very helpful :)
Thank you for your detailed response.
Which environment/task was used for the ICML 2022 paper "Constrained Variational Policy Optimization for Safe Reinforcement Learning" (Liu et al., 2022)? I am confused because script/goal.py contains a button environment, not a goal environment, in ENV_LIST.