Open akjayant opened 3 years ago
Hi,
Thank you for your interest in our work! This research was done concurrently with the release of the safety gym environment and we did not have a chance to test the algorithm on those environments at the time. The hyperparameters we used were tuned using the benchmarks we reported in the paper so it is possible that the current setting may not work well for other environments. We thank you for raising this issue and we will look into applying the algorithm to the safety gym environments.
I tried this on Safety Gym by just changing `env_name`, using `info['cost']` for the constraint calculations, and computing cost returns as the undiscounted sum of costs instead of the discounted sum:

```
ret_eps += rew
cost_ret_eps += cost
```

(since the Safety Gym objective is to maximize the sum of rewards while constraining the undiscounted sum of costs, with a maximum episode length of 1000 steps).
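For concreteness, here is a minimal, self-contained sketch of the accumulation described above, using a Safety-Gym-style step API where `step()` returns `(obs, rew, done, info)` and `info['cost']` carries the per-step cost. `DummyEnv` is a toy stand-in for a real Safety Gym environment (the real one would come from `gym.make`); names like `run_episode` are illustrative, not from the repository:

```python
class DummyEnv:
    """Toy stand-in for a Safety-Gym-style environment.

    Emits a fixed reward of 1.0 and a fixed cost of 0.5 each step,
    terminating after `horizon` steps.
    """
    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # dummy observation

    def step(self, action):
        self.t += 1
        obs, rew, done = 0.0, 1.0, self.t >= self.horizon
        info = {'cost': 0.5}  # per-step cost, as in Safety Gym's info dict
        return obs, rew, done, info


def run_episode(env):
    """Accumulate episode return and the *undiscounted* cost return."""
    ret_eps, cost_ret_eps = 0.0, 0.0
    obs, done = env.reset(), False
    while not done:
        obs, rew, done, info = env.step(None)
        ret_eps += rew                 # episode return
        cost_ret_eps += info['cost']   # undiscounted sum of costs
    return ret_eps, cost_ret_eps
```

With `horizon=5`, `run_episode(DummyEnv())` yields a return of 5.0 and an undiscounted cost return of 2.5.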
Performance is quite poor even on the PointGoal level 2 environment, with the costs not being constrained at all. What could be the reason for that? https://openai.com/blog/safety-gym/