Open akjayant opened 3 years ago
Hi,
Thank you for your interest in our work! This research was done concurrently with the release of the safety gym environment and we did not have a chance to test the algorithm on those environments at the time. The hyperparameters we used were tuned using the benchmarks we reported in the paper so it is possible that the current setting may not work well for other environments. We thank you for raising this issue and we will look into applying the algorithm to the safety gym environments.
I tried this on Safety Gym by just changing `env_name`, using `info['cost']` for the constraint calculations, and computing cost returns as the undiscounted sum of costs instead of the discounted sum:

```
ret_eps += rew
cost_ret_eps += cost
```

(since the Safety Gym objective is to maximize the sum of rewards while constraining the undiscounted sum of costs, with a maximum episode length of 1000 steps).
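For concreteness, here is a minimal, self-contained sketch of the accumulation described above, using a Safety-Gym-style step API where `step()` returns `(obs, rew, done, info)` and `info['cost']` carries the per-step cost. `DummyEnv` is a toy stand-in for a real Safety Gym environment (the real one would come from `gym.make`); names like `run_episode` are illustrative, not from the repository:

```python
class DummyEnv:
    """Toy stand-in for a Safety-Gym-style environment.

    Emits a fixed reward of 1.0 and a fixed cost of 0.5 each step,
    terminating after `horizon` steps.
    """
    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # dummy observation

    def step(self, action):
        self.t += 1
        obs, rew, done = 0.0, 1.0, self.t >= self.horizon
        info = {'cost': 0.5}  # per-step cost, as in Safety Gym's info dict
        return obs, rew, done, info


def run_episode(env):
    """Accumulate episode return and the *undiscounted* cost return."""
    ret_eps, cost_ret_eps = 0.0, 0.0
    obs, done = env.reset(), False
    while not done:
        obs, rew, done, info = env.step(None)
        ret_eps += rew                 # episode return
        cost_ret_eps += info['cost']   # undiscounted sum of costs
    return ret_eps, cost_ret_eps
```

With `horizon=5`, `run_episode(DummyEnv())` yields a return of 5.0 and an undiscounted cost return of 2.5.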
Performance is quite poor even on the PointGoal level 2 environment, with the costs not being constrained at all. What could be the reason for that? https://openai.com/blog/safety-gym/