Closed hlhang9527 closed 2 years ago
I used the safety-starter-agents implementation of CPO: https://github.com/openai/safety-starter-agents
For example, for the Point environment, from within safety-starter-agents/safe_rl/pg I ran:
python run_agent.py --env extra_envs:Point-v0 --cost_lim 0.01 --agent cpo --epoch 500 --seed [SEED]
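To sweep over several seeds, a small loop like the one below could be used. This is just a sketch: the seed values are illustrative, and `echo` is included as a dry run so the loop only prints each command — remove it to actually launch training.

```shell
# Dry-run sweep over seeds for the CPO baseline (seed values are examples).
# Run from within safety-starter-agents/safe_rl/pg; drop "echo" to launch for real.
for SEED in 0 10 20; do
  echo python run_agent.py --env extra_envs:Point-v0 --cost_lim 0.01 \
    --agent cpo --epoch 500 --seed "$SEED"
done
```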
Thanks!
Dear author,
I'm trying to reproduce your experimental results, but I can't find a CPO implementation in scripts.sh. Could you please point me to the line that runs the CPO baseline, or do I need to implement CPO myself? Thanks.