nolanwagener / safe_rl

Implementations of SAILR, PDO, and CSC
MIT License

How to reproduce the CPO baseline? #6

Closed hlhang9527 closed 2 years ago

hlhang9527 commented 2 years ago

Dear author,

I'm trying to reproduce your experimental results, but I can't find a CPO implementation in scripts.sh. Could you please tell me which line runs the CPO baseline, or do I need to implement CPO myself? Thanks.

nolanwagener commented 2 years ago

I used the safety-starter-agents implementation of CPO: https://github.com/openai/safety-starter-agents

For example, for the point environment, I ran the following from within safety-starter-agents/safe_rl/pg:

python run_agent.py --env extra_envs:Point-v0 --cost_lim 0.01 --agent cpo --epoch 500 --seed [SEED]
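
To repeat this run over several seeds, the command can be wrapped in a short Python loop. The sketch below is only an illustration under stated assumptions: the `build_cpo_command` and `run_all` helpers and the seed values are hypothetical, and it assumes the script is launched from safety-starter-agents/safe_rl/pg with the `extra_envs` package importable. The flags are copied from the command above and not independently verified.

```python
import subprocess
import sys


def build_cpo_command(seed, env="extra_envs:Point-v0", cost_lim=0.01, epochs=500):
    """Return the argv list for one CPO run, mirroring the command above.

    Hypothetical helper: the flag names are taken verbatim from the
    command quoted in this thread.
    """
    return [
        sys.executable, "run_agent.py",
        "--env", env,
        "--cost_lim", str(cost_lim),
        "--agent", "cpo",
        "--epoch", str(epochs),
        "--seed", str(seed),
    ]


def run_all(seeds, dry_run=True):
    """Print (dry_run=True) or execute one CPO run per seed, sequentially."""
    for seed in seeds:
        cmd = build_cpo_command(seed)
        if dry_run:
            # Show the command without launching training.
            print(" ".join(cmd))
        else:
            # check=True stops the sweep if any single run fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # The seed values here are placeholders, not the ones used in the paper.
    run_all(seeds=[0, 1, 2], dry_run=True)
```

With `dry_run=True` the script only prints the commands, which makes it easy to check them before switching to `dry_run=False` to start training.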

hlhang9527 commented 2 years ago

Thanks!