zmsn-2077 / CUP-safe-rl

NeurIPS2022: Constrained Update Projection Approach to Safe Policy Optimization

How is the cost return calculated? #1

Closed: Henry668 closed this issue 1 year ago

Henry668 commented 1 year ago

Do you use the discounted cost return or undiscounted cost return for evaluation?

Henry668 commented 1 year ago

I see that FOCOPS uses the discounted cost return in MuJoCo tasks, but in Safety-Gym (SG) tasks people always use the undiscounted cost return. So should I set c_gamma to 1 when running SG tasks?
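To make the difference concrete, here is a minimal sketch of the two metrics. The helper `cost_return` and the sample costs are hypothetical and are not taken from the CUP code. The only assumption is that `c_gamma` acts as the cost discount factor, so `c_gamma = 1` reduces to a plain sum of per-step costs.

```python
from typing import Sequence


def cost_return(costs: Sequence[float], c_gamma: float = 1.0) -> float:
    """Return sum_t c_gamma**t * costs[t] for one episode (hypothetical helper)."""
    total = 0.0
    # Walk backwards so that each earlier step multiplies the tail by c_gamma once.
    for c in reversed(costs):
        total = c + c_gamma * total
    return total


# Made-up per-step costs for a single four-step episode.
episode_costs = [0.0, 1.0, 0.0, 1.0]

undiscounted = cost_return(episode_costs, c_gamma=1.0)  # every step weighted equally
discounted = cost_return(episode_costs, c_gamma=0.5)    # later costs count less

print(undiscounted, discounted)
```

With `c_gamma` below 1, a violation late in the episode contributes less than the same violation early on, so the two metrics can rank policies differently.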

Gaiejj commented 1 year ago

I also work on safe RL. I think a discounted cost return would be unreasonable: within a trajectory, costs at different time steps should count equally when evaluating whether a policy is safe or unsafe. More examples can be found at https://github.com/OmniSafeAI/omnisafe, a repo that collects many safe RL algorithms.

Henry668 commented 1 year ago

Hi @Gaiejj! Thanks for the recommendation. I see that the cost return in the recommended repo is undiscounted, but I am still wondering which cost metric (discounted or undiscounted) CUP used for the four SG tasks in the paper; this is clear from neither the paper nor the code. Note that the publicly available results in OmniSafeAI currently cover only MuJoCo (https://github.com/OmniSafeAI/omnisafe/tree/main/omnisafe/algorithms/on_policy/benchmarks). Could you provide some SG results?

Gaiejj commented 1 year ago

Sorry for the late reply. OmniSafe is currently developing benchmarks for navigation tasks (e.g. SafetyPointGoal1-v0). We will publish the benchmark as soon as it is ready. 😊