Closed Henry668 closed 1 year ago
I see that FOCOPS uses the discounted cost return in MuJoCo tasks, but in Safety-Gym tasks people always use the undiscounted cost return. So should I set c_gamma to 1 when running SG tasks?
I also work on safe RL. I think a discounted cost return is unreasonable: within a trajectory, a cost incurred at any time step should count equally when evaluating whether a policy is safe or unsafe. For more examples, see https://github.com/OmniSafeAI/omnisafe, a repo that includes many safe RL algorithms.
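To make the point concrete, here is a minimal sketch of how the two metrics differ. It is not taken from FOCOPS, CUP, or OmniSafe; the function name `cost_return` and the two toy trajectories are hypothetical, and only the parameter name `c_gamma` comes from this thread.

```python
def cost_return(costs, c_gamma=1.0):
    """Return sum_t c_gamma**t * costs[t] for one trajectory.

    c_gamma = 1.0 gives the undiscounted cost return.
    """
    total = 0.0
    # Accumulate backwards so each step is cost + c_gamma * (return of the rest).
    for cost in reversed(costs):
        total = cost + c_gamma * total
    return total


# Two hypothetical 1000-step trajectories, each with exactly one violation.
early_violation = [1.0] + [0.0] * 999   # violation at the first step
late_violation = [0.0] * 999 + [1.0]    # violation at the last step

for name, costs in [("early", early_violation), ("late", late_violation)]:
    undiscounted = cost_return(costs, c_gamma=1.0)
    discounted = cost_return(costs, c_gamma=0.99)
    print(f"{name}: undiscounted={undiscounted:.4f}, discounted={discounted:.6f}")
```

With `c_gamma=1.0` both trajectories get the same cost return, while with `c_gamma=0.99` the late violation is weighted by 0.99**999 and almost disappears from the metric.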
Hi @Gaiejj! Thanks for the recommendation. I see that the cost return in the recommended repo is undiscounted, but I am still wondering which cost metric (discounted or undiscounted) CUP used for the four SG tasks in the paper; this is clear from neither the paper nor the code. Note that the publicly available results in OmniSafeAI currently cover only MuJoCo (https://github.com/OmniSafeAI/omnisafe/tree/main/omnisafe/algorithms/on_policy/benchmarks). Could you provide some SG results?
Sorry for the late reply. OmniSafe is currently developing benchmarks for navigation tasks (e.g. SafetyPointGoal1-v0). We will publish the benchmark as soon as it's ready. 😊
Do you use the discounted cost return or undiscounted cost return for evaluation?