liuzuxin / cvpo-safe-rl

Code for "Constrained Variational Policy Optimization for Safe Reinforcement Learning" (ICML 2022)
GNU General Public License v3.0
63 stars 7 forks

In Environment implementation #3

Closed minoshiro1 closed 1 year ago

minoshiro1 commented 1 year ago

Hello, I wonder whether I can use a customized environment that does not define any cost info. Is it OK to use it with your CVPO algorithm?

liuzuxin commented 1 year ago

Hi @minoshiro1, if there is no cost data, how do you plan to define your constraints? CVPO is a constrained optimization algorithm, so you can use it as long as you can define both the objective (reward return) and the constraint (cost return) for a state-action pair. The reward return and cost return can take any form. We use neural networks to learn them (Qr and Qc), but you can also define those functions manually for your application.
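For an environment that reports no cost at all, one possible approach is to attach a hand-written cost signal in a thin wrapper. The sketch below is only an illustration under assumptions: `CostInfoWrapper`, `ToyLineEnv`, and `out_of_bounds_cost` are hypothetical names, the `(obs, reward, done, info)` step signature is assumed, and the `"cost"` key in `info` is an assumed convention that should be checked against what the training code actually reads.

```python
class CostInfoWrapper:
    """Hypothetical wrapper: adds a user-defined cost to `info` on every step.

    Assumes the wrapped env's step() returns (obs, reward, done, info).
    """

    def __init__(self, env, cost_fn, cost_key="cost"):
        self.env = env
        self.cost_fn = cost_fn    # maps (obs, action) to a scalar cost
        self.cost_key = cost_key  # assumed key name, adjust as needed

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        info = dict(info)  # copy so the wrapped env's dict is not mutated
        info[self.cost_key] = float(self.cost_fn(obs, action))
        return obs, reward, done, info


class ToyLineEnv:
    """Hypothetical 1-D environment with no cost signal of its own."""

    def __init__(self):
        self.x = 0.0

    def reset(self):
        self.x = 0.0
        return self.x

    def step(self, action):
        self.x += float(action)
        reward = float(action)       # reward: progress to the right
        done = abs(self.x) >= 5.0
        return self.x, reward, done, {}


def out_of_bounds_cost(obs, action):
    # Example constraint: positions beyond x = 2 count as unsafe.
    return 1.0 if obs > 2.0 else 0.0


env = CostInfoWrapper(ToyLineEnv(), out_of_bounds_cost)
env.reset()
episode_cost = 0.0
for _ in range(4):
    obs, reward, done, info = env.step(1.0)
    episode_cost += info["cost"]
```

Here the constraint lives entirely in `out_of_bounds_cost`, so the same wrapper can be reused with a different cost function. The summed per-step cost is the kind of quantity a cost-return estimate would be built from.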