Closed minoshiro1 closed 1 year ago
Hi @minoshiro1, if there is no cost data, how do you plan to define your constraints? CVPO is a constrained optimization algorithm, so as long as you can define the objective (reward return) and the constraint (cost return) of a state-action pair, you can use it. The reward return and cost return can be in any format. We use neural networks to learn them (Qr and Qc), but you can also manually define those functions for your application.
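For illustration, here is a minimal sketch of one way to attach a hand-written cost to an environment that does not provide one. Everything in it is an assumption rather than part of the CVPO code: the `CostInfoWrapper` and `PointEnv` names are hypothetical, the `step` method is assumed to return a Gymnasium-style 5-tuple, and the `"cost"` key in `info` is only a common convention in safe RL code, so please check which key and signature your training loop actually reads.

```python
class CostInfoWrapper:
    """Hypothetical wrapper: adds a user-defined cost to each step's info dict."""

    def __init__(self, env, cost_fn):
        self.env = env
        self.cost_fn = cost_fn  # cost_fn(obs, action, next_obs) -> float
        self._last_obs = None

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self._last_obs = obs
        return obs, info

    def step(self, action):
        prev_obs = self._last_obs
        obs, reward, terminated, truncated, info = self.env.step(action)
        info = dict(info)  # copy so the wrapped env's dict is not mutated
        # Assumption: the learner reads the per-step cost from info["cost"].
        info["cost"] = float(self.cost_fn(prev_obs, action, obs))
        self._last_obs = obs
        return obs, reward, terminated, truncated, info


class PointEnv:
    """Toy 1-D environment with no cost signal (stand-in for a custom env)."""

    def __init__(self):
        self.x = 0.0

    def reset(self, **kwargs):
        self.x = 0.0
        return self.x, {}

    def step(self, action):
        self.x += float(action)
        reward = self.x  # reward grows as the point moves to the right
        return self.x, reward, False, False, {}


def out_of_bounds_cost(obs, action, next_obs):
    """Manually defined constraint signal: cost 1 when the point leaves [-1, 1]."""
    return 1.0 if abs(next_obs) > 1.0 else 0.0


env = CostInfoWrapper(PointEnv(), out_of_bounds_cost)
obs, info = env.reset()
total_cost = 0.0
for action in (0.6, 0.6, 0.6):
    obs, reward, terminated, truncated, info = env.step(action)
    total_cost += info["cost"]  # accumulated cost return for this short rollout
```

The constraint is then a bound on the accumulated cost (the cost return), in the same way the objective is the accumulated reward.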
Hello, I wonder whether I can use a customized environment that does not define any cost info. Is it OK to use it with your CVPO algorithm?